thorwhalen opened this issue 4 years ago
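Some code to start off with:

```python
import json
from collections import Counter

import mistletoe as mt
from mistletoe.ast_renderer import ASTRenderer
from boltons.iterutils import remap  # https://pypi.python.org/pypi/boltons

f = '/D/Dropbox/dev/p3/notebooks/test.md'

# Render the markdown as a JSON string describing the token tree, then parse it
with open(f, 'r') as fp:
    t = mt.markdown(fp, renderer=ASTRenderer)
d = json.loads(t)
print(d.keys())
dd = d['children']  # the top-level block tokens
print(type(dd), len(dd))


def get_all_tags(root):
    """Count token types in the AST, printing the children of every CodeFence."""
    all_tags = Counter()

    def visit(path, key, value):
        # Only token dicts carry a 'type'; lists, strings, etc. are skipped
        if isinstance(value, dict) and 'type' in value:
            all_tags.update([value['type']])
            if value['type'] == 'CodeFence':
                print(value['children'])
        return True  # keep every item unchanged

    remap(root, visit=visit, reraise_visit=False)
    return all_tags


get_all_tags(d)
```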
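If you'd rather not pull in boltons just to count tags, a plain recursive walk does the same job. This is a minimal sketch, assuming the `ASTRenderer` output is nested dicts where each token has a `'type'` key and sub-tokens sit in lists (usually under `'children'`). `count_token_types` is a hypothetical helper, and the `ast` below is hand-written, not mistletoe output captured from a real file.

```python
from collections import Counter


def count_token_types(node, counts=None):
    """Recursively count the 'type' of every token dict in a JSON-like tree."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if 'type' in node:
            counts[node['type']] += 1
        for value in node.values():  # descend into 'children', 'header', etc.
            count_token_types(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_token_types(item, counts)
    return counts  # strings, numbers and None contribute nothing


# Hand-written stand-in for json.loads(mt.markdown(..., renderer=ASTRenderer))
ast = {
    'type': 'Document',
    'children': [
        {'type': 'Heading', 'level': 1,
         'children': [{'type': 'RawText', 'content': 'Title'}]},
        {'type': 'CodeFence', 'language': 'python',
         'children': [{'type': 'RawText', 'content': 'print(1)\n'}]},
    ],
}
counts = count_token_types(ast)
print(counts['RawText'], counts['CodeFence'])  # 2 1
```

Unlike the `remap` version, this also counts the root `Document` token, since `remap` calls `visit` on the items inside the root rather than on the root itself.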
When you want to read zips, there's the `FilesOfZip`, `ZipReader`, or `ZipFilesReader` we know and love.

Sometimes though, you want to write to zips too. For this, we have `ZipStore`.
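But words of warning: `ZipStore` won't be a full `Persister`. `zipfile` gives you no way to delete members of an archive, and it doesn't overwrite them either. Yeah, that's right. Write to a name that's already there and it'll just go on writing new kv pairs, without complaining (except for a `UserWarning`).
It's not just me thinking it's weird (e.g. https://bugs.python.org/issue2824).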
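To see the behavior being complained about, here's a small demonstration using only the standard library's `zipfile` and an in-memory buffer (the `demo_duplicate_entries` helper is just for illustration). Writing the same name twice doesn't raise: it adds a second entry and emits a `UserWarning`.

```python
import io
import warnings
import zipfile


def demo_duplicate_entries():
    """Write the same name twice to an in-memory zip and report what happened."""
    buf = io.BytesIO()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # make sure the warning gets recorded
        with zipfile.ZipFile(buf, 'w') as zf:
            zf.writestr('here', b'is a file')
            zf.writestr('here', b'something else')  # no error: a second entry is added
    categories = [w.category for w in caught]
    with zipfile.ZipFile(buf) as zf:
        names = zf.namelist()    # both entries are listed
        value = zf.read('here')  # lookup by name resolves to the last entry
    return categories, names, value


categories, names, value = demo_duplicate_entries()
print(names)  # ['here', 'here']
print(value)  # b'something else'
```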
But fret not: `ZipStore` has a "natural" behavior. It won't allow you to create duplicates.
On the other hand, it won't allow you to overwrite the value of an existing key either (remember, we can't delete!).
Also, since `ZipStore` can write to a zip, its read functionality doesn't assume static data, and so it doesn't cache things the way your favorite zip readers do.
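Here's a minimal sketch of that "no duplicates, no overwrites" rule on top of `zipfile`. It is not py2store's implementation: `WriteOnceZip` and `OverwriteError` are hypothetical names, and the sketch assumes keys map directly to archive member names. Every read and write reopens the archive, which is exactly the uncached behavior described above.

```python
import os
import tempfile
import zipfile
from collections.abc import MutableMapping


class OverwriteError(FileExistsError):
    """Hypothetical error for writing to a key that already exists."""


class WriteOnceZip(MutableMapping):
    """Hypothetical write-once key-value view of a zip file, with no caching."""

    def __init__(self, path):
        self.path = path

    def _names(self):
        if not os.path.isfile(self.path):
            return []  # no file yet: behave like an empty store
        with zipfile.ZipFile(self.path) as zf:  # reopened on every call
            return zf.namelist()

    def __iter__(self):
        return iter(self._names())

    def __len__(self):
        return len(self._names())

    def __getitem__(self, k):
        if k not in self._names():
            raise KeyError(k)
        with zipfile.ZipFile(self.path) as zf:
            return zf.read(k)

    def __setitem__(self, k, v):
        if k in self._names():  # refuse instead of adding a duplicate entry
            raise OverwriteError(f"Not allowed to overwrite an existing key: {k}")
        with zipfile.ZipFile(self.path, 'a') as zf:  # creates the file if needed
            zf.writestr(k, v)

    def __delitem__(self, k):
        raise NotImplementedError("this sketch doesn't support deleting members")


with tempfile.TemporaryDirectory() as tmp:
    z = WriteOnceZip(os.path.join(tmp, 'test.zip'))
    keys_before = list(z)
    z['here'] = b'is a file'
    caught = None
    try:
        z['here'] = b'something else'
    except OverwriteError as e:
        caught = e
    keys_after = list(z)
    value = z['here']
print(keys_before, keys_after, value)  # [] ['here'] b'is a file'
```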
This, and the acrobatics needed to disguise the weird `zipfile` as something more... key-value natural, makes for a not-so-efficient store out of the box.
I advise using one of the zip readers if all you need to do is read, or sub-classing or wrapping `ZipStore` with caching layers if that's appropriate for you.
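As one possible caching layer, here's a sketch of a read-through cache that wraps any mapping. `CachedReads` is a hypothetical helper, not part of py2store, and it assumes nobody else writes to the underlying zip while the cache is alive (otherwise cached values and key lists go stale). A dict that counts reads stands in for a `ZipStore` so the example runs on its own.

```python
from collections.abc import Mapping


class CachedReads(Mapping):
    """Hypothetical read-through cache around any Mapping, such as a zip store.

    Assumes the wrapped store isn't modified while this cache is in use.
    """

    def __init__(self, store):
        self.store = store
        self._cache = {}   # key -> value, filled on first read
        self._keys = None  # key list, filled on first iteration

    def __getitem__(self, k):
        if k not in self._cache:
            self._cache[k] = self.store[k]  # only hit the store on a cache miss
        return self._cache[k]

    def __iter__(self):
        if self._keys is None:
            self._keys = list(self.store)
        return iter(self._keys)

    def __len__(self):
        return sum(1 for _ in self)


class CountingStore(dict):
    """Stand-in for a slow store: a dict that counts value reads."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, k):
        self.reads += 1
        return super().__getitem__(k)


store = CountingStore({'here': b'is a file', 'there': b'is another file'})
cached = CachedReads(store)
first = cached['here']
second = cached['here']
print(store.reads)  # 1: the second read was served from the cache
```

Wrapping rather than sub-classing keeps the cache independent of how the underlying store is implemented.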
```python
import os

from py2store.slib.zipfile import ZipStore, OverwriteNotAllowed

f = 'my_zipstore_test.zip'
if os.path.isfile(f):
    os.remove(f)  # remove file if it exists

z = ZipStore(f)
assert list(z) == []
z['here'] = b'is a file'
assert list(z) == ['here']
assert z['here'] == b'is a file'
assert z.head() == ('here', b'is a file')
z['there'] = b'is another file'
assert list(z) == ['here', 'there']

try:
    z['here'] = b'something else'
except OverwriteNotAllowed as e:
    assert isinstance(e, FileExistsError)  # see that an OverwriteNotAllowed is a FileExistsError!
    print(f"Expected: {e.__class__.__name__}{e.args}")
# Expected: OverwriteNotAllowed("You're not allowed to overwrite an existing key: here",)

assert list(z) == ['here', 'there']
```
Let's make another `ZipStore` pointing to the same zip file:

```python
zz = ZipStore(f)
assert list(zz) == ['here', 'there']  # see that this new zip store sees the data too!
```
We provided the convenience of a dict-like interface to write to zips, but it comes at a price: every time you write an item, a new zip writer (a `zipfile.ZipFile` object, opened for appending data) is created, opened, and closed.
If you're going to be writing a bunch of data to the zip, you might want to use `ZipStore`'s context manager functionality instead.
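To make that cost concrete, here are the two write patterns side by side, using `zipfile` directly (the function names are illustrative, not py2store's). Both produce the same archive contents; the first opens and closes a `ZipFile` once per item, the second once for the whole batch.

```python
import os
import tempfile
import zipfile


def write_each(path, items):
    """Open, append one entry, and close the archive for every item."""
    for k, v in items.items():
        with zipfile.ZipFile(path, 'a') as zf:
            zf.writestr(k, v)


def write_batch(path, items):
    """Open the archive once and append all entries before closing it."""
    with zipfile.ZipFile(path, 'a') as zf:
        for k, v in items.items():
            zf.writestr(k, v)


data = {'foo': b'bar', 'green': b'eggs', 'hello': b'world'}
with tempfile.TemporaryDirectory() as tmp:
    each_path = os.path.join(tmp, 'each.zip')
    batch_path = os.path.join(tmp, 'batch.zip')
    write_each(each_path, data)
    write_batch(batch_path, data)
    with zipfile.ZipFile(each_path) as a, zipfile.ZipFile(batch_path) as b:
        same_names = a.namelist() == b.namelist()
        names = b.namelist()
print(same_names, names)  # True ['foo', 'green', 'hello']
```

Each per-item open in append mode re-reads the archive's central directory, and each close rewrites it; the batch version does that once.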
Context manager. Yeah... that thing where you say `with WhatEver as whatevs: ...`. You've seen 'em.
```python
if os.path.isfile(f):
    os.remove(f)  # remove file if it exists

the_data_i_want_to_write = {'foo': b'bar', 'green': b'eggs', 'hello': b'world'}
with ZipStore(f) as z:
    for k, v in the_data_i_want_to_write.items():
        z[k] = v
```
Note that this `z`, though defined in the `with` statement, is still available outside the `with` block:

```python
assert list(z) == ['foo', 'green', 'hello']
```
In fact, sometimes it's convenient to apply the `with` statement to the `ZipStore` instance `z` instead:
```python
if os.path.isfile(f):
    os.remove(f)  # remove file if it exists

z = ZipStore(f)
with z:  # enter a context to write some data
    for k, v in {'foo': b'bar', 'green': b'eggs'}.items():
        z[k] = v
assert list(z) == ['foo', 'green']  # yep, it's there

with z:  # enter a context again to write some more
    z['hello'] = b'world'
assert list(z) == ['foo', 'green', 'hello']
```
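How can one object be used both with and without `with`, and entered more than once? One way, sketched below under the assumption that a store holds at most one open writer at a time: `__enter__` opens a `zipfile.ZipFile` in append mode and keeps it, `__exit__` closes it, and `__setitem__` uses the open writer if there is one, otherwise a one-off writer. `ZipWriter` is a hypothetical class, not py2store's actual `ZipStore`.

```python
import os
import tempfile
import zipfile


class ZipWriter:
    """Hypothetical zip writer: batches writes inside `with`, opens per write outside."""

    def __init__(self, path):
        self.path = path
        self._zf = None  # the open ZipFile while inside a `with` block

    def __enter__(self):
        self._zf = zipfile.ZipFile(self.path, 'a')  # creates the file if needed
        return self

    def __exit__(self, *exc):
        self._zf.close()  # writes the central directory to disk
        self._zf = None
        return False  # don't swallow exceptions

    def __setitem__(self, k, v):
        if self._zf is not None:
            self._zf.writestr(k, v)  # reuse the writer opened by __enter__
        else:
            with zipfile.ZipFile(self.path, 'a') as zf:  # one-off writer
                zf.writestr(k, v)

    def __iter__(self):
        if self._zf is not None:
            return iter(self._zf.namelist())  # the open writer knows its entries
        if not os.path.isfile(self.path):
            return iter([])
        with zipfile.ZipFile(self.path) as zf:
            return iter(zf.namelist())


with tempfile.TemporaryDirectory() as tmp:
    w = ZipWriter(os.path.join(tmp, 'test.zip'))
    with w:  # first batch
        for k, v in {'foo': b'bar', 'green': b'eggs'}.items():
            w[k] = v
    after_first = list(w)
    with w:  # enter again to write some more
        w['hello'] = b'world'
    after_second = list(w)
    w['extra'] = b'outside any with'  # falls back to a one-off writer
    after_third = list(w)
print(after_second)  # ['foo', 'green', 'hello']
```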
Let's remove that zip file again to start afresh...
```python
if os.path.isfile(f):
    os.remove(f)  # remove file if it exists
```
Making a zip store just remembers the zip file you want to work with:
```python
z = ZipStore(f)
z  # ZipStore('my_zipstore_test.zip')
```
At this point the zip file doesn't even exist, but your store pretends to be an empty store:

```python
list(z), len(z)  # ([], 0)
```
If you try to get something out of it, it will raise an `EmptyZipError` (which is a `KeyError`).
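Because `EmptyZipError` is a `KeyError` (and `OverwriteNotAllowed` a `FileExistsError`), code written against plain mappings and files keeps working. The hierarchy can be mirrored like this; the class bodies and the `get_from_zip` helper are a hypothetical sketch, and only the base classes come from the text above.

```python
import os
import tempfile
import zipfile


class EmptyZipError(KeyError):
    """Sketch: reading from a zip file that hasn't been created yet."""


class OverwriteNotAllowed(FileExistsError):
    """Sketch: writing to a key that already exists."""


def get_from_zip(path, k):
    """Hypothetical reader that treats a missing zip file as an empty store."""
    if not os.path.isfile(path):
        raise EmptyZipError(
            f"The zip file doesn't exist yet! Nothing was written in it: {path}"
        )
    with zipfile.ZipFile(path) as zf:
        try:
            return zf.read(k)
        except KeyError:
            raise KeyError(k) from None  # plain KeyError for a missing member


caught = None
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, 'never_written.zip')
    try:
        get_from_zip(missing, 'i_do_not_exist')
    except KeyError as e:  # also catches EmptyZipError, since it subclasses KeyError
        caught = e
print(type(caught).__name__)  # EmptyZipError
```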
```python
try:
    z['i_do_not_exist']
except KeyError as e:
    print(f"Expected: {e.__class__.__name__}{e.args}")
# Expected: EmptyZipError("The zip file doesn't exist yet! Nothing was written in it: my_zipstore_test.zip",)
```
Wrap https://github.com/miyuchina/mistletoe to produce a store that can easily do ETL on markdown tokens/blocks.
See: https://github.com/miyuchina/mistletoe/blob/master/mistletoe/ast_renderer.py and/or BaseRenderer.
[remap](https://sedimental.org/remap.html#convert-dictionaries-to-ordereddicts) could help. Might also look at https://github.com/i2mint/py2mint/blob/master/py2mint/routing_forest.py.
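As a rough idea of the kind of store this asks for, here's a sketch of a read-only mapping over the top-level blocks of an `ASTRenderer`-style dict, keyed by position, plus a filter by token type. `MarkdownBlocks` and `blocks_of_type` are hypothetical names; the sketch assumes the JSON shape seen above (a `'children'` list of dicts with a `'type'` key) and uses a hand-written tree instead of calling mistletoe.

```python
from collections.abc import Mapping


class MarkdownBlocks(Mapping):
    """Hypothetical read-only store over top-level blocks: index -> token dict."""

    def __init__(self, ast):
        self._blocks = list(ast.get('children', []))

    def __getitem__(self, i):
        if not isinstance(i, int) or not 0 <= i < len(self._blocks):
            raise KeyError(i)  # only the indices yielded by __iter__ are keys
        return self._blocks[i]

    def __iter__(self):
        return iter(range(len(self._blocks)))

    def __len__(self):
        return len(self._blocks)

    def blocks_of_type(self, token_type):
        """Return {index: block} for the blocks whose 'type' is token_type."""
        return {i: b for i, b in self.items() if b.get('type') == token_type}


# Hand-written stand-in for json.loads(mistletoe.markdown(..., renderer=ASTRenderer))
ast = {
    'type': 'Document',
    'children': [
        {'type': 'Heading', 'level': 1,
         'children': [{'type': 'RawText', 'content': 'Notes'}]},
        {'type': 'CodeFence', 'language': 'python',
         'children': [{'type': 'RawText', 'content': 'x = 1\n'}]},
        {'type': 'Paragraph',
         'children': [{'type': 'RawText', 'content': 'Done.'}]},
    ],
}
blocks = MarkdownBlocks(ast)
code_blocks = blocks.blocks_of_type('CodeFence')
print(list(code_blocks))  # [1]
```

Keying by position keeps the document order recoverable; other key choices (heading paths, content hashes) would be equally plausible for an ETL store.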