i2mint / py2store

Tools to create simple and consistent interfaces to complicated and varied data sources.
MIT License

MarkdownStore #61

Open thorwhalen opened 4 years ago

thorwhalen commented 4 years ago

Wrap https://github.com/miyuchina/mistletoe to produce a store that can easily do ETL on markdown tokens/blocks.

See: https://github.com/miyuchina/mistletoe/blob/master/mistletoe/ast_renderer.py and/or BaseRenderer.

remap (https://sedimental.org/remap.html#convert-dictionaries-to-ordereddicts) could help. Might also look at https://github.com/i2mint/py2mint/blob/master/py2mint/routing_forest.py
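To make the "store over markdown tokens" idea a bit more concrete, here is a minimal sketch that needs neither mistletoe nor remap. It assumes the AST has already been loaded into a dict whose nodes carry a 'type' key and an optional 'children' list. AstNodeStore and sample_ast are made-up names for illustration, and the sample dict is hand-written, so it only approximates what ASTRenderer emits.

```python
from collections import Counter
from collections.abc import Mapping


class AstNodeStore(Mapping):
    """Read-only view of an AST dict: keys are index paths, values are nodes.

    Hypothetical helper, not part of py2store or mistletoe. Assumes every
    node is a dict with a 'type' key and an optional 'children' list.
    """

    def __init__(self, root):
        self.root = root

    def _walk(self, node, path=()):
        # Depth-first traversal yielding (path, node) pairs
        yield path, node
        for i, child in enumerate(node.get('children', [])):
            yield from self._walk(child, path + (i,))

    def __iter__(self):
        return (path for path, _ in self._walk(self.root))

    def __len__(self):
        return sum(1 for _ in self._walk(self.root))

    def __getitem__(self, path):
        node = self.root
        try:
            for i in path:
                node = node['children'][i]
        except (KeyError, IndexError, TypeError) as e:
            raise KeyError(path) from e
        return node


# Hand-written stand-in for the parsed AST (an assumption, not real output)
sample_ast = {
    'type': 'Document',
    'children': [
        {'type': 'Heading', 'level': 1,
         'children': [{'type': 'RawText', 'content': 'Title'}]},
        {'type': 'CodeFence', 'language': 'python',
         'children': [{'type': 'RawText', 'content': 'print(1)\n'}]},
    ],
}

store = AstNodeStore(sample_ast)
tag_counts = Counter(node['type'] for node in store.values())
code_blocks = [
    node['children'][0]['content']
    for node in store.values()
    if node['type'] == 'CodeFence'
]
```

With the nodes exposed as a Mapping, the "extract" step becomes plain iteration over store.values(), and filters such as "all code blocks" are one-line comprehensions.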

thorwhalen commented 4 years ago

Some code to start off with:

import json
from collections import Counter

import mistletoe as mt
from mistletoe.ast_renderer import ASTRenderer
from boltons.iterutils import remap  # https://pypi.python.org/pypi/boltons

f = '/D/Dropbox/dev/p3/notebooks/test.md'

with open(f, 'r') as fp:
    t = mt.markdown(fp, renderer=ASTRenderer)  # JSON string of the AST
d = json.loads(t)
print(d.keys())
dd = d['children']
print(type(dd), len(dd))

def get_all_tags(root):
    all_tags = Counter()

    def visit(path, key, value):
        all_tags.update([value['type']])
        if value['type'] == 'CodeFence':
            print(value['children'])
        return True

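    # reraise_visit=False: errors raised in visit (e.g. for values that are
    # not nodes with a 'type' key) are ignored and the item is kept
    remap(root, visit=visit, reraise_visit=False)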

    return all_tags

get_all_tags(d)

thorwhalen commented 4 years ago

test.md

When you want to read zips, there's the FilesOfZip, ZipReader, or ZipFilesReader we know and love.

Sometimes though, you want to write to zips too. For this, we have ZipStore.

But words of warning:

Also, since ZipStore can write to a zip, its read functionality is not going to assume static data and cache things, as your favorite zip readers did. This, and the acrobatics needed to disguise the weird zipfile into something more... key-value natural, makes for a not so efficient store, out of the box.

I advise using one of the zip readers if all you need to do is read, or sub-classing or wrapping ZipStore with caching layers if it is appropriate to you.
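As a rough idea of what such a caching layer could look like, here is a sketch of a wrapper that keeps already-read values in memory. ReadCached and CountingStore are made-up names, not py2store classes, and a dict subclass stands in for the slow store so the example runs on its own.

```python
from collections.abc import MutableMapping


class ReadCached(MutableMapping):
    """Wrap any MutableMapping and keep values already read in memory.

    Hypothetical helper, not a py2store class. Writes and deletes go to the
    wrapped store and refresh or evict the cached entry, so the cache only
    goes stale if something else modifies the underlying data.
    """

    def __init__(self, store):
        self.store = store
        self._cache = {}

    def __getitem__(self, k):
        if k not in self._cache:
            self._cache[k] = self.store[k]  # a KeyError propagates, uncached
        return self._cache[k]

    def __setitem__(self, k, v):
        self.store[k] = v
        self._cache[k] = v

    def __delitem__(self, k):
        del self.store[k]
        self._cache.pop(k, None)

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


class CountingStore(dict):
    """Stand-in for a slow store: counts how often values are fetched."""
    reads = 0

    def __getitem__(self, k):
        self.reads += 1
        return super().__getitem__(k)


slow = CountingStore(here=b'is a file')
cached = ReadCached(slow)
first = cached['here']   # fetched from the wrapped store
second = cached['here']  # served from the in-memory cache
```

The trade-off is the one mentioned above: the cache is only safe as long as nothing else writes to the same zip file behind the wrapper's back.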

from py2store.slib.zipfile import ZipStore, OverwriteNotAllowed
import os

f = 'my_zipstore_test.zip'

The basics

if os.path.isfile(f): 
    os.remove(f)  # remove file if it exists
z = ZipStore(f)
assert list(z) == []
z['here'] = b'is a file'
assert list(z) == ['here']
assert z['here'] == b'is a file'
assert z.head() == ('here', b'is a file')
z['there'] = b'is another file'
assert list(z) == ['here', 'there']
try:
    z['here'] = b'something else'
except OverwriteNotAllowed as e:
    assert isinstance(e, FileExistsError)  # see that an OverwriteNotAllowed is a FileExistsError!
    print(f"Expected: {e.__class__.__name__}{e.args}")
Expected: OverwriteNotAllowed("You're not allowed to overwrite an existing key: here",)
assert list(z) == ['here', 'there']
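One plausible reason for refusing overwrites: a zip archive cannot replace an entry in place, and writing the same name twice with the standard zipfile module appends a second entry instead. The sketch below shows how a write-once check could be done with the standard library only. NoOverwrite and write_once are made-up names, and this is not ZipStore's actual implementation.

```python
import os
import tempfile
import zipfile


class NoOverwrite(FileExistsError):
    """Hypothetical stand-in for py2store's OverwriteNotAllowed."""


def write_once(zip_path, key, data):
    """Append key -> data to the zip, refusing keys that already exist."""
    if os.path.isfile(zip_path):
        with zipfile.ZipFile(zip_path) as zf:
            if key in zf.namelist():
                raise NoOverwrite(f"Not allowed to overwrite an existing key: {key}")
    with zipfile.ZipFile(zip_path, 'a') as zf:  # 'a' creates the file if needed
        zf.writestr(key, data)


path = os.path.join(tempfile.mkdtemp(), 'write_once_demo.zip')
write_once(path, 'here', b'is a file')
try:
    write_once(path, 'here', b'something else')
    refused = False
except FileExistsError:  # NoOverwrite is caught as a FileExistsError
    refused = True

with zipfile.ZipFile(path) as zf:
    names, content = zf.namelist(), zf.read('here')
```

Subclassing FileExistsError means callers who already handle "file exists" errors get the right behavior without knowing about the store-specific exception.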

Let's make another zip pointing to the same zip file.

zz = ZipStore(f)
assert list(zz) == ['here', 'there']  # see that this new zip store sees the data too!

Using the context manager

We provided the convenience of a dict-like interface to write to zips, but it comes at a price: Every time you write an item, a new zip writer (a zipfile.ZipFile object, opened for appending data) will be created, opened, and closed.

If you're going to be writing a bunch of data to the zip, you might want to use the ZipStore's context manager functionality instead.
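To illustrate the difference in plain zipfile terms (no ZipStore involved), the sketch below writes the same data twice: once opening a new appending ZipFile per item, once keeping a single one open for the whole batch. The file names are arbitrary, and the result is the same either way. Only the number of open/close cycles differs.

```python
import os
import tempfile
import zipfile

data = {'foo': b'bar', 'green': b'eggs', 'hello': b'world'}
tmp = tempfile.mkdtemp()

# One ZipFile opened (and closed) per item: what item-by-item writing amounts to
per_item = os.path.join(tmp, 'per_item.zip')
for k, v in data.items():
    with zipfile.ZipFile(per_item, 'a') as zf:
        zf.writestr(k, v)

# One ZipFile kept open for the whole batch: what a with block can offer
batch = os.path.join(tmp, 'batch.zip')
with zipfile.ZipFile(batch, 'a') as zf:
    for k, v in data.items():
        zf.writestr(k, v)

# Both archives end up with the same keys and contents
with zipfile.ZipFile(per_item) as a, zipfile.ZipFile(batch) as b:
    same_content = all(a.read(k) == b.read(k) == v for k, v in data.items())
    per_item_names, batch_names = a.namelist(), b.namelist()
```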

Context manager. Yeah... that thing where you say with WhatEver as whatevs: .... You've seen'em.

if os.path.isfile(f): 
    os.remove(f)  # remove file if it exists
the_data_i_want_to_write = {'foo': b'bar', 'green': b'eggs', 'hello': b'world'}
with ZipStore(f) as z:
    for k, v in the_data_i_want_to_write.items():
        z[k] = v

Note that this z, though defined in the with statement, is still available outside the with block.

assert list(z) == ['foo', 'green', 'hello']
['foo', 'green', 'hello']

In fact, sometimes it's convenient to apply the with statement to the ZipStore instance z instead:

if os.path.isfile(f): 
    os.remove(f)  # remove file if it exists

z = ZipStore(f)

with z:  # enter a context to write some data
    for k, v in {'foo': b'bar', 'green': b'eggs'}.items():
        z[k] = v

assert list(z) == ['foo', 'green']  # yep, it's there

with z:  # enter a context again to write some more
    z['hello'] = b'world'

assert list(z) == ['foo', 'green', 'hello']
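For the curious, here is a sketch of how an instance can be entered several times like this. It is a guess at the general pattern, not ZipStore's actual code: AppendingZip is a made-up class that shares one open writer inside a with block, falls back to one writer per item outside of it, and counts how many writers it opened.

```python
import os
import tempfile
import zipfile


class AppendingZip:
    """Hypothetical sketch, not ZipStore's actual code.

    Inside a with block, writes share one open ZipFile; outside of it, each
    write opens and closes its own.
    """

    def __init__(self, zip_path):
        self.zip_path = zip_path
        self._writer = None  # set only while inside a with block
        self.n_opens = 0  # for illustration: how many writers were opened

    def _open_writer(self):
        self.n_opens += 1
        return zipfile.ZipFile(self.zip_path, 'a')

    def __enter__(self):
        self._writer = self._open_writer()
        return self

    def __exit__(self, *exc_info):
        self._writer.close()
        self._writer = None  # reset, so the instance can be entered again
        return False  # do not swallow exceptions

    def __setitem__(self, k, v):
        if self._writer is not None:
            self._writer.writestr(k, v)
        else:
            with self._open_writer() as zf:
                zf.writestr(k, v)

    def __iter__(self):
        # Meant to be called outside a with block, once writers are closed
        if not os.path.isfile(self.zip_path):
            return iter([])
        with zipfile.ZipFile(self.zip_path) as zf:
            return iter(zf.namelist())


z = AppendingZip(os.path.join(tempfile.mkdtemp(), 'demo.zip'))
with z:  # first batch: one writer for two items
    for k, v in {'foo': b'bar', 'green': b'eggs'}.items():
        z[k] = v
with z:  # entering again opens a second writer
    z['hello'] = b'world'
z['bye'] = b'now'  # outside a with block: opens its own writer
keys = list(z)
```

Four items were written with three writers here. Writing them one by one outside a with block would have needed four.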

Details about the cold start (when you make a store with a zip file that doesn't yet exist)

Let's remove that zip file again to start afresh...

if os.path.isfile(f): 
    os.remove(f)  # remove file if it exists

Making a zip store just remembers the zip file you want to work with:

z = ZipStore(f)
z
ZipStore('my_zipstore_test.zip')

At this point the zip file doesn't even exist, but your store pretends to be an empty store:

list(z), len(z)  
([], 0)

If you try to get something out of it, it will raise an EmptyZipError (which is a KeyError).

try:
    z['i_do_not_exist']
except KeyError as e:
    print(f"Expected: {e.__class__.__name__}{e.args}")
Expected: EmptyZipError("The zip file doesn't exist yet! Nothing was written in it: my_zipstore_test.zip",)
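The cold-start behavior can be sketched with the standard library alone: the object only remembers the path, reports itself as empty while the file is missing, and raises a KeyError subclass on lookup. LazyZipReader and MissingZipError are made-up names standing in for the real classes, so take this as an illustration of the idea rather than how py2store does it.

```python
import os
import tempfile
import zipfile
from collections.abc import Mapping


class MissingZipError(KeyError):
    """Hypothetical stand-in for py2store's EmptyZipError."""


class LazyZipReader(Mapping):
    """Hypothetical sketch: remembers the path, touches the file on access."""

    def __init__(self, zip_path):
        self.zip_path = zip_path

    def _names(self):
        if not os.path.isfile(self.zip_path):
            return []  # no file yet: behave like an empty store
        with zipfile.ZipFile(self.zip_path) as zf:
            return zf.namelist()

    def __iter__(self):
        return iter(self._names())

    def __len__(self):
        return len(self._names())

    def __getitem__(self, k):
        if not os.path.isfile(self.zip_path):
            raise MissingZipError(f"The zip file doesn't exist yet: {self.zip_path}")
        with zipfile.ZipFile(self.zip_path) as zf:
            return zf.read(k)  # raises KeyError for unknown names


path = os.path.join(tempfile.mkdtemp(), 'not_there_yet.zip')
r = LazyZipReader(path)
listing = (list(r), len(r))
try:
    r['i_do_not_exist']
    error_name = None
except KeyError as e:  # MissingZipError is caught as a KeyError
    error_name = type(e).__name__
```

Because the error is a KeyError, code written against plain dicts (for example, using `in` or `.get`) keeps working against the not-yet-existing store.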