oracc / pyoracc

Python tools for working with ORACC
GNU General Public License v3.0

json serialization #80

Closed rillian closed 5 years ago

rillian commented 5 years ago

I wrote this to see what kind of syntax tree the parser was producing. It took me a while to work out how, since json.dump() from the standard library doesn't work on arbitrary Python objects. I thought it worth including for those reasons, and because it is a useful way to import tablet data into other tools.

# Usage example.

from pyoracc.atf.common.atffile import AtfFile

with open('example.atf') as f:
    atf = AtfFile(f.read())
    print(atf.to_json())

It currently produces a "flat" serialization without object names, which I found most useful for exploring. If there's interest in adding parsing (being able to import JSON and serialize it back into ATF), that format would probably need to change.

The to_json method is general, so it would be nice to be able to call it on any of the tree objects. The easiest way to do that would be to add it to oraccobject and then make all the objects in the model hierarchy inherit from that.
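A minimal sketch of that idea, assuming a shared base class whose to_json hands the encoder a `default` hook. The names `JsonMixin`, `Line` and `Tablet` and the sample values are hypothetical stand-ins for illustration, not pyoracc's actual classes or the code in this PR:

```python
import json


class JsonMixin:
    """Hypothetical shared base class; not pyoracc's actual code."""

    def to_json(self, **kwargs):
        # The encoder calls default= for every object it cannot serialize
        # natively. Returning the instance __dict__ lets the encoder keep
        # walking the tree itself, so nested objects are handled too.
        return json.dumps(self, default=lambda o: o.__dict__, **kwargs)


class Line(JsonMixin):
    def __init__(self, label, words):
        self.label = label
        self.words = words


class Tablet(JsonMixin):
    def __init__(self, code, lines):
        self.code = code
        self.lines = lines


# Made-up sample data.
tablet = Tablet("X000001", [Line("1", ["a", "b"])])

# Any object in the tree can be serialized, not just the root.
tablet_json = tablet.to_json()
line_json = tablet.lines[0].to_json(indent=2)
print(tablet_json)
```

Because only attribute dictionaries are emitted, the output carries no class names, which matches the "flat" shape described above.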

The to_json method passes optional arguments on to json.dump(), but there's a problem with sort_keys. The Multilingual objects store the unmarked language lines in a dictionary under the key None, which can't be ordered against the other, string keys. I've just left this as an xfail in the tests, since sort_keys is not on by default. If you want to address this, I can suggest three options: changing the parser to substitute the overall language code of the tablet, substituting the empty string, or copying the whole tree and making a similar substitution before serializing. The last would be expensive on corpus objects.
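The sort_keys failure can be reproduced with a plain dictionary. This is a sketch only: the `lines` dictionary is a hypothetical stand-in for what a Multilingual object stores, and `with_string_keys` is an illustrative helper for the empty-string substitution, not something that exists in pyoracc:

```python
import json

# Hypothetical stand-in for a Multilingual object's line store:
# unmarked lines live under the key None, others under a language code.
lines = {None: ["line one"], "akk": ["line two"]}

# Default serialization works: the None key is written as "null".
default_json = json.dumps(lines)

# Sorting fails because None and str cannot be compared in Python 3.
try:
    json.dumps(lines, sort_keys=True)
    sort_failed = False
except TypeError:
    sort_failed = True


def with_string_keys(mapping, fallback=""):
    """Return a shallow copy with a None key replaced by `fallback`."""
    return {fallback if key is None else key: value
            for key, value in mapping.items()}


# With every key a string, sort_keys works again.
sorted_json = json.dumps(with_string_keys(lines), sort_keys=True)
print(sort_failed, sorted_json)
```

The helper only copies one level. Applying the same substitution across a whole tree would mean copying every dictionary in it, which is the cost concern for corpus objects.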

NB Currently includes changes from #77 which I hope will be merged first, and from #79 to make the tests pass.

codecov-io commented 5 years ago

Codecov Report

Merging #80 into master will increase coverage by 1.01%. The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #80      +/-   ##
==========================================
+ Coverage   86.22%   87.23%   +1.01%     
==========================================
  Files          27       27              
  Lines         987      995       +8     
==========================================
+ Hits          851      868      +17     
+ Misses        136      127       -9
| Impacted Files | Coverage Δ | |
|---|---|---|
| pyoracc/atf/common/atffile.py | 80.85% <100%> (+3.35%) | :arrow_up: |
| pyoracc/atf/common/atfyacc.py | 98.65% <0%> (+0.54%) | :arrow_up: |
| pyoracc/atf/common/atflex.py | 100% <0%> (+2.37%) | :arrow_up: |
| pyoracc/__init__.py | 75% <0%> (+12.5%) | :arrow_up: |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 39c612f...af363ab.

ageorgou commented 5 years ago

This can be merged after #77 and #79.

ageorgou commented 5 years ago

@rillian Can you please rebase this on the new master to keep things a bit cleaner? Thanks!

rillian commented 5 years ago

\o/

jayanthkmr commented 5 years ago

@rillian I might be late, but you can use this to serialize any Python 3 object:

https://gist.github.com/jayanthjaiswal/b722625f0cebda14cdfaaa7e8b74c3ae

rillian commented 5 years ago

Thanks @jayanthjaiswal, that works too! It's similar to what I did, I think, but handles more types and does its own recursion instead of hooking into the JSONEncoder's traversal. Useful for the next time it comes up!
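For comparison, the "own recursion" style might look roughly like the sketch below. It is not the code from the linked gist: `to_plain` and `Node` are hypothetical names, and the set of handled types is an assumption chosen to keep the example short.

```python
import json


def to_plain(obj):
    """Recursively convert obj into JSON-compatible builtins."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, dict):
        # Keys are stringified, so a None key becomes "None" here
        # (json.dumps on its own would write "null").
        return {str(key): to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_plain(item) for item in obj]
    if hasattr(obj, "__dict__"):
        # General objects: recurse into their attribute dictionary.
        return to_plain(vars(obj))
    # Fallback for anything else; an arbitrary choice for this sketch.
    return repr(obj)


class Node:
    """Hypothetical tree object used only for this example."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


tree = Node("root", [Node("leaf")])
plain = to_plain(tree)
text = json.dumps(plain, sort_keys=True)
print(text)
```

Since the conversion happens before json.dumps() sees the data, every key is already a string and sort_keys works. The trade-off is a full copy of the tree in memory.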