scriptotek / mc2skos

Command line script for converting Marc21 Classification and Authority records to SKOS/RDF
The Unlicense

Support JSKOS as alternative output format #15

Closed: nichtich closed this issue 7 years ago

nichtich commented 8 years ago

This could be implemented in add_to_graph based on out_format.
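One way to read this suggestion is a single entry point that picks a serializer from out_format. The sketch below is hypothetical. It does not use mc2skos's real add_to_graph (whose signature isn't shown in this thread) or rdflib. It assumes a concept has already been parsed into a plain dict with `uri`, `notation`, `prefLabel` and `broader` keys, and writes either one JSKOS record or the equivalent SKOS statements as N-Triples. The format names `"jskos"` and `"nt"` are illustrative only.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def to_jskos(concept):
    """Map a parsed concept (hypothetical dict layout) to one JSKOS record."""
    record = {"uri": concept["uri"], "type": [SKOS + "Concept"]}
    if concept.get("notation"):
        record["notation"] = [concept["notation"]]
    if concept.get("prefLabel"):
        record["prefLabel"] = dict(concept["prefLabel"])  # language map, e.g. {"en": "..."}
    if concept.get("broader"):
        record["broader"] = [{"uri": uri} for uri in concept["broader"]]
    return json.dumps(record, ensure_ascii=False)


def nt_literal(text, lang=None):
    # With ensure_ascii=False, json.dumps escapes quotes, backslashes and
    # control characters in a form N-Triples also accepts inside "...".
    return json.dumps(text, ensure_ascii=False) + (f"@{lang}" if lang else "")


def to_ntriples(concept):
    """Emit the same fields as SKOS statements, one N-Triples line each."""
    s = f"<{concept['uri']}>"
    lines = [f"{s} <{RDF_TYPE}> <{SKOS}Concept> ."]
    if concept.get("notation"):
        lines.append(f"{s} <{SKOS}notation> {nt_literal(concept['notation'])} .")
    for lang, label in sorted(concept.get("prefLabel", {}).items()):
        lines.append(f"{s} <{SKOS}prefLabel> {nt_literal(label, lang)} .")
    for uri in concept.get("broader", []):
        lines.append(f"{s} <{SKOS}broader> <{uri}> .")
    return "\n".join(lines)


# Hypothetical format names; mc2skos's real option values may differ.
SERIALIZERS = {"jskos": to_jskos, "nt": to_ntriples}


def serialize_concept(concept, out_format):
    """Pick a serializer based on out_format."""
    try:
        serializer = SERIALIZERS[out_format]
    except KeyError:
        raise ValueError(f"unsupported output format: {out_format!r}") from None
    return serializer(concept)


# Illustrative concept with example.org URIs (not real DDC data).
concept = {
    "uri": "http://example.org/class/641.5",
    "notation": "641.5",
    "prefLabel": {"en": "Cooking"},
    "broader": ["http://example.org/class/641"],
}
print(serialize_concept(concept, "jskos"))
print(serialize_concept(concept, "nt"))
```

Keeping the format choice in one lookup table means adding JSKOS doesn't touch the parsing code, only the output step.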

nichtich commented 7 years ago

See https://github.com/gbv/mc2skos/tree/jskos for a first try based on rdflib-jsonld

danmichaelo commented 7 years ago

By "Converting the full DDC would not work", do you mean that it uses too much memory? Serializing each concept as it is processed, instead of piling them all up in the graph, would benefit every output format, but I'm not sure how to implement that without digging too deep into rdflib internals. It might work to serialize and flush the graph after each concept has been processed, but I'm not sure what the overhead would be or how to handle the context.
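A minimal sketch of the streaming idea, assuming newline-delimited JSON output (one self-contained record per line). That layout is an assumption here, chosen because it needs no shared context block and no closing bracket at the end of the file. `iter_concepts` and its record layout are hypothetical stand-ins for mc2skos's MARC parsing. Each record is written and flushed as soon as it is produced, so memory use stays flat instead of growing with the size of the vocabulary.

```python
import io
import json


def iter_concepts(records):
    """Hypothetical stand-in for parsing MARC records one at a time."""
    for notation, label in records:
        yield {
            "uri": f"http://example.org/class/{notation}",
            "notation": [notation],
            "prefLabel": {"en": label},
        }


def stream_ndjson(concepts, out):
    """Write each concept as soon as it is produced; nothing is accumulated."""
    count = 0
    for concept in concepts:
        out.write(json.dumps(concept, ensure_ascii=False))
        out.write("\n")
        out.flush()  # push the record out before the next one is parsed
        count += 1
    return count


buf = io.StringIO()
n = stream_ndjson(iter_concepts([("641", "Food and drink"), ("641.5", "Cooking")]), buf)
print(n, buf.getvalue(), sep="\n")
```

Flushing after every record mirrors the "serialize and flush after each concept" idea. When writing to a regular file, ordinary buffering is usually enough and faster, because the memory saving comes from not keeping records around, not from the flush itself.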

nichtich commented 7 years ago

Yes, I bet streaming would be required, at least for JSKOS, but I have not tested it with a full DDC dump yet.

danmichaelo commented 7 years ago

When I convert the full DDC to Turtle, it uses about 400 MB of memory (about 500 MB with all the flags enabled). That isn't really a problem, although streaming output would be much more elegant, and its absence is clearly one of the weak points of this tool.
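To compare the two approaches on your own data, Python's standard `tracemalloc` module can report peak allocations. The sketch below uses synthetic records (not DDC data) and contrasts building the whole output in memory with writing it record by record to a hypothetical sink that only counts bytes. It measures Python-level allocations only, not the full process footprint that figures like the 400 MB above describe.

```python
import json
import tracemalloc


class CountingSink:
    """File-like object that only counts characters, so output is not kept."""

    def __init__(self):
        self.size = 0

    def write(self, text):
        self.size += len(text)


def make_records(n):
    # Synthetic records standing in for converted concepts.
    for i in range(n):
        yield {"uri": f"http://example.org/class/{i}", "prefLabel": {"en": f"Label {i}"}}


def peak_bytes(fn):
    """Peak traced allocation (in bytes) while fn runs."""
    tracemalloc.start()
    try:
        fn()
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


def accumulate(n):
    records = list(make_records(n))  # every record held at once
    CountingSink().write(json.dumps(records))


def stream(n):
    sink = CountingSink()
    for record in make_records(n):  # only one record alive at a time
        sink.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    n = 20_000
    print("accumulate:", peak_bytes(lambda: accumulate(n)))
    print("stream:    ", peak_bytes(lambda: stream(n)))
```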

nichtich commented 7 years ago

OK, so streaming is not actually needed. I'll keep working on JSKOS support.