mr-martian / rebabel-format

Python library for interacting with reBabel data files
MIT License

ELAN export #9

Closed mr-martian closed 1 month ago

mr-martian commented 2 months ago

Since e9a50496bc272c24ae6d3295f1351f14e18ab04d, Writer subclasses operate on query results. That is fine for formats with relatively limited semantics, but .eaf and a few other formats are flexible as to what hierarchy they represent, so we need some way to build the appropriate query at run-time. I see three potential solutions:

Option 1: Template files

We could accept a template file and derive the structure from that.

Option 2: User supplies the query

The user could just write out the full query (though we would need to be careful not to screw up the internal machinery of Writer).

Option 3: User supplies hierarchy

The user could pass something like

nodes = [
    ('phrase', None),
    ('word', 'phrase'),
    ('segment', 'word'),
]

indicating that we have phrases containing words containing segments.
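To make this concrete, here is a rough sketch of how such a list could be validated and turned into a query-like structure. The function names (`order_hierarchy`, `build_query`) and the shape of the returned dict are made up for illustration; they are not the actual Writer internals or the real reBabel query format.

```python
def order_hierarchy(nodes):
    """Validate (child, parent) pairs and return node types parent-first."""
    parents = dict(nodes)
    if len(parents) != len(nodes):
        raise ValueError('each node type may only be listed once')
    for child, parent in nodes:
        if parent is not None and parent not in parents:
            raise ValueError(f'{child!r} refers to unknown parent {parent!r}')

    def depth(name, seen=()):
        # Walk up the parent chain, refusing to loop forever on a cycle.
        if name in seen:
            raise ValueError(f'cycle in hierarchy at {name!r}')
        parent = parents[name]
        return 0 if parent is None else 1 + depth(parent, seen + (name,))

    # sorted() is stable, so siblings keep the order the user gave.
    return sorted(parents, key=depth)


def build_query(nodes):
    """Build a hypothetical query dict: one entry per node type."""
    parents = dict(nodes)
    query = {}
    for name in order_hierarchy(nodes):
        entry = {'type': name}
        if parents[name] is not None:
            entry['parent'] = parents[name]
        query[name] = entry
    return query


nodes = [
    ('phrase', None),
    ('word', 'phrase'),
    ('segment', 'word'),
]
query = build_query(nodes)
```

Ordering the types parent-first means a writer could emit each tier after the tier it depends on, whatever order the user listed them in.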

pandersity commented 1 month ago

I am drawn to option one, because it has the potential to be the most accessible to users with less scripting experience. But that definitely depends on the design of the template. Will the template be something created from scratch by the user? Or something exported from ELAN and directly consumed by reBabel?

The fewer keystrokes a user has to type, the less likely they are to mess up internals or make mistakes. That makes it easier to succeed and to find the tool helpful.

mr-martian commented 1 month ago

> Will the template be something created from scratch by the user? Or something exported from ELAN and directly consumed by reBabel?

The implementation in 493228e takes a normal EAF file and deletes any existing annotations.
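For illustration only (this is not the code in 493228e), the idea can be sketched with the standard library, assuming the usual EAF layout in which `TIER` elements hold `ANNOTATION` children and child tiers name their parent in a `PARENT_REF` attribute. The helper names and the sample document are invented for this example.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for an EAF file exported from ELAN (heavily abbreviated).
SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="phrase" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="word" PARENT_REF="phrase" LINGUISTIC_TYPE_REF="sub-lt"/>
</ANNOTATION_DOCUMENT>"""


def strip_annotations(root):
    """Remove every ANNOTATION from every TIER, keeping the tiers themselves."""
    for tier in root.iter('TIER'):
        # Copy the list first so we are not removing while iterating.
        for annotation in list(tier.findall('ANNOTATION')):
            tier.remove(annotation)
    return root


def tier_hierarchy(root):
    """Return (tier, parent) pairs; parent is None for top-level tiers."""
    return [(t.get('TIER_ID'), t.get('PARENT_REF')) for t in root.iter('TIER')]


template = strip_annotations(ET.fromstring(SAMPLE))
hierarchy = tier_hierarchy(template)
```

The resulting `hierarchy` has the same (child, parent) shape as the list in option 3, so the template route gets the structure without the user typing it. This sketch leaves the time slots in `TIME_ORDER` untouched; a real exporter would need to regenerate those too.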

mr-martian commented 1 month ago

Template files are sufficient for now, I think.