ufal / ParCzech

ParCzech is a project on compiling Czech parliamentary data into annotated corpora.
https://ufal.mff.cuni.cz/parczech

Change IDs structure #29

Closed matyaskopp closed 3 years ago

matyaskopp commented 4 years ago

Current document id structure:

doc-2017-039-06-196b

{prefix}-{starting term year}-{meeting}-{sitting}-{agenda item}{flag for the second and subsequent discussions of the same item}
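To make the current structure concrete, here is a minimal sketch that splits such an id into its parts. The field names, the numeric widths, and the assumption that the flag is a single optional lowercase letter are inferred from the one example above and are not confirmed by the project.

```python
import re
from typing import NamedTuple, Optional


class DocId(NamedTuple):
    """Parts of a current-style document id (field names are illustrative)."""
    prefix: str
    term_year: int
    meeting: int
    sitting: int
    agenda_item: int
    flag: Optional[str]  # None when the item is discussed for the first time


# Assumed shape: lowercase prefix, 4-digit year, numeric fields,
# then an optional single-letter flag fused to the agenda item.
_DOC_ID = re.compile(
    r"^(?P<prefix>[a-z]+)-(?P<term_year>\d{4})-(?P<meeting>\d+)"
    r"-(?P<sitting>\d+)-(?P<agenda_item>\d+)(?P<flag>[a-z]?)$"
)


def parse_doc_id(doc_id: str) -> DocId:
    """Split a document id into its components, or raise ValueError."""
    m = _DOC_ID.match(doc_id)
    if m is None:
        raise ValueError(f"unrecognised document id: {doc_id!r}")
    return DocId(
        prefix=m["prefix"],
        term_year=int(m["term_year"]),
        meeting=int(m["meeting"]),
        sitting=int(m["sitting"]),
        agenda_item=int(m["agenda_item"]),
        flag=m["flag"] or None,
    )
```

Sorting the parsed tuples orders files by term year, meeting, sitting and agenda item, with the flag as the last key.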

Suggested document id structure:

The suggested structure should make it easier to order files within a single sitting. The rest of the ordering can be done by sitting date.

utterance id

segment id

sentence id

token id

sub-token id

Other elements with id

For each element type, use a counter within the document and a unique suffix (a two-letter code plus the order number).
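A rough sketch of how such per-type counters could work. Everything concrete here is hypothetical: the `IdAllocator` name, the two-letter codes, and the `.` separator between the document id and the suffix are placeholders, not the format adopted in the corpus.

```python
from collections import defaultdict
from typing import Dict


class IdAllocator:
    """Hands out ids of the form {doc_id}.{two-letter code}{order}.

    The separator and the codes are assumptions made for illustration.
    """

    def __init__(self, doc_id: str, type_codes: Dict[str, str]) -> None:
        self.doc_id = doc_id
        self.type_codes = type_codes
        # One independent counter per element type, scoped to this document.
        self._counters: Dict[str, int] = defaultdict(int)

    def next_id(self, element_type: str) -> str:
        code = self.type_codes[element_type]  # KeyError for unknown types
        self._counters[element_type] += 1
        return f"{self.doc_id}.{code}{self._counters[element_type]}"
```

For example, with hypothetical codes `{"note": "nt", "link": "ln"}`, successive calls to `next_id("note")` number the notes independently of the links within the same document.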

matyaskopp commented 3 years ago

done