Current document id structure:
doc-2017-039-06-196b
{prefix}-{starting term year}-{meeting}-{sitting}-{agenda item}{flag marking the second and following discussions of the same item}
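A minimal sketch of splitting a current-style id into the fields named above. The regular expression and the field names are assumptions read off the single example `doc-2017-039-06-196b`; in particular, the trailing flag is assumed to be zero or more lowercase letters.

```python
import re

# Hypothetical pattern for current-style ids such as "doc-2017-039-06-196b".
# Assumption: the flag for repeated discussions is zero or more lowercase letters.
CURRENT_ID = re.compile(
    r"^(?P<prefix>[a-z]+)-(?P<term_year>\d{4})-(?P<meeting>\d+)"
    r"-(?P<sitting>\d+)-(?P<agenda_item>\d+)(?P<flag>[a-z]*)$"
)

def parse_current_id(doc_id: str) -> dict:
    """Split a current-style document id into its named parts."""
    match = CURRENT_ID.match(doc_id)
    if match is None:
        raise ValueError(f"not a current-style id: {doc_id!r}")
    return match.groupdict()

print(parse_current_id("doc-2017-039-06-196b"))
```

For the first discussion of an item the flag group simply comes back as an empty string.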
Suggested document id structure
[x] ps2017-039-06-004-196
{organ prefix}{starting term year}-{meeting}-{sitting}-{order of topic in a single sitting}-{agenda item}
This makes it easier to order files within a single sitting; the rest of the ordering can be done by sitting date.
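A sketch of building a suggested-style id and of why it sorts well. The zero-padding widths (3 digits for meeting, 2 for sitting, 3 for topic order and agenda item) are inferred from the example `ps2017-039-06-004-196` and are assumptions; `make_doc_id` is a hypothetical helper name.

```python
def make_doc_id(organ: str, term_year: int, meeting: int, sitting: int,
                topic_order: int, agenda_item: int) -> str:
    """Build a suggested-style document id.

    Field widths are assumed from the example "ps2017-039-06-004-196".
    """
    return (f"{organ}{term_year}-{meeting:03d}-{sitting:02d}"
            f"-{topic_order:03d}-{agenda_item:03d}")

# Topics of one sitting in arbitrary order: (topic_order, agenda_item).
topics = [(10, 12), (4, 196), (1, 3)]
ids = [make_doc_id("ps", 2017, 39, 6, t, a) for t, a in topics]

# Topic order precedes the agenda item and is zero-padded, so a plain
# string sort lists the files of one sitting in topic order.
for doc_id in sorted(ids):
    print(doc_id)
```

Putting the topic order before the agenda item is what makes this work: agenda item numbers need not follow the order in which topics were actually discussed.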
utterance id
[x] ps2017-039-06-004-196.u001
suffix is speech order within a topic (starting from 001)
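A minimal sketch of numbering the speeches of one topic, assuming the `.u` prefix and three-digit padding shown in the example; `utterance_ids` is a hypothetical helper.

```python
def utterance_ids(doc_id: str, n_speeches: int) -> list[str]:
    """Return ids for the speeches of one topic, numbered from 001."""
    return [f"{doc_id}.u{i:03d}" for i in range(1, n_speeches + 1)]

print(utterance_ids("ps2017-039-06-004-196", 3))
```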
segment id
[x] ps2017-039-06-004-196.u001.p001
suffix is paragraph <seg> order within a speech (starting from 001)
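Because every level only appends a dot-separated suffix, both numbering children and recovering the enclosing element are simple string operations. A sketch under that assumption, with hypothetical helpers `child_ids` and `parent_id`; note that segments use the letter `p` even though the element is `<seg>`, presumably because `s` is taken by sentences.

```python
def child_ids(parent_id: str, letter: str, count: int) -> list[str]:
    """Number `count` children of `parent_id` as .<letter>001, .<letter>002, ..."""
    return [f"{parent_id}.{letter}{i:03d}" for i in range(1, count + 1)]

def parent_id(element_id: str) -> str:
    """Drop the last dot-separated suffix to get the enclosing element's id."""
    return element_id.rsplit(".", 1)[0]

segs = child_ids("ps2017-039-06-004-196.u001", "p", 2)
print(segs)
print(parent_id(segs[1]))
```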
sentence id
[x] ps2017-039-06-004-196.u001.p001.s001
suffix is sentence <s> order within a <seg> (starting from 001)
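Since sentences are counted within a `<seg>`, the sentence counter restarts at 001 in every segment. A sketch assuming the input is already split into segments and sentences (`sentence_ids` is a hypothetical helper):

```python
def sentence_ids(utterance_id: str, segments: list[list[str]]) -> list[str]:
    """Assign .pNNN.sNNN ids; the sentence counter restarts in every <seg>."""
    ids = []
    for p, sentences in enumerate(segments, start=1):
        for s, _ in enumerate(sentences, start=1):
            ids.append(f"{utterance_id}.p{p:03d}.s{s:03d}")
    return ids

segments = [["First sentence.", "Second sentence."], ["Next paragraph."]]
for sid in sentence_ids("ps2017-039-06-004-196.u001", segments):
    print(sid)
```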
token id
[x] ps2017-039-06-004-196.u001.p001.s001.w001
suffix is token <w> or <pc> order within a <s> (starting from 001) in orthographic layout
Note: some sentences are really long!
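With three-digit padding a sentence can hold at most 999 tokens; beyond that the ids stop sorting correctly as strings. A sketch that numbers `<w>` and `<pc>` tokens alike with the `w` prefix (as the example suggests) and fails loudly on overflow; raising an error is an assumed policy, not something the notes decide.

```python
def token_ids(sentence_id: str, tokens: list[str], width: int = 3) -> list[str]:
    """Number the <w>/<pc> tokens of one <s> in orthographic order, from 001.

    Raises ValueError if the sentence has more tokens than `width` digits
    can hold, since wider numbers would break fixed-width string sorting.
    """
    if len(tokens) >= 10 ** width:
        raise ValueError(
            f"{sentence_id}: {len(tokens)} tokens do not fit in {width} digits")
    return [f"{sentence_id}.w{i:0{width}d}" for i in range(1, len(tokens) + 1)]

print(token_ids("ps2017-039-06-004-196.u001.p001.s001", ["Thank", "you", "."]))
```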
sub-token id
[x] ps2017-039-06-004-196.u001.p001.s001.w001.01
suffix is subtoken number within a <w>
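Sub-token suffixes differ from the other levels: the example `.01` has no letter and only two digits. A sketch following that pattern; the contraction split is a hypothetical illustration of when a `<w>` might have parts.

```python
def subtoken_ids(token_id: str, subtokens: list[str]) -> list[str]:
    """Number the parts of one <w> as .01, .02, ... (two digits, no letter)."""
    return [f"{token_id}.{i:02d}" for i in range(1, len(subtokens) + 1)]

# Hypothetical example: a contraction split into two sub-tokens.
print(subtoken_ids("ps2017-039-06-004-196.u001.p001.s001.w001", ["do", "n't"]))
```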
Other elements with id
for each type, use a counter within the document and a unique suffix (a two-letter type code followed by the order number)
.ne001
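A sketch of one counter per element type per document. It assumes the suffix is appended directly to the document id; the class name and the type-to-code mapping (`name` to `ne`, `note` to `nt`) are hypothetical, not an agreed list. The constructor enforces that every code has two letters and that no two types share a code, which keeps the suffixes unique.

```python
from collections import defaultdict

class ElementIdAllocator:
    """Hand out ids for other element types, one counter per type per document."""

    def __init__(self, doc_id: str, codes: dict[str, str]):
        if any(len(c) != 2 or not c.isalpha() for c in codes.values()):
            raise ValueError("every type code must be exactly two letters")
        if len(set(codes.values())) != len(codes):
            raise ValueError("each element type needs its own two-letter code")
        self.doc_id = doc_id
        self.codes = codes
        self.counters = defaultdict(int)  # code -> last number used

    def next_id(self, element_type: str) -> str:
        code = self.codes[element_type]  # KeyError for unknown types
        self.counters[code] += 1
        return f"{self.doc_id}.{code}{self.counters[code]:03d}"

alloc = ElementIdAllocator("ps2017-039-06-004-196", {"name": "ne", "note": "nt"})
print(alloc.next_id("name"))
print(alloc.next_id("note"))
print(alloc.next_id("name"))
```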