korpling / pepperModules-sgsTEIModules

Provides modules to import and export the TEI subset used in the sgs corpus

Things to discuss #3

Closed MartinKl closed 6 years ago

MartinKl commented 6 years ago

refex-001 à propos du refex-001 cadavre refex-001 qu' refex-001 on refex-001 a refex-001 retrouvé refex-001 au refex-001 quatrième refex-001 étage

This is a perfect example of why a minimal segmentation could be irritating. It might be useful to look at people's search habits and stick to how things are usually done in ANNIS. But even with a minimal segmentation that splits "à propos du" into "à", "propos" and "du", we still could not recover from the data that the referring-expression span should cover only "du" (which is, by the way, still "too much").
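To make the problem concrete, here is a minimal sketch in Python. It is not the module's code: the `(token, span id)` pairs are transcribed from the example above, and `minimal_segmentation` is a hypothetical helper that simply splits each token on whitespace and lets every piece inherit the span of its token.

```python
# Token/span pairs transcribed from the example above (illustrative data layout).
annotated_tokens = [
    ("à propos du", "refex-001"),
    ("cadavre", "refex-001"),
    ("qu'", "refex-001"),
    ("on", "refex-001"),
    ("a", "refex-001"),
    ("retrouvé", "refex-001"),
    ("au", "refex-001"),
    ("quatrième", "refex-001"),
    ("étage", "refex-001"),
]


def minimal_segmentation(tokens):
    """Hypothetical helper: split each token on whitespace.

    Every resulting piece inherits the span id of the token it came from,
    because the source data carries no finer-grained information.
    """
    return [(piece, span) for text, span in tokens for piece in text.split()]


segments = minimal_segmentation(annotated_tokens)
covered = [piece for piece, span in segments if span == "refex-001"]
```

After splitting, "à" and "propos" are still part of `refex-001`, just like "du". The finer segmentation changes the token count, but it cannot tell us where the span should really begin.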

See also #1

MartinKl commented 6 years ago

Related to #2: it is important to know whether the markable annotations are ALWAYS provided in linear order, e.g. whether the annotations of a token with multiple subtokens ("à propos du"), namely "à propos de" (...) and "le" (det), are also listed in linear order (in this example, first the a-propos-de annotation and then the det annotation). If this is not guaranteed, there might be some difficulties.
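A small sketch of why the order matters, in Python. It assumes the importer can only match annotations to the parts of a token by position; `pair_in_order` and the labels `"a-propos-de"` and `"det"` are hypothetical stand-ins for illustration, not names from the actual data or module.

```python
def pair_in_order(parts, annotations):
    """Hypothetical helper: pair the i-th part with the i-th annotation.

    This is only correct if the annotations are listed in linear order.
    """
    if len(parts) != len(annotations):
        raise ValueError("number of annotations does not match number of parts")
    return dict(zip(parts, annotations))


# The two parts behind the token "à propos du".
parts = ["à propos de", "le"]

# Annotations listed in linear order: the pairing is correct.
linear = pair_in_order(parts, ["a-propos-de", "det"])

# Same annotations in a different order: the pairing is silently wrong.
swapped = pair_in_order(parts, ["det", "a-propos-de"])
```

With the swapped input, "le" ends up with the a-propos-de annotation and nothing signals the error, so without an ordering guarantee a purely positional match is not safe.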