proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
GNU General Public License v3.0

[folia2salt] Question: are List annotations supported yet? #45

Open pirolen opened 2 years ago

pirolen commented 2 years ago

In LaMachine I tried out folia2salt, but I got:

Exception: Unable to init layer for element <ListItem at 140407838899560 id=FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1 set=None class=None>

I wonder if List annotations are supported.

proycon commented 2 years ago

The folia2salt implementation is more of a first proof-of-concept at this stage, so expect some things not to be implemented yet. I don't think anybody has seriously used it yet. You can follow the status in this issue: https://github.com/proycon/folia/issues/85. I'm not working on it at the moment though, as I don't think anybody is still interested in it.

pirolen commented 2 years ago

In that case I'd rather try to find a workaround, using another FoLiA converter.

proycon commented 2 years ago

What's the aim you're trying to achieve?

pirolen commented 2 years ago

I am investigating options for a future workflow, and checking compatibility between FoLiA and INCEpTION (https://inception-project.github.io/documentation/), which imports and exports formats like WebLicht TCF and UIMA CAS. The use case at hand seems to require very meticulous and repetitive manual annotation, basically on each token, just like POS tagging, but with domain-specific entities.

I am looking at ways to set this up in LaMachine/FLAT and/or INCEpTION/cassis (https://github.com/dkpro/dkpro-cassis). INCEpTION would be handy for active learning of entities, suggesting new entity annotations on the fly. Another handy feature is entity linking to knowledge bases, which this project would need at some point. I suspect it is more sustainable for me to stay in LaMachine/FLAT/foliapy and see if I can provide the users with similar functionality, even if not on the fly, e.g. by regularly generating gazetteers and NER taggers and pre-processing new documents with them. What do you think?
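To make the gazetteer pre-processing idea a bit more concrete, here is a minimal sketch in plain Python. It is not tied to FoLiA, FLAT or INCEpTION: the function names, the example gazetteer entries and the class labels are all made up for illustration, and it simply does a greedy longest-match lookup over a list of tokens.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer type: lowercased token sequence -> entity class.
Gazetteer = Dict[Tuple[str, ...], str]


def build_gazetteer(entries: Dict[str, str]) -> Gazetteer:
    """Turn {"surface form": "class"} into a lookup keyed by lowercased token tuples."""
    return {tuple(form.lower().split()): cls for form, cls in entries.items()}


def tag_tokens(tokens: List[str], gazetteer: Gazetteer) -> List[Tuple[int, int, str]]:
    """Greedy longest-match lookup; returns (start, end_exclusive, class) spans."""
    max_len = max((len(key) for key in gazetteer), default=0)
    lowered = [t.lower() for t in tokens]
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so multi-token entries win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            cls = gazetteer.get(tuple(lowered[i:i + n]))
            if cls is not None:
                match = (i, i + n, cls)
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span (no overlapping matches)
        else:
            i += 1
    return spans


# Illustrative data only; a real gazetteer would be generated from existing annotations.
gazetteer = build_gazetteer({"Nijmegen": "loc", "Radboud University": "org"})
tokens = "She works at Radboud University in Nijmegen .".split()
spans = tag_tokens(tokens, gazetteer)
print(spans)
```

The resulting spans could then be written back as entity suggestions on the tokens of a new document before a human annotator sees it. That step is left out here because it depends on the annotation set and the tooling used.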

From a brief test, at least, it seems that the folia2html output would likely encode sufficient information to feed into INCEpTION. I'll write separately about converting its export to FoLiA. I have tried TEI and UIMA CAS so far...

Any suggestions are most welcome, also in separate threads. Thank you very much!