korpling / pepperModules-TreetaggerModules

This project provides an importer and an exporter that support the TreeTagger format in the linguistic converter framework Pepper (see http://corpus-tools.org/pepper/). TreeTagger is a natural language processing tool that annotates text with part-of-speech and lemma annotations. A detailed description of the importer can be found in the section TreeTaggerImporter, and a description of the exporter in the section TreeTaggerExporter.

Cannot import more than 3 columns in TT importer #3

Closed amir-zeldes closed 7 years ago

amir-zeldes commented 8 years ago

The TT Importer seems to ignore columns after the 3rd in the input, although the exporter supports the 'anyAnnotation' option. This seems to be caused by the handling of the maximum number of columns in the EMF API and not in the module itself:

https://github.com/korpling/treetagger-emf-api/blob/master/src/main/java/de/hu_berlin/german/korpling/saltnpepper/misc/treetagger/resources/TabResource.java

Is there any reason for this? The user should be allowed to import and specify names for an arbitrary number of columns.

FlorianZipser commented 8 years ago

If I understand it correctly, the loading of TreeTagger data should be able to import more than just two columns. The function addDataRow(String) contains the following:

    else {
        anno = TreetaggerFactory.eINSTANCE.createAnyAnnotation();
        anno.setName(columnName);
        token.getAnnotations().add(anno);
    }

which looks as if it also reads further columns. The function is called in load(Map), contained in the class TabResource, or in TabReader in #7. I haven't checked whether it works or not, but this may be a good starting point for debugging.
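To make the expected behaviour concrete, here is a minimal, self-contained sketch of reading a tab-separated row with an arbitrary number of columns. It is not the code from treetagger-emf-api: the class `TabRowSketch`, the method `parseRow`, the key `tok`, and the fallback names `anno<N>` are all hypothetical and only illustrate the idea that every column beyond the known ones becomes a named generic annotation, similar in spirit to the `createAnyAnnotation()` branch quoted above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; NOT the treetagger-emf-api implementation. */
class TabRowSketch {

    /**
     * Splits one tab-separated row into the token text (first column)
     * and one named annotation per remaining column. Columns without a
     * configured name get the assumed fallback name "anno" + column index.
     */
    static Map<String, String> parseRow(String row, List<String> columnNames) {
        // limit -1 keeps trailing empty cells instead of dropping them
        String[] cells = row.split("\t", -1);
        Map<String, String> result = new LinkedHashMap<>();
        result.put("tok", cells[0]);
        for (int i = 1; i < cells.length; i++) {
            // no upper bound on the column count: every cell is kept
            String name = (i - 1 < columnNames.size())
                    ? columnNames.get(i - 1)
                    : "anno" + i;
            result.put(name, cells[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // five columns: token, pos, lemma, plus two additional annotations
        Map<String, String> parsed = parseRow(
                "houses\tNNS\thouse\tB-NP\tsubj",
                List.of("pos", "lemma", "chunk"));
        System.out.println(parsed);
    }
}
```

With a row of five columns and three configured names, the fourth column is stored under the configured name `chunk` and the fifth under the fallback name `anno4`, so nothing after the third column is dropped.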

FlorianZipser commented 7 years ago

Solved with #7.