korpling / pepperModules-PTBModules

This project provides an im- and an exporter to support the Penn Treebank Format (PTB) for the linguistic converter framework Pepper (see https://u.hu-berlin.de/saltnpepper).

[Exporter] Token order is lost in export #4

Closed: MartinKl closed this issue 4 years ago

MartinKl commented 4 years ago

The token order in the exported file does not match the token order in the imported file (in this case, EXMARaLDA). An error in the EXMARaLDAImporter can be ruled out, since the token order is correct in the output of the ANNISExporter.

The workflow:

<?xml version="1.0" encoding="UTF-8" ?>
<pepper-job id="exb2treetagger" version="1.0">
    <importer name="EXMARaLDAImporter" path="./exb/">
        <customization>
            <property key="pepper.after.renameAnnos">norm::cu:=cu;norm::pos_lang:=pos_lang</property>
            <property key="mapTimeline">false</property>
            <property key="salt.tokenization">norm</property>
        </customization>
    </importer>
    <manipulator name="Hierarchizer">
        <customization>         
            <property key="hierarchy.layer.name">ptb</property>
            <property key="hierarchy.names">cu,pos_lang</property>
            <property key="hierarchy.default.values">cu:=CU</property>
        </customization>
    </manipulator>
    <exporter name="PTBExporter" path="./ptb/">
        <customization>
            <property key="ptb.Exporter.importRelationAnnos">false</property>
        </customization>
    </exporter>
    <exporter name="ANNISExporter" path="./annis/">
        <customization>
            <property key="clobber.visualisation">false</property>
        </customization>
    </exporter> 
</pepper-job>

Input and results are attached: data.zip

MartinKl commented 4 years ago

This turned out not to be an issue in the PTBExporter but in the Hierarchizer (ModuleBox), which lost the token order when building the overarching SStructure objects.
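The comment does not show the actual fix. As a general illustration only, one plausible way for a structure-building step to lose token order is to collect the covered tokens in an unordered collection such as a `HashSet`. Sorting the children by their start offset in the primary text restores the original order. The `Token` record and the `childrenInTextOrder` helper below are hypothetical names for this sketch; they are not the Salt API and not the Hierarchizer's code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenOrderSketch {

    /** Hypothetical token type: surface text plus character offsets in the primary text. */
    record Token(String text, int start, int end) {}

    /**
     * Returns the tokens covered by a new structure node in text order.
     * The input may come from an unordered collection (e.g. a HashSet),
     * so the children are sorted by their start offset before use.
     */
    static List<Token> childrenInTextOrder(Collection<Token> covered) {
        List<Token> children = new ArrayList<>(covered);
        children.sort(Comparator.comparingInt(Token::start));
        return children;
    }

    public static void main(String[] args) {
        // A HashSet gives no ordering guarantee, mimicking the lost token order.
        Set<Token> covered = new HashSet<>(List.of(
                new Token("dog", 4, 7),
                new Token("The", 0, 3),
                new Token("barks", 8, 13)));

        List<String> words = childrenInTextOrder(covered).stream()
                .map(Token::text)
                .toList();
        System.out.println(String.join(" ", words)); // prints "The dog barks"
    }
}
```

Sorting by offset rather than relying on insertion order keeps the result correct no matter which collection type an upstream step happens to use.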