korpling / pepperModules-MergingModule

This project provides a Pepper module for the merging of data on several possible levels.

Merger not working #7

Closed amir-zeldes closed 8 years ago

amir-zeldes commented 8 years ago

I have the same documents in two formats: TreeTagger and CoNLL. Each format converts correctly to PAULA on its own. However, when I use both sources together with the Merger module, I get the following error. It suggests that the output directory cannot be created, although the directory is in fact created and remains empty:

+----------------------------------- step 1 -----------------------------------+
|importer:      CoNLLImporter                                                  |
|path:          file:/C:/Pepper4_new/corpora/_sandbox/dep/GUM                  |
|corpus index:  0                                                              |
|properties:                                                                   |
|               conll.SLEMMA:            LEMMA                                 |
|               conll.SPOS:              POSTAG                                |
|               conll.considerProjectivity:true                                  |
|               conll.projectiveMode:    TYPE                                  |
|               conll.splitFeatures:     true                                  |
|               pepper.after.addSLayer:  dep                                   |
|               pepper.after.removeAnnos:cat                                   |
|               pepper.after.renameAnnos:deprel:=func                          |
|               pepper.after.reportCorpusGraph:false                                 |
|               pepper.after.tokenize:   false                                 |
|                                                                              |
+----------------------------------- step 2 -----------------------------------+
|importer:      TreetaggerImporter                                             |
|path:          file:/C:/Pepper4_new/corpora/_sandbox/xml/GUM                  |
|corpus index:  1                                                              |
|properties:                                                                   |
|               pepper.after.addSLayer:  tei                                   |
|               pepper.after.reportCorpusGraph:false                                 |
|               pepper.after.tokenize:   false                                 |
|               treetagger.input.annotateAllSpansWithSpanName:true                                  |
|               treetagger.input.annotateUnannotatedSpans:true                                  |
|               treetagger.input.metaTag:text                                  |
|               treetagger.input.separatorAfterToken:                                      |
|                                                                              |
+----------------------------------- step 3 -----------------------------------+
|manipulator:   Merger                                                         |
|path:          null                                                           |
|properties:                                                                   |
|               copyNodes:               true                                  |
|               escapeMapping:           " ": "", "     ": "", "        ": "", "
": "", "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", |
|               firstAsBase:             true                                  |
|               pepper.after.reportCorpusGraph:false                                 |
|               pepper.after.tokenize:   false                                 |
|               punctuations:            '.',',',':',';','!','?','(',')','{','}','<','>'|
|                                                                              |
+----------------------------------- step 4 -----------------------------------+
|exporter:      PAULAExporter                                                  |
|path:          file:/C:/Pepper4_new/corpora/_sandbox/_out                     |
|properties:                                                                   |
|               emptyNamespace:          no_layer                              |
|               humanReadable:           true                                  |
|               pepper.after.reportCorpusGraph:false                                 |
|               pepper.after.tokenize:   false                                 |
|                                                                              |
+------------------------------------------------------------------------------+

using meta tag 'text'
using input file encoding 'UTF-8'
CONVERSION ENDED WITH ERRORS, REQUIRED TIME: 00:00:01.016 s
Cannot create directory C:\Pepper4_new\corpora\_sandbox\_out\GUM (PepperModuleException)
full stack trace:
org.corpus_tools.pepper.modules.exceptions.PepperModuleException: Cannot create directory C:\Pepper4_new\corpora\_sandbox\_out\GUM
        at org.corpus_tools.peppermodules.paula.PAULAExporter.mapCorpusStructure(PAULAExporter.java:108)
        at org.corpus_tools.peppermodules.paula.PAULAExporter.exportCorpusStructure(PAULAExporter.java:71)
        at org.corpus_tools.pepper.impl.PepperExporterImpl.start(PepperExporterImpl.java:119)
        at org.corpus_tools.pepper.core.ModuleControllerImpl$2.run(ModuleControllerImpl.java:272)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Here is my pepperparams file. The data can be found in the GUM repo; I'm using the 'ants' document from the folders dep/ (CoNLL format) and xml/ (tt format; I renamed the files to .tt). The OS is Windows 10, 64-bit. Each of the import conversions runs fine on its own if I comment out the other importer and the merging module.

<?xml version="1.0" encoding="UTF-8"?>
<pepper>
    <importer name="CoNLLImporter" path="file:/C:/Pepper4_new/corpora/_sandbox/dep/GUM/">
        <customization>
            <property key="pepper.after.addSLayer">dep</property>
            <property key="pepper.after.removeAnnos">cat</property>
            <property key="pepper.after.renameAnnos">deprel:=func</property>
        </customization>
    </importer>
    <importer name="TreetaggerImporter" path="file:/C:/Pepper4_new/corpora/_sandbox/xml/GUM/">
        <customization>
            <property key="treetagger.input.metaTag">text</property>
            <property key="treetagger.input.annotateAllSpansWithSpanName">true</property>
            <property key="treetagger.input.annotateUnannotatedSpans">true</property>
            <property key="pepper.after.addSLayer">tei</property>
        </customization>
    </importer>
    <manipulator name="Merger">
        <property key="copyNodes">true</property>
        <property key="firstAsBase">true</property>
    </manipulator>
    <exporter name="PAULAExporter" path="file:/C:/Pepper4_new/corpora/_sandbox/_out">
    </exporter>
</pepper>

amir-zeldes commented 8 years ago

OK, I think the problem is not the merging module - exporting to relANNIS does work, so this is possibly a problem with the PAULA module. I'm moving this issue to the PAULA repo.
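
The symptom reported above (a "Cannot create directory" exception even though the directory exists afterwards) is consistent with a common Java pitfall, although nothing in this thread confirms that it is the cause here. `File.mkdirs()` returns `false` when the directory already exists, so a check of the form `if (!dir.mkdirs()) throw ...` reports failure whenever something else created the directory first. The following is a minimal sketch of that behaviour. It is not the PAULAExporter source, and the class and method names are hypothetical:

```java
import java.io.File;

// Illustrative only: NOT the PAULAExporter source.
// The class and method names are hypothetical.
final class MkdirsPitfall {

    // Fragile check: File.mkdirs() returns false when the directory already
    // exists, so this reports failure even though the directory is present.
    static boolean naiveEnsureDir(File dir) {
        return dir.mkdirs();
    }

    // More tolerant check: succeed if the directory exists after the attempt,
    // whether this call created it or something else did.
    static boolean tolerantEnsureDir(File dir) {
        return dir.mkdirs() || dir.isDirectory();
    }

    public static void main(String[] args) {
        // Use a fresh directory under the system temp folder.
        File dir = new File(System.getProperty("java.io.tmpdir"),
                "mkdirs-pitfall-" + System.nanoTime());
        System.out.println("first naive call:  " + naiveEnsureDir(dir));    // directory does not exist yet
        System.out.println("second naive call: " + naiveEnsureDir(dir));    // directory already exists
        System.out.println("tolerant call:     " + tolerantEnsureDir(dir)); // directory already exists
        dir.delete();
    }
}
```

If this pattern were the cause, checking `dir.isDirectory()` after a failed `mkdirs()` call would avoid the false alarm while still catching a genuine failure to create the directory.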