mynlp / jigg

Pipeline framework for easy natural language processing
Apache License 2.0

CI for testing behaviors of Annotators #75

Closed: hiroshinoji closed this issue 5 years ago

hiroshinoji commented 6 years ago

There are test classes for some annotators (https://github.com/mynlp/jigg/tree/master/src/test/scala/jigg/pipeline), but they cover only a limited set of annotators and are mostly superficial.

One particular problem is that we cannot test the behavior of annotators that rely on external model files, including the CoreNLP annotator, because Jigg does not bundle those models. The same applies to annotators for software not distributed via Maven, such as MeCab and KNP.
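One common way to handle annotators whose models or binaries are not bundled is to skip, rather than fail, their tests on machines where the external tool is missing, and to run them fully only in a CI job that installs the tools. Below is a minimal Scala sketch of that idea, assuming a Unix-like `PATH` lookup; `ExternalTools`, `commandAvailable`, and `runIfAvailable` are hypothetical names, not part of Jigg. In ScalaTest, the built-in `assume` plays a similar role by cancelling a test instead of failing it.

```scala
import java.io.File

object ExternalTools {
  // Hypothetical helper: true if an executable named `cmd` exists in one of
  // the directories listed in PATH (Unix-like systems; no ".exe" handling).
  def commandAvailable(cmd: String): Boolean = {
    val path = Option(System.getenv("PATH")).getOrElse("")
    path.split(File.pathSeparator).filter(_.nonEmpty).exists { dir =>
      val f = new File(dir, cmd)
      f.isFile && f.canExecute
    }
  }

  // Hypothetical helper: evaluates `body` only if every required command is
  // installed; otherwise prints a skip message and returns None.
  def runIfAvailable[A](required: String*)(body: => A): Option[A] = {
    val missing = required.filterNot(c => commandAvailable(c))
    if (missing.isEmpty) Some(body)
    else {
      println(s"SKIPPED: missing external tools: ${missing.mkString(", ")}")
      None
    }
  }
}
```

For example, wrapping a MeCab annotator test in `ExternalTools.runIfAvailable("mecab") { ... }` would run it only where the `mecab` binary is installed, so local runs stay green while a CI machine with the tools exercises the real behavior.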

For example, when I update the CoreNLP version in build.sbt, I don't carefully check how the annotators' behavior changes; I only check that Jigg's wrapper runs without errors. This is bad.
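A golden-file (snapshot) test is one way to make such version bumps visible: record each annotator's output for a fixed input once, then report every line that changes afterwards. The following is a minimal sketch under that assumption; `GoldenCheck`, `diff`, and `check` are hypothetical helpers, and recording the golden file on the first run is a design choice of this sketch, not existing Jigg behavior.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object GoldenCheck {
  private val Missing = "<missing>"

  // Compares expected and actual output line by line and describes
  // every mismatching line (1-based line numbers).
  def diff(expected: String, actual: String): Seq[String] = {
    val exp = expected.split("\n", -1).toVector
    val act = actual.split("\n", -1).toVector
    (0 until math.max(exp.length, act.length)).flatMap { i =>
      val e = exp.lift(i)
      val a = act.lift(i)
      if (e == a) None
      else Some(s"line ${i + 1}: expected ${e.getOrElse(Missing)}, got ${a.getOrElse(Missing)}")
    }
  }

  // Checks `actual` against the golden file. If the file does not exist yet,
  // the current output is recorded as the reference and nothing is reported.
  def check(golden: Path, actual: String): Seq[String] =
    if (!Files.exists(golden)) {
      Files.write(golden, actual.getBytes(StandardCharsets.UTF_8))
      Seq.empty
    } else {
      val expected = new String(Files.readAllBytes(golden), StandardCharsets.UTF_8)
      diff(expected, actual)
    }
}
```

After a version bump, a non-empty result lists exactly which lines changed; whether each change is a regression or an improvement still has to be judged by hand before the golden file is updated.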

We need a more systematic testing mechanism for these external tools, perhaps using a CI service?

fyamamoto10 commented 6 years ago

Memo

Travis CI triggers:

When the programming language is Scala:

hiroshinoji commented 6 years ago

Now the tests are working (https://travis-ci.org/mynlp/jigg).

We also want to add tests for performance and corner cases. For example, some parsers may reject a very long sentence and throw an error, and I don't think Jigg currently implements any special mechanism to handle such errors.

We should first check how the original parsers (corenlp[parse], berkeleyparser, depccg, etc.) behave, and then write tests that define the desired input/output pairs for Jigg, so that Jigg absorbs the errors of the original parsers instead of passing them on.
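The idea of defining desired input/output pairs that absorb parser errors could look roughly like the following. This is only a sketch: `Parser`, `ToyParser` (a stand-in that rejects sentences longer than a fixed limit, mimicking the long-sentence case), `safeParse`, and the `(FAILED ...)` fallback format are all hypothetical and do not reflect Jigg's actual annotator API or output.

```scala
import scala.util.{Failure, Success, Try}

object RobustParsing {
  // Hypothetical stand-in for an external parser wrapped by an annotator.
  trait Parser {
    def parse(tokens: Seq[String]): String
  }

  // Toy parser that throws on sentences above a length limit.
  final class ToyParser(maxTokens: Int) extends Parser {
    def parse(tokens: Seq[String]): String = {
      if (tokens.length > maxTokens)
        throw new IllegalArgumentException(s"sentence too long: ${tokens.length} tokens")
      tokens.mkString("(S ", " ", ")")
    }
  }

  // Desired wrapper behavior: never propagate the parser's exception;
  // return a well-formed fallback that records the failure instead.
  def safeParse(parser: Parser, tokens: Seq[String]): String =
    Try(parser.parse(tokens)) match {
      case Success(tree) => tree
      case Failure(e)    => s"(FAILED ${e.getClass.getSimpleName})"
    }

  // Input/expected-output pairs defining the wrapper's contract,
  // including the corner case of an over-long sentence.
  val cases: Seq[(Seq[String], String)] = Seq(
    Seq("John", "runs") -> "(S John runs)",
    Seq.fill(5)("w") -> "(FAILED IllegalArgumentException)"
  )

  // Describes every pair whose actual output differs from the expected one.
  def failures(parser: Parser): Seq[String] =
    cases.collect {
      case (input, expected) if safeParse(parser, input) != expected =>
        s"${input.mkString(" ")}: expected $expected, got ${safeParse(parser, input)}"
    }
}
```

With `new RobustParsing.ToyParser(3)`, `failures` returns an empty list; raising the limit to 10 makes the long sentence parse normally and shows up as one mismatch, which is the kind of behavior change such pair-based tests are meant to surface.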