webanno / webanno

🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the end of the line. -- 🚀 To migrate, export your annotation projects from WebAnno, then import them into INCEpTION and just work on.
https://webanno.github.io/webanno
Apache License 2.0

Missing comments on export to CONLL-U #1282

Closed lmompela closed 5 years ago

lmompela commented 5 years ago

Describe the bug: Comments are missing when exporting a CoNLL-U file. In addition, the export gives me a ".conll" file and not a ".conllu" file.

To Reproduce Steps to reproduce the behavior:

  1. Go to home/annotation
  2. Click on export
  3. Click on format
  4. Scroll down to CONLL-U
  5. Click on export
  6. See error

Expected behavior: Expected to see comments and to have a .conllu file.


reckart commented 5 years ago

What do you mean by "comments"?

ftyers commented 5 years ago

From the docs, comments are introduced with a # as the first character of the line (see '3'). They are used for storing the original text (e.g. for training tokenisers) and for storing unique sentence IDs.
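For illustration, the comment convention described above can be read with a few lines of Python. This is only a minimal sketch: the function name `read_comments` is hypothetical, and it assumes comments follow the `# key = value` shape used for `sent_id` and `text`, with sentences separated by blank lines.

```python
def read_comments(conllu_text):
    """Return one dict of comment key/value pairs per sentence.

    Assumes sentences are separated by blank lines and that comment
    lines start with '#' and look like '# key = value'.
    """
    sentences = []
    current = {}
    seen_content = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence block.
            if seen_content:
                sentences.append(current)
            current, seen_content = {}, False
        elif line.startswith("#"):
            seen_content = True
            key, sep, value = line[1:].partition("=")
            if sep:  # ignore comments that are not key/value pairs
                current[key.strip()] = value.strip()
        else:
            seen_content = True  # a token line
    if seen_content:
        sentences.append(current)
    return sentences
```

A file exported without comments would yield an empty dict for every sentence, which is a quick way to check whether an export is affected.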

reckart commented 5 years ago

We can only export data that is known in WebAnno. WebAnno knows about sentences, tokens, pos, lemma, and dependencies (and a little about morphological features) - and of course the text itself. The sentence text (text) and sentence IDs (sent_id) are exported as "comments", but there is no support for "any kind of comments". Sentence IDs can presently not be edited in WebAnno, but if they are present in an imported CoNLL-U file, they should also be present when this file is again exported as CoNLL-U.

Other types of "comments" such as paragraph or document boundary markers are presently not supported.

Which types of comments exactly do you need?

ftyers commented 5 years ago

@reckart well, basically just the original text and the sentence ID. When we select "CoNLL-U" format they are not in the exported file. Would it be possible to include them?

reckart commented 5 years ago

Ok, I see. WebAnno 3.6.0-SNAPSHOT currently uses DKPro Core 1.10.0, which indeed does not write the sentence ID and text comments. However, they will be included, at the latest, once the next DKPro Core version is released and WebAnno is upgraded to use it.

ftyers commented 5 years ago

@reckart is there an ETA on that? (just for planning purposes), e.g. if it's two months, I'll write a script to re-add the comments; if it's two weeks, we'll probably just wait :)
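A script of the kind mentioned here might look roughly like the following. It is a sketch under stated assumptions, not something taken from WebAnno or DKPro Core: the names `rebuild_text` and `add_comments` and the `s1`, `s2`, ... ID scheme are hypothetical, and the text is rebuilt from the FORM column using the `SpaceAfter=No` flag in MISC, so it can only approximate the original wherever that flag was not exported.

```python
def rebuild_text(rows):
    """Rebuild sentence text from token rows (lists of CoNLL-U columns)."""
    parts = []
    skip_until = 0  # last word ID covered by a multiword token range
    for cols in rows:
        tok_id, form = cols[0], cols[1]
        misc = cols[9] if len(cols) > 9 else "_"
        if "." in tok_id:
            continue  # empty node, not part of the surface text
        if "-" in tok_id:
            # Multiword token such as "1-2": use its form, skip its words.
            skip_until = int(tok_id.split("-")[1])
        elif int(tok_id) <= skip_until:
            continue
        parts.append(form)
        if "SpaceAfter=No" not in misc.split("|"):
            parts.append(" ")
    return "".join(parts).rstrip()


def add_comments(conllu_text, id_prefix="s"):
    """Add '# sent_id' and '# text' comments to sentences that lack them."""
    out = []
    block = []
    count = 0

    def flush():
        nonlocal count
        if not block:
            return
        count += 1
        existing = {
            line[1:].split("=", 1)[0].strip()
            for line in block
            if line.startswith("#") and "=" in line
        }
        rows = [line.split("\t") for line in block if not line.startswith("#")]
        header = []
        if "sent_id" not in existing:
            # Hypothetical ID scheme: running number with a prefix.
            header.append(f"# sent_id = {id_prefix}{count}")
        if "text" not in existing:
            header.append(f"# text = {rebuild_text(rows)}")
        out.extend(header + block + [""])
        block.clear()

    for line in conllu_text.splitlines():
        if line.strip():
            block.append(line)
        else:
            flush()
    flush()
    # Every sentence, including the last one, ends with a blank line.
    return "\n".join(out) + "\n"
```

Sentences that already carry both comments pass through unchanged, so running the script on a file exported by a fixed version should be harmless.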

reckart commented 5 years ago

If you need it urgently, I could temporarily backport the code from DKPro Core master to WebAnno master - I'm planning to do a new WebAnno 3.6.0 beta release anyway.

reckart commented 5 years ago

@ftyers https://github.com/webanno/webanno/pull/1283 - I haven't tested it yet, but since the unit tests pass, I guess it should work.

ftyers commented 5 years ago

thanks @reckart ! :) ... @lmompela could you try the version @reckart has just made?

reckart commented 5 years ago

You'll have to build this from source (master branch) yourself atm - I'm still looking into including a few other changes before running a proper release process for the next 3.6.0 beta.

reckart commented 5 years ago

WebAnno 3.6.0-beta-5 is available now.