kaplun opened this issue 8 years ago
Hi @kaplun, thanks for your email. An interface to help correct training data is, without doubt, a very nice-to-have feature. That said, I think that with or without an interface, correcting training data is a difficult, boring and frustrating job anyway :) On the other hand, I have to say that GROBID already saves a lot of work by generating pre-labelled training data. We have seen that almost anybody with some knowledge of the training guidelines can easily and successfully correct training data.
To answer your question, we have thought about it, but given the time available (nobody works full-time on this project) and the amount of stuff we have on our plate, we have no plans at the moment. However, GROBID is an open-source project and we would be happy to include additional features from external contributors. ;-)
Cheers Luca
Hi @lfoppiano. Thanks for the confirmation. I fully understand. I was just checking in case: since you confirm that nobody is actively working on this, we might plan to contribute such an interface in the future.
@kaplun cool stuff. :-) How advanced are you on this? I'm asking because we have already given some thought to the subject. We could exchange ideas and solutions.
Completely blank slate :) We are just considering such functionality as a possible future project. We haven't even allocated resources to it yet :)
What I can tell you is that if we were to implement it, we would build a generic, independent component, using Angular 2 + Bootstrap for the front end and Python/Flask for the backend.
cc: @jmartinm
@kaplun I forgot to mention, you might want to have a look at https://github.com/Vi-dot/grobid-smecta ;-) At the moment it is tailored to handle astronomical training data (https://github.com/kermitt2/grobid-astro), but it looks promising. This could be a common point of discussion in Berlin :-)
Currently, prospective producers of training data face a barrier to entry: they have to manually edit the TEI XML produced by GROBID to introduce corrections that can later be fed back into training.
It would be great to have a web interface similar to Google's Structured Data Markup Helper, which would greatly reduce the effort of producing training data.
Has such an interface already been considered or planned?