ufal / udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
Mozilla Public License 2.0

congratulations on udpipe-future #74

Closed. Opened by jwijffels, closed 1 year ago

jwijffels commented 6 years ago

My congratulations on your results with UDPipe-Future! I'm just putting this issue here to see what your plans are (if any) to convert the UDPipe-Future Python code to C++ and incorporate it into ufal/udpipe. This all with a number of elements in mind

I would be happy to test out if needed.

foxik commented 6 years ago

Thanks!

I am on vacation, will respond in more detail in ~2-3 weeks. In short, we would like to release UDPipe 2.0 (based on UDPipe-Future) with UD 2.3, i.e., 15 Nov. It will still be C++, but it will require one dynamic library (TensorFlow). How the library will be distributed is not yet decided (i.e., whether we just use the official URLs, mirror selected versions, or even compile the versions ourselves). In any case, the library will be used only for the models which require it, so the current models will continue to work without any dependencies. The URL will stay the same.

jwijffels commented 6 years ago

thanks for the feedback. enjoy your holidays!

msklvsk commented 5 years ago

I just want to acknowledge how usable UDPipe is compared to other shared task winners. Stanford and Harbin aren’t production-ready. You have to tokenize with UDPipe first, then pipe CoNLL-U to their tagger, then pipe again through their parser, and then pipe back to UDPipe for FEATS and lemmas. They have no HTTP server built in, and (even with a custom server) they spend seconds initializing for each chunk you feed them. Not to mention they consume twice the RAM because models don’t share embeddings. If UDPipe-Future keeps the same interface and in-process pipeline, that would be huge progress for the community.
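
For readers who have not handled the format being piped between tools here: CoNLL-U is plain text with one token per line, ten tab-separated columns, `#` comment lines, and a blank line after each sentence. The following is a minimal sketch of a reader for it, not code from UDPipe or any of the systems mentioned. The names `FIELDS`, `read_conllu` and `SAMPLE` are made up for illustration, and the sketch does no special handling of multiword token ranges or empty nodes.

```python
from typing import Dict, Iterator, List

# The ten tab-separated CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dicts (illustrative helper)."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are skipped in this sketch.
            continue
        else:
            columns = line.split("\t")
            if len(columns) != len(FIELDS):
                raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
            sentence.append(dict(zip(FIELDS, columns)))
    if sentence:
        # Tolerate input that lacks the final blank line.
        yield sentence


# Hand-written sample input, not output of any real model.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE))
```

Each tool in a multi-stage setup like the one described has to parse and re-serialize this format, which is part of the overhead an in-process pipeline avoids.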

foxik commented 5 years ago

Thanks for your praise :-) Yes, the UDPipe-Future prototype will be integrated in UDPipe 2.0, but "from the outside" only little things will change (i.e., technical issues with using GPUs). Eventually I will have to improve the API, because in UDPipe-Future we can do tagging, lemmatization & parsing jointly, which part of the current API does not expect (we have separate methods tag and parse), and the API does not support batching (because it processes one sentence at a time).

danielhers commented 5 years ago

Hi @foxik, just wondering if there is an updated estimate on the UDPipe 2.0 release date. Thanks!

foxik commented 5 years ago

The current estimate is still several weeks. We have the models ready, but bundling TensorFlow as a backend requires solving several technical issues, and I am doing a lot of teaching this semester.

jwijffels commented 5 years ago

Looking forward to it - this is a great course by the way: https://github.com/ufal/npfl117 many thanks also for sharing this!

foxik commented 5 years ago

Thanks! If you like it, you can look at http://ufal.mff.cuni.cz/courses/npfl114 (and its GitHub repository); there we even have recordings of the lectures and many assignments :-)

jwijffels commented 5 years ago

@foxik Many thanks for the links.

msklvsk commented 5 years ago

Quick question: as written in the shared task paper, UDPipe 2.0 will have the same tokenization/segmentation as 1.2? So there is no point in postponing tokenizing and deduplicating a huge corpus?

foxik commented 5 years ago

The UDPipe 2.0 models for UD 2.3 will have the same tokenization/segmentation as the UD 2.3 models for UDPipe 1.2 released at https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2898. So you can use that now and wait for the better models :-)

BTW, in the planned UDPipe 3.0 (yes, I know 2.0 is not out yet) we will have better segmentation/tokenization modules, but we are talking about May (UD 2.4 release).

jowagner commented 5 years ago

Link to the prototype: https://github.com/CoNLL-UD-2018/UDPipe-Future

jwijffels commented 5 years ago

Hi @foxik, I have some time in the coming 2 months to integrate possible changes in the R interface. Any plans for the UDPipe 2.0 release which I can expect in the coming weeks?

foxik commented 5 years ago

Hi, unfortunately, my plans with UDPipe 2.0 are still unrealized. A combination of health and a lot of teaching & students have taken all my time. I am still trying to do it as soon as possible, but now in the time of conferences and shared tasks I cannot really tell when I will get to it. Sorry.

jwijffels commented 5 years ago

Hi @foxik. Good to hear that you are still planning to incorporate it - either sooner or later whichever suits you best!

jwijffels commented 5 years ago

A link for UDPipe 2.1 ;)? https://128.84.21.199/pdf/1904.02099.pdf

foxik commented 5 years ago

:-) That was written by a student of mine... (but all results are his own, I am just watching from a distance what is happening :-)

jowagner commented 5 years ago

With domain name: https://arxiv.org/abs/1904.02099 and note that an update with proper literature review has been announced: https://twitter.com/hyperparticle/status/1113845007901814785

jwijffels commented 5 years ago

Really nice work you did (https://arxiv.org/pdf/1908.06931.pdf) on integrating BERT, predicting each morphological feature & on merging corpora of the same language! I already wondered what this BERT integration would give. Thanks!

foxik commented 5 years ago

Thanks! I will be submitting another paper today, this time evaluating BERT also on dependency parsing :-)

jwijffels commented 5 years ago

Looking forward to seeing that one appear as well. Where will we be able to find it?

foxik commented 5 years ago

arXiv again. It got rejected from EMNLP, but I think it needs to be published soon.

jwijffels commented 5 years ago

Found it https://arxiv.org/pdf/1908.07448.pdf. These are tremendously impressive results :)

EmilStenstrom commented 4 years ago

Is there any way for UDPipe users to help out or sponsor continued work on UDPipe?

foxik commented 4 years ago

Sorry, I am spread thin by taking on a lot of other responsibilities (since I want to stay at the university, I took up teaching and master's and PhD students).

UDPipe is, however, still among my responsibilities and being worked on -- the problem is that we are rewriting nearly everything to use the C++ TensorFlow library as a backend, and it takes some time.

I see no easy way of contributing until we have the basic infrastructure in place, sorry...

As a side note, UDPipe 2 is expected to support arbitrary NLP operations (not only tokenizing, tagging and parsing), and we have NER, NE linking, GEC and semantic parsing prototypes :-)

EmilStenstrom commented 4 years ago

Supporting NER, NE linking sounds fantastic :) I'm so looking forward to using the new version in my (work and hobby) projects!

ArijRB commented 3 years ago

Hey, thank you for sharing your contributions. Any update on the UDPipe 2.0 release date? Thanks! @foxik

foxik commented 3 years ago

As mentioned at http://ufal.mff.cuni.cz/udpipe/2 , the UDPipe 2 models are already available in our REST server; the release of the models themselves and the UDPipe 2 should happen in the following weeks.

As http://ufal.mff.cuni.cz/udpipe describes, UDPipe 2 is just a Python prototype (to get the models to the public); the C++ version has now been designated as UDPipe 3 and it is ... (surprise, surprise) ... still being worked on.
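
For anyone wanting to try the REST server mentioned above, here is a rough sketch of a client using only the Python standard library. Several things in it are assumptions rather than facts from this thread: the endpoint URL in `API_URL`, the form parameter names (`data`, `model`, `tokenizer`, `tagger`, `parser`), the `result` key in the JSON response, and the model name `"english"`. Please check them against the service's own API documentation before relying on this. `build_request` and `process` are illustrative helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public UDPipe REST service; verify before use.
API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_request(text: str, model: str = "english") -> urllib.request.Request:
    """Build a POST request asking for tokenization, tagging and parsing."""
    params = {
        "data": text,
        "model": model,   # assumed model name; pick a real one from the service
        "tokenizer": "",  # assumed convention: empty value = default options
        "tagger": "",
        "parser": "",
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, method="POST")


def process(text: str, model: str = "english") -> str:
    """Send the request and return the assumed 'result' field (needs network)."""
    with urllib.request.urlopen(build_request(text, model), timeout=30) as response:
        payload = json.load(response)
    return payload["result"]


# Building the request does not touch the network; call process() to send it.
request = build_request("Dogs bark.")
```

Calling `process("Dogs bark.")` would, under the assumptions above, return the analysis as CoNLL-U text.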

foxik commented 1 year ago

Closing, as future development will move from UDPipe to LinPipe.

jowagner commented 1 year ago

https://github.com/ufal/linpipe