tdozat / Parser-v2

An updated version of the Parser-v1 repo, used for Stanford's submission in the CoNLL17 shared task.

Input type #8

Open xsway opened 6 years ago

xsway commented 6 years ago

Hi!

I am considering using your parser to parse some wiki corpora. A quick question: what input do the pre-trained models expect? Can I give the parser raw text and have the whole pipeline (tokenization, tagging, parsing) run, or does it require pre-processed CoNLL-style input with POS tags?

Thanks!

msklvsk commented 6 years ago

You have to pre-split the text into sentences and tokens with UDPipe. That's what Stanford did for this parser:

[screenshot, 2017-11-18]

That is, the input should be CoNLL-U-formatted. This parser/tagger will fill the corresponding columns in CoNLL-U.
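To make the format concrete, here is a minimal Python sketch of what CoNLL-U input with unfilled annotation columns looks like. It is only an illustration of the file format, not part of this repo: the `to_conllu` helper is hypothetical, it assumes you already have sentences split into tokens (e.g. from UDPipe), and whether the pre-trained models accept `_` in every annotation column is not confirmed in this thread.

```python
# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = ("ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")


def to_conllu(sentences):
    """Render pre-tokenized sentences as CoNLL-U text.

    `sentences` is a list of token lists. Hypothetical helper for
    illustration; it is not part of Parser-v2 or UDPipe.
    """
    blocks = []
    for tokens in sentences:
        rows = []
        for index, form in enumerate(tokens, start=1):
            # ID and FORM are known; the other eight columns are left
            # as "_" (the CoNLL-U marker for an unspecified value).
            row = [str(index), form] + ["_"] * (len(CONLLU_FIELDS) - 2)
            rows.append("\t".join(row))
        blocks.append("\n".join(rows))
    if not blocks:
        return ""
    # Sentences are separated by a blank line, and the last one ends with one too.
    return "\n\n".join(blocks) + "\n\n"
```

Calling `to_conllu([["Hi", "!"], ["Thanks"]])` yields two sentence blocks, each line holding a token index, the word form, and eight `_` placeholders for the tagger/parser to fill in.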

Vimos commented 5 years ago

This should go in the README.