rust-ml / nlp-discussion


Existing Work: Readers/Writers/Datatypes #11

Open sebpuetz opened 5 years ago

sebpuetz commented 5 years ago

e.g. for the CoNLL format(s)

rth commented 5 years ago

For parsers of NLP-related formats, there are e.g.,

sebpuetz commented 5 years ago

I just released the first proper version of a crate to read and process constituency trees at https://github.com/sebpuetz/lumberjack.

The crate is still rather unpolished and I'm unsure what the public API should be, but it supports reading the NEGRA export format and various flavours of bracketed trees, as well as conversion to and from @danieldk's conllx format, with and without encoded constituency structure. A number of operations on the trees are also possible, such as filtering specific non-terminals.
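To make "bracketed trees" and "filtering non-terminals" concrete, here is a minimal, self-contained sketch using only the standard library. It is not lumberjack's API: the names `Tree`, `parse_tree`, `filter_nonterminal` and `to_bracketed` are hypothetical, and "filtering" is assumed here to mean removing a labelled node and re-attaching its children to its parent.

```rust
/// A constituency tree node. Hypothetical type, not taken from any crate.
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    NonTerminal { label: String, children: Vec<Tree> },
    Terminal { tag: String, form: String },
}

/// Split the input into "(", ")" and whitespace-delimited symbols.
fn tokenize(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in input.chars() {
        if ch == '(' || ch == ')' || ch.is_whitespace() {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            if !ch.is_whitespace() {
                tokens.push(ch.to_string());
            }
        } else {
            current.push(ch);
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Parse one node starting at `idx`; returns `None` on malformed input.
fn parse_node(tokens: &[String], idx: &mut usize) -> Option<Tree> {
    if tokens.get(*idx)? != "(" {
        return None;
    }
    *idx += 1;
    let label = tokens.get(*idx)?.clone();
    if label == "(" || label == ")" {
        return None;
    }
    *idx += 1;
    let next = tokens.get(*idx)?;
    if next != "(" && next != ")" {
        // "(TAG form)" is a terminal.
        let form = next.clone();
        *idx += 1;
        if tokens.get(*idx)? != ")" {
            return None;
        }
        *idx += 1;
        return Some(Tree::Terminal { tag: label, form });
    }
    let mut children = Vec::new();
    while tokens.get(*idx)? == "(" {
        children.push(parse_node(tokens, idx)?);
    }
    if tokens.get(*idx)? != ")" || children.is_empty() {
        return None;
    }
    *idx += 1;
    Some(Tree::NonTerminal { label, children })
}

/// Parse a single bracketed tree, rejecting trailing tokens.
fn parse_tree(input: &str) -> Option<Tree> {
    let tokens = tokenize(input);
    let mut idx = 0;
    let tree = parse_node(&tokens, &mut idx)?;
    if idx == tokens.len() { Some(tree) } else { None }
}

/// Remove non-terminals labelled `label`, splicing their children into the
/// parent. The root is always kept so the result stays a single tree.
fn filter_nonterminal(tree: Tree, label: &str) -> Tree {
    match tree {
        Tree::NonTerminal { label: node_label, children } => {
            let mut kept = Vec::new();
            for child in children {
                match filter_nonterminal(child, label) {
                    Tree::NonTerminal { label: l, children: grand } if l == label => {
                        kept.extend(grand)
                    }
                    other => kept.push(other),
                }
            }
            Tree::NonTerminal { label: node_label, children: kept }
        }
        leaf => leaf,
    }
}

/// Serialize a tree back into single-line bracketed notation.
fn to_bracketed(tree: &Tree) -> String {
    match tree {
        Tree::Terminal { tag, form } => format!("({} {})", tag, form),
        Tree::NonTerminal { label, children } => {
            let inner: Vec<String> = children.iter().map(to_bracketed).collect();
            format!("({} {})", label, inner.join(" "))
        }
    }
}
```

Under these assumptions, parsing `(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))` and filtering `NP` leaves `DT` and `NN` attached directly under `S`. A real reader would also have to handle format-specific details this sketch ignores, such as traces, empty labels and escaped brackets.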

There is also another Rust crate for reading bracketed constituency trees, though it is inactive, at https://github.com/sjmielke/ptb-reader-rust.