google / sling

SLING - A natural language frame semantics parser
Apache License 2.0

Training sling in Korean | How do I make a `.rec` file? #443

Closed. gyunggyung closed this issue 4 years ago.

gyunggyung commented 4 years ago

Hello, I'd like to train SLING in Korean.

There is also Korean training data available.

There's a paper that describes it: https://github.com/emorynlp/ud-korean/blob/master/doc/kaist.pdf

This is the dataset: https://github.com/UniversalDependencies/UD_Korean-Kaist

But there's a problem. I remember that when I studied SyntaxNet, the file extension for both the Korean and English data was .conllu.

I think SLING uses a file format called .rec, which I've never seen before.

So I don't know how to convert the .conllu file into a .rec file.

Once I have created the .rec files, can I just use the command below to train?

./sling/nlp/parser/tools/train.sh --train=../UD_Korean-Kaist/ko_kaist-ud-train.rec --dev=../UD_Korean-Kaist/ko_kaist-ud-dev.rec --report_every=50000 --train_steps=100000 --output=test
ringgaard commented 4 years ago

The parser trainer expects a record file where each record is in SLING document frame format. We have a converter from OntoNotes to SLING format here:

https://github.com/google/sling/tree/master/sling/nlp/parser/ontonotes

You can also take a look at a CoNLL-to-SLING converter I have made:

https://github.com/ringgaard/sling/blob/dev/sling/nlp/parser/tools/conll-to-sling.py

Please note that the SLING parser is not a dependency parser. It is a frame semantics parser, so it is closer to semantic role labeling.

gyunggyung commented 4 years ago

Thank you for your answer :)

However, could you explain the code so that I can proceed smoothly? A README file would help.

Here is a document about the English data that I found. [Screenshot from 2020-06-10 04-49-17]

https://github.com/UniversalDependencies/UD_Korean-Kaist

This is the structure of the Korean data mentioned above. [Screenshot from 2020-06-10 04-50-21]

There seems to be a structural difference, so I hope there is a document that explains how to convert a '.conllu' file into the expected structure.

Alternatively, could you briefly explain it here in the comments?

Thank you so much for responding to my issue. Once again, I'd appreciate your help solving this problem.

ringgaard commented 4 years ago

It looks like your Korean data is a dependency treebank, which can be used for syntactic parsing. The CoNLL-2012 data from OntoNotes has dependency annotations as well, but it also contains semantic role labeling annotations from PropBank. SLING is a semantic parser; you cannot use it for dependency parsing.
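To see why, it helps to look at what a `.conllu` file actually contains. The sketch below is a minimal, hypothetical CoNLL-U reader (the names `Token` and `read_conllu` are illustrative, not part of SLING). It assumes the standard CoNLL-U layout: ten tab-separated columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` comment lines, and blank lines between sentences. Every annotation it recovers is a token-level syntactic arc (HEAD and DEPREL); there are no predicate senses or argument spans like the PropBank annotations that a frame-semantic training set would provide.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Token:
    id: int       # 1-based token index within the sentence
    form: str     # surface form
    lemma: str
    upos: str     # universal part-of-speech tag
    head: int     # index of the syntactic head (0 = root)
    deprel: str   # dependency relation to the head


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence from CoNLL-U text.

    Hypothetical helper for illustration; it keeps only basic
    dependency information and drops FEATS, DEPS, and MISC.
    """
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(
            Token(
                id=int(cols[0]),
                form=cols[1],
                lemma=cols[2],
                upos=cols[3],
                head=int(cols[6]),
                deprel=cols[7],
            )
        )
    # Handle a final sentence that is not followed by a blank line.
    if sentence:
        yield sentence
```

Running this over a file such as `ko_kaist-ud-train.conllu` would give you words and dependency arcs, which is enough for a dependency parser but not for SLING, since nothing in those columns says which words evoke a frame or fill its roles.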

gyunggyung commented 4 years ago

So I can't train SLING on this data. Thank you for your answer :)