studio-ousia / luke

LUKE -- Language Understanding with Knowledge-based Embeddings
Apache License 2.0
705 stars 101 forks

Fix language param for white-list-only vocab build #164

Closed MrZilinXiao closed 2 years ago

MrZilinXiao commented 2 years ago

If language is not set explicitly, the lang field of every entity in entity_vocab.jsonl is left null. This triggers the exception at https://github.com/studio-ousia/luke/blob/a40c580c5f1ad2f189dd02d195002921f6a4c994/luke/pretraining/dataset.py#L366, because https://github.com/studio-ousia/luke/blob/a40c580c5f1ad2f189dd02d195002921f6a4c994/luke/pretraining/dataset.py#L317 looks up entity_id using DumpDB.lang.
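A minimal sketch of the mismatch, for illustration only. It does not use the LUKE code: the record layout and the helpers `build_vocab_lines`, `load_vocab`, and `get_entity_id` are hypothetical stand-ins, assuming the vocab is indexed by a (title, language) pair and that the dump reports its language as "en".

```python
import json


def build_vocab_lines(titles, language=None):
    """Serialize one JSON line per entity (toy schema, assumed for illustration)."""
    return [
        json.dumps({"id": i, "entities": [[title, language]]})
        for i, title in enumerate(titles)
    ]


def load_vocab(lines):
    """Index entity ids by (title, language), as read back from the JSON lines."""
    vocab = {}
    for line in lines:
        record = json.loads(line)
        for title, language in record["entities"]:
            vocab[(title, language)] = record["id"]
    return vocab


def get_entity_id(vocab, title, dump_language):
    """Look up an entity with the language reported by the dump (hypothetical helper)."""
    return vocab[(title, dump_language)]


titles = ["Tokyo", "Kyoto"]

# Built without a language: every entry is stored with language None (null in JSON).
vocab_without_lang = load_vocab(build_vocab_lines(titles))

# Built with an explicit language: entries match the language used at lookup time.
vocab_with_lang = load_vocab(build_vocab_lines(titles, language="en"))

try:
    get_entity_id(vocab_without_lang, "Tokyo", "en")
    lookup_failed = False
except KeyError:
    # The key ("Tokyo", "en") does not exist; only ("Tokyo", None) does.
    lookup_failed = True

print("lookup without language failed:", lookup_failed)
print("lookup with language:", get_entity_id(vocab_with_lang, "Tokyo", "en"))
```

In this toy version, passing the language when the vocab is built is enough to make the lookup succeed, which is the kind of mismatch this pull request addresses.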

ryokan0123 commented 2 years ago

Thank you for the pull request!

This is my fault, caused by a recent change in the library. We intend the commands to work without the --language option in monolingual settings, so we have fixed the code accordingly in https://github.com/studio-ousia/luke/pull/165.

Please pull the latest master branch and rerun your current command. Thanks again for your report.