Hi @lshowway,
The pretraining corpus is created from a Wikipedia dump, which can be downloaded here. The dump file is preprocessed using the build_wikipedia_pretraining_dataset command, and the dump_db_file can be built using the build-dump-db command of Wikipedia2Vec.
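For reference, the two steps might look roughly like the sketch below. The dump file name, entity vocabulary path, and the module path and arguments of the LUKE command are assumptions for illustration only; please check the repository and the command's --help for the actual interface.

```bash
# Step 1: build the dump database file with Wikipedia2Vec.
# build-dump-db takes the raw dump and an output path
# (the dump file name here is just an example).
wikipedia2vec build-dump-db enwiki-latest-pages-articles.xml.bz2 enwiki.db

# Step 2: preprocess the dump database into the pretraining corpus.
# NOTE: the module path and arguments below are illustrative assumptions,
# not a verified command line; consult the LUKE repository for the exact usage.
python -m luke.cli build-wikipedia-pretraining-dataset \
    enwiki.db entity_vocab.tsv pretraining_dataset/
```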
Since the NTEE model was also trained using a Wikipedia dump, the pretraining corpus of NTEE is the same as that of LUKE.
Thanks for your work. According to the paper, the pretraining corpus is Wikipedia with entity annotations. So, is this corpus the same as NTEE's? Or could you provide a link or some other pointer so I can learn more about this corpus?
Thanks.