heraclex12 / NLP2SPARQL

Translate natural language to SPARQL queries and vice versa

data for pretraining SPBERT #2

Open · aosokin opened this issue 2 years ago

aosokin commented 2 years ago

Hi, thanks for releasing the code for your method and the weights for your models!

While reading your paper I got very interested in the data you used to pre-train SPBERT. The paper says the following: "To prepare a large-scale pre-training corpus, we leverage SPARQL queries from end-users, massive and highly diverse structures. These query logs can be obtained from the DBpedia endpoint powered by a Virtuoso instance. We only focus on valid DBpedia query logs spans from October 2015 to April 2016."
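For context, here is a minimal sketch of the kind of filtering the paper's "valid DBpedia query logs" phrasing seems to describe: keeping only syntactically valid SPARQL queries from a raw log dump. The input file name and log format (one URL-encoded query per line) are assumptions, not the authors' actual pipeline.

```python
# Sketch: filter a raw query-log dump down to syntactically valid
# SPARQL queries. The input format (one URL-encoded query per line in
# queries_2015-10_2016-04.txt) is a hypothetical stand-in for the
# Virtuoso logs the paper mentions.
from urllib.parse import unquote_plus

from rdflib.plugins.sparql.parser import parseQuery


def is_valid_sparql(query: str) -> bool:
    """Return True if the query parses under rdflib's SPARQL grammar."""
    try:
        parseQuery(query)
        return True
    except Exception:
        return False


with open("queries_2015-10_2016-04.txt", encoding="utf-8") as src, \
     open("pretraining_corpus.txt", "w", encoding="utf-8") as dst:
    for line in src:
        query = unquote_plus(line.strip())
        if query and is_valid_sparql(query):
            dst.write(query + "\n")
```

This only checks syntactic validity; the authors may additionally have filtered on execution success against the DBpedia endpoint.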

Could you please explain in more detail how to get this data? Would it be possible for you to release the exact corpus used for pre-training?

Best, Anton

JiexingQi commented 2 years ago

Hi @aosokin, have you found it? I have the same question.

aosokin commented 2 years ago

Hi @JiexingQi, no, I never got the data.