balhoff closed this 9 months ago
Hey @balhoff,
If you want to download the complete dataset for a specific release, you need ALL the different files for each named graph.
For the current SemOpenAlex version 4.0.0, that is 141 files for semopenalex/authors. You should merge all of these.
FYI: We follow the structure of the OpenAlex Data Snapshot for the provided folder structure.
At the moment we also provide the two previous SemOpenAlex Data Dumps from 2022-11-21/ and 2023-04-24/. However, if you only want to download the latest version, you can ignore these folders and only download all .trig.gz files of the following named graphs:
authors/ (141 files for version 4.0.0)
concepts/ (1 file for version 4.0.0)
funders/ (1 file for version 4.0.0)
institutions/ (1 file for version 4.0.0)
publishers/ (1 file for version 4.0.0)
sources/ (1 file for version 4.0.0)
works/ (470 files for version 4.0.0)
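The selection described above could be sketched roughly as follows. This is only an illustration: it assumes you already have a listing of file paths from the download location, and the example paths (such as `authors/part_000.trig.gz`) are hypothetical, not the actual file names in the dump. The function keeps `.trig.gz` files that sit under one of the named-graph folders and skips anything under the two older dump folders.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Named-graph folders listed in the thread.
NAMED_GRAPHS = {"authors", "concepts", "funders", "institutions",
                "publishers", "sources", "works"}
# Older dump folders mentioned in the thread; skipped for the latest version.
OLD_DUMPS = {"2022-11-21", "2023-04-24"}


def select_latest_files(paths):
    """Group the .trig.gz paths of the latest dump by named graph."""
    selected = defaultdict(list)
    for path in paths:
        if not path.endswith(".trig.gz"):
            continue  # ignore anything that is not a compressed TriG file
        parts = PurePosixPath(path).parts
        if OLD_DUMPS.intersection(parts):
            continue  # file belongs to one of the previous dumps
        graph = next((part for part in parts if part in NAMED_GRAPHS), None)
        if graph is not None:
            selected[graph].append(path)
    return dict(selected)


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    listing = [
        "authors/part_000.trig.gz",
        "2023-04-24/authors/part_000.trig.gz",
        "works/part_000.trig.gz",
    ]
    for graph, files in select_latest_files(listing).items():
        print(graph, len(files))
```

Comparing the per-graph counts against the numbers above (141 for authors/, 470 for works/, and so on) is a quick way to check that nothing was missed.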
Thanks for the info @davidlamprecht.
> You should merge all of these.
A small correction: you could concatenate them, but I think it's faster to just load each file into a semantic repository. For GraphDB, the easiest way is to import them as server files in Workbench.
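If you do want to concatenate, here is a minimal sketch. It relies on the fact that a gzip stream may contain several members back to back, so the compressed files can be joined byte for byte without decompressing them. The function name `concat_gzip` and the file names are made up for illustration, and the sketch assumes the individual TriG files are still valid when read one after another (for example, no clashing blank node labels), which I have not verified for this dataset.

```python
import shutil
from pathlib import Path


def concat_gzip(parts, out_path):
    """Join gzip files byte for byte into one multi-member gzip file."""
    with open(out_path, "wb") as out:
        # Sort so the result does not depend on directory listing order.
        for part in sorted(parts):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(out_path)


if __name__ == "__main__":
    # Hypothetical layout: all author files downloaded into ./authors/
    files = list(Path("authors").glob("*.trig.gz"))
    if files:
        concat_gzip(files, "authors_merged.trig.gz")
```

Loading the files one by one, as suggested above, avoids that assumption entirely, so concatenation is mostly useful when a tool insists on a single input file.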
If I want to download the complete dataset for the current release, do I need all the different files for each named graph, or just the latest one? For example under semopenalex / authors, there are files like:
Should I merge all of these? Thanks!