Closed ewan-escience closed 1 month ago
Issues
0 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
Failed conditions
C Reliability Rating on New Code (required ≥ A)
Works as expected.
One question: given that we can get most of the information for https://openalex.org/W3159002838 after scraping, couldn't we also use this identifier to import the mention in the first place? The "Search for DOI or title" box could also accept the OpenAlex ID?
Yes, that's what I meant by the second TODO in the PR description. 🙂 I will open issues for the TODOs.
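The idea raised in this thread, letting the search box accept an OpenAlex ID as well as a DOI or title, could be sketched roughly as below. This is only an illustration and not code from this PR: the class `IdentifierSketch`, the `Kind` enum, and both regular expressions are hypothetical, and the assumed ID shape (`W` followed by digits, optionally prefixed by `https://openalex.org/`) is inferred from the example URL above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the RSD code base.
public class IdentifierSketch {

    // What the user's search input most likely is.
    enum Kind { OPENALEX_ID, DOI, TITLE }

    // Assumed shape of an OpenAlex work ID: "W" plus digits,
    // optionally prefixed by the openalex.org URL.
    private static final Pattern OPENALEX_WORK =
            Pattern.compile("^(?:https://openalex\\.org/)?[Ww]\\d+$");

    // Deliberately loose DOI check: "10.", 4 to 9 digits, a slash,
    // and a non-empty suffix without whitespace.
    private static final Pattern DOI =
            Pattern.compile("^10\\.\\d{4,9}/\\S+$");

    static Kind classify(String input) {
        String trimmed = input.trim();
        if (OPENALEX_WORK.matcher(trimmed).matches()) {
            return Kind.OPENALEX_ID;
        }
        if (DOI.matcher(trimmed).matches()) {
            return Kind.DOI;
        }
        // Anything else is treated as a free-text title search.
        return Kind.TITLE;
    }

    public static void main(String[] args) {
        System.out.println(classify("https://openalex.org/W3159002838"));
        System.out.println(classify("10.1016/j.future.2018.08.004"));
        System.out.println(classify("Scrape citations from OpenAlex reference papers"));
    }
}
```

Checking for the OpenAlex pattern first keeps the existing DOI and title paths unchanged for every other input.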
Scrape citations from OpenAlex reference papers
Changes proposed in this pull request
Drop the external_id column of the mention table and add the openalex_id column (note: existing data will need to be migrated)
(... Doi and OpenalexId classes)
How to test
docker compose down --volumes && docker compose build --parallel && docker compose up --scale data-generation=0
https://openalex.org/W3159002838
docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.doi.MainCitations
10.1016/j.future.2018.08.004
docker compose down --volumes && docker compose up --scale data-generation=1
docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.doi.MainMention
docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.doi.MainCitations
docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.doi.MainReleases
To do
Migration:
Before dropping the external_id column and after adding the openalex_id column, the following (untested) query should be executed:
The following was tested in production, yielding a result of 5629:
The following gave the same result of 5629:
To check for unique entries, run
which again yielded 5629.
If you do have duplicate entries, you can get them with:
Closes #1291
PR Checklist:
docker-compose.yml