pybliometrics-dev / pybliometrics

Python-based API-Wrapper to access Scopus
https://pybliometrics.readthedocs.io/en/stable/

`publicationyear` of the references returned by `AbstractRetrieval` REF not parsed #254

Closed lewisjiang closed 2 years ago

lewisjiang commented 2 years ago

pybliometrics version: 3.3.0

Code to reproduce the bug:

from pybliometrics.scopus import AbstractRetrieval
ab = AbstractRetrieval("10.1002/rob.21762", view='REF', refresh=True)
print(ab.references[2])

Note: publicationyear is None for this well-indexed reference entry.

Expected behavior:

import json
import requests

url = "https://api.elsevier.com/content/abstract/doi/10.1002/rob.21762"
headers = {"X-ELS-APIKey": "YOURKEY"}  # replace with your own API key
params = {
    "httpAccept": "application/json",
    "view": "REF",
    "startref": 0,
    "refcount": 10
}
req = requests.get(url=url, headers=headers, params=params)
# Third reference of the raw response, the same entry as ab.references[2] above
entry = req.json()["abstracts-retrieval-response"]["references"]["reference"][2]
print(json.dumps(entry, indent=4))

Requesting the Scopus API manually shows that the only field in this entry carrying date information is "prism:coverDate": "2016-12-01". pybliometrics cannot resolve the year for this entry with the code at

https://github.com/pybliometrics-dev/pybliometrics/blob/d31a185e1a021c4a70651addb2daebb4b7724f23/pybliometrics/scopus/abstract_retrieval.py#L536

I tested this with a dozen publications that have DOIs, and all of their references (200+) have None in the publicationyear field. I suspect the data structure of the Scopus API response has changed.
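
For anyone who wants to repeat that check, here is a minimal sketch of the tally. It assumes each item in ab.references exposes a publicationyear attribute, as in the reproduction code above. The helper name count_missing_years and the Ref stand-in are hypothetical and only there so the snippet runs without an API key.

```python
from collections import namedtuple


def count_missing_years(references):
    """Return (missing, total) for a list of reference records.

    A record counts as missing when its publicationyear attribute
    is None or absent. `references` may be None (no references).
    """
    refs = list(references or [])
    missing = sum(
        1 for ref in refs if getattr(ref, "publicationyear", None) is None
    )
    return missing, len(refs)


# Hypothetical stand-in for the records in ab.references;
# only the fields needed for the tally are modelled here.
Ref = namedtuple("Ref", "title publicationyear")
sample = [Ref("A", None), Ref("B", None), Ref("C", "2016")]

missing, total = count_missing_years(sample)
print(f"{missing} of {total} references have no publicationyear")
```

With real data, one would pass ab.references instead of sample.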

Michael-E-Rose commented 2 years ago

Thanks for finding this out! Apparently they introduced a new field coverDate that is only available via the REF view.
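
Until the parser handles this, a possible workaround is to read the year straight from the raw JSON entry. The sketch below assumes the entry is a dict shaped like the one printed in the manual request above, with the date under "prism:coverDate" in YYYY-MM-DD form. The helper name year_from_cover_date is hypothetical and not part of pybliometrics.

```python
import re


def year_from_cover_date(entry):
    """Return the four-digit year from entry["prism:coverDate"], or None.

    `entry` is assumed to be one reference dict from the raw
    Scopus REF-view response; a missing or malformed date gives None.
    """
    if not isinstance(entry, dict):
        return None
    cover_date = entry.get("prism:coverDate")
    if not isinstance(cover_date, str):
        return None
    match = re.match(r"(\d{4})", cover_date)
    return match.group(1) if match else None


# Value taken from the entry quoted in this issue.
entry = {"prism:coverDate": "2016-12-01"}
print(year_from_cover_date(entry))
```

The year is returned as a string here. Whether the library should expose a string or an int is a design choice this sketch does not settle.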