pybliometrics-dev / pybliometrics

Python-based API-Wrapper to access Scopus
https://pybliometrics.readthedocs.io/en/stable/

Affiliation city and country in ScopusSearch.results not correct #232

Closed IvanSterligov closed 2 years ago

IvanSterligov commented 2 years ago

Hello and many thanks for your work!

I've encountered a problem with the ScopusSearch module.

It seems that when exporting affil_city and affil_country it takes the wrong values and thus presents incorrect information.

See for example eid=2-s2.0-84986260127: it's a paper by a group of Chinese authors from Beijing, and it is correctly attributed to China in the Scopus web interface.

When exporting via ScopusSearch, we instead get United States in affil_country and Redmond instead of Beijing in affil_city. How come?

This is because the pybliometrics wrapper takes the wrong part of the JSON response: not the real countries stated in the paper by the authors, but the "official" address of the org profile. At least for Computer Science this leads to big problems because of the wide (and widening) geographic distribution of corporate R&D centers. For example, in my dataset of papers from top CS conferences there are 120 Microsoft Research papers attributed by ScopusSearch solely to the USA, but in the Scopus web interface only 88 of them actually have US affiliations. The others include not only Beijing, but also 9 papers from MSR Bangalore, 3 from MSR Montreal, etc.

In other words, ScopusSearch wraps the wrong part of the JSON. For the aforementioned paper it should wrap this:

"bibrecord": {
        "head": {
          "author-group": {
            "affiliation": {
              "country": "China",
              "@afid": "60021726",
              "@country": "chn",
              "city": "Beijing",
              "organization": {
                "$": "Microsoft Research"
              },
              "affiliation-id": {
                "@afid": "60021726"
              }
            },

and not this:

     "affiliation": {
      "affiliation-city": "Redmond",
      "@id": "60021726",
      "affilname": "Microsoft Research",
      "@href": "https://api.elsevier.com/content/affiliation/affiliation_id/60021726",
      "affiliation-country": "United States"

It would be nice to see this issue corrected.

Best regards, Ivan

Michael-E-Rose commented 2 years ago

I don't quite understand the problem. First, you must be referring to the AbstractRetrieval() class, aren't you? Because ScopusSearch() possesses only one attribute, and its JSON does not have the bibrecord section.

For AbstractRetrieval() though, the information you refer to is present (again assuming you mean different properties):

>>> from pybliometrics.scopus import AbstractRetrieval
>>> ab = AbstractRetrieval('2-s2.0-84986260127', view='FULL')
>>> ab.affiliation
[Affiliation(id=60021726, name='Microsoft Research', city='Redmond', country='United States')]
>>> ab.authorgroup
[Author(affiliation_id=60021726, dptid=None, organization='Microsoft Research', city='Beijing', postalcode=None, addresspart=None, country='China', auid=57191074239, orcid=None, indexed_name='Xu J.', surname='Xu', given_name='Jun'),
 Author(affiliation_id=60021726, dptid=None, organization='Microsoft Research', city='Beijing', postalcode=None, addresspart=None, country='China', auid=7005396035, orcid=None, indexed_name='Mei T.', surname='Mei', given_name='Tao'),
 Author(affiliation_id=60021726, dptid=None, organization='Microsoft Research', city='Beijing', postalcode=None, addresspart=None, country='China', auid=36456529400, orcid=None, indexed_name='Yao T.', surname='Yao', given_name='Ting'),
 Author(affiliation_id=60021726, dptid=None, organization='Microsoft Research', city='Beijing', postalcode=None, addresspart=None, country='China', auid=7006494583, orcid=None, indexed_name='Rui Y.', surname='Rui', given_name='Yong')]

While the affiliation is listed as belonging to Redmond in the US, each author is listed as belonging to Beijing.
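
This kind of mismatch can be checked programmatically. Below is a minimal sketch, assuming records shaped like the `affiliation` and `authorgroup` output above. The two namedtuples and the `location_mismatches` helper are hypothetical stand-ins for illustration (not part of pybliometrics) and keep only the fields used here; the sample values are copied from the output above.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring a subset of the fields printed above.
Affiliation = namedtuple("Affiliation", "id name city country")
Author = namedtuple("Author", "affiliation_id organization city country indexed_name")


def location_mismatches(affiliations, authorgroup):
    """List authors whose stated city/country differs from the profile address."""
    # Profile-level address, keyed by affiliation ID.
    profiles = {a.id: (a.city, a.country) for a in affiliations or []}
    mismatches = []
    for author in authorgroup or []:
        profile = profiles.get(author.affiliation_id)
        stated = (author.city, author.country)
        if profile is not None and profile != stated:
            mismatches.append((author.indexed_name, profile, stated))
    return mismatches


affiliations = [Affiliation(60021726, "Microsoft Research", "Redmond", "United States")]
authorgroup = [
    Author(60021726, "Microsoft Research", "Beijing", "China", "Xu J."),
    Author(60021726, "Microsoft Research", "Beijing", "China", "Mei T."),
]

for name, profile, stated in location_mismatches(affiliations, authorgroup):
    print(name, "profile:", profile, "stated:", stated)
```

With live data, the same helper could presumably be fed `ab.affiliation` and `ab.authorgroup` directly, provided the ID types match as they do in the output above.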

IvanSterligov commented 2 years ago

Ah yes, I've mixed this up a bit, sorry. I am speaking about ScopusSearch, but indeed I quoted metadata from Abstract Retrieval.

Anyway, I'm speaking about pybliometrics' ScopusSearch. The problem is that for this and other similar papers it provides the wrong country and city.

And this is indeed a problem with the Scopus Search API response, not with pybliometrics. All we have is this, and this is incorrect. Alas...

"affiliation": [
  {
    "@_fa": "true",
    "affilname": "Microsoft Research",
    "affiliation-city": "Redmond",
    "affiliation-country": "United States"
  }
],

I'm afraid this means we cannot trust ScopusSearch for city and country data, and nothing can be done at the moment. So I will have to query each paper via AbstractRetrieval.
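
A rough sketch of that per-paper workaround, under stated assumptions: `stated_countries` and `fake_fetch` are hypothetical names, the `Author` namedtuple only mimics a few fields of the `authorgroup` records quoted earlier, and the fetcher is passed in as a parameter so the example runs offline without an API key.

```python
from collections import namedtuple

# Hypothetical subset of the fields in an authorgroup record.
Author = namedtuple("Author", "affiliation_id city country")


def stated_countries(eids, fetch_authorgroup):
    """Map each EID to the unique countries stated in its author group."""
    result = {}
    for eid in eids:
        group = fetch_authorgroup(eid) or []
        # dict.fromkeys drops duplicates while keeping first-seen order.
        result[eid] = list(dict.fromkeys(a.country for a in group if a.country))
    return result


def fake_fetch(eid):
    """Offline stand-in for the API call, using values quoted in this thread."""
    return [Author(60021726, "Beijing", "China")] * 4


print(stated_countries(["2-s2.0-84986260127"], fake_fetch))
```

With pybliometrics configured, the fetcher could be something like `lambda eid: AbstractRetrieval(eid, view='FULL').authorgroup`, at the cost of one Abstract Retrieval request per paper.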

Michael-E-Rose commented 2 years ago

Yes, I'm afraid that's the only short-term solution. Sometimes Scopus creates department IDs for parts of institutions (such as hospitals), which then get their own address information. But for Microsoft they didn't do this. Maybe complaining helps...