pybliometrics-dev / pybliometrics

Python-based API-Wrapper to access Scopus
https://pybliometrics.readthedocs.io/en/stable/

Bug: KeyError with auid for collaboration #242

Closed astrochun closed 2 years ago

astrochun commented 2 years ago

Bug report? Please state your pybliometrics version and a complete code snippet to reproduce the bug.

Version: 3.2.0 (latest available on PyPI)

AbstractRetrieval fails for publications that include a collaboration: accessing the authorgroup property raises a KeyError because the collaboration entry has no @auid.

Traceback:

In [1]: from pybliometrics.scopus import AbstractRetrieval

In [2]: sc_abs = AbstractRetrieval('10.1038/s41586-021-04023-y', id_type='doi', view='FULL')

In [3]: sc_abs.authorgroup
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-efbfec392583> in <module>
----> 1 sc_abs.authorgroup

~/codes/project_code/venv/lib/python3.8/site-packages/pybliometrics/scopus/abstract_retrieval.py in authorgroup(self)
     83                            postalcode=aff.get('postal-code'),
     84                            addresspart=aff.get('address-part'),
---> 85                            country=aff.get('country'), auid=int(au['@auid']),
     86                            orcid=au.get('@orcid'),
     87                            surname=au.get('ce:surname'), given_name=given,

KeyError: '@auid'

The collaboration metadata from the Scopus response is:

{
  "collaboration": {
    "@seq": "448",
    "@collaboration-instance-id": "2013922629-e85ca848f90f8c3a1bb1e4d3d24ce59d",
    "ce:text": "the W7-X Team",
    "ce:indexed-name": "the W7-X Team"
   }
}
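
To illustrate why the lookup fails: a collaboration entry like the one above carries no `@auid` key, so `int(au['@auid'])` raises. Below is a minimal sketch of a guarded lookup, assuming the entries are plain dicts. `safe_auid` is a hypothetical helper for illustration only, not part of pybliometrics, and the author entry is made up.

```python
from typing import Optional


def safe_auid(entry: dict) -> Optional[int]:
    """Return the Scopus author ID as an int, or None if the entry has none.

    Hypothetical helper for illustration; not part of pybliometrics.
    """
    auid = entry.get("@auid")
    return int(auid) if auid is not None else None


# Collaboration entry as reported above: there is no "@auid" key.
collab = {
    "@seq": "448",
    "@collaboration-instance-id": "2013922629-e85ca848f90f8c3a1bb1e4d3d24ce59d",
    "ce:text": "the W7-X Team",
    "ce:indexed-name": "the W7-X Team",
}
# Made-up author entry for contrast (the ID is illustrative only).
author = {"@auid": "1234567890", "ce:surname": "Doe"}

print(safe_auid(collab))  # None
print(safe_auid(author))  # 1234567890
```

Returning None rather than raising is one possible design choice; it keeps a single collaboration entry from aborting the whole author list.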

Michael-E-Rose commented 2 years ago

I've been looking for these collaborations for a while, so I'm glad you found them. I'll continue the discussion in #243.

astrochun commented 2 years ago

I understand that this is a difficult fix, but I pulled your latest changes and tried the example above, and it still fails:

from pybliometrics.scopus import AbstractRetrieval
sc_abs = AbstractRetrieval('10.1038/s41586-021-04023-y', id_type='doi', view='FULL')
sc_abs.authorgroup
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-efbfec392583> in <module>
----> 1 sc_abs.authorgroup

~/codes/PPPL/pybliometrics/pybliometrics/scopus/abstract_retrieval.py in authorgroup(self)
     73         try:
     74             collab_idx = keys.index("collaboration")
---> 75             collaboration = items.pop(collab_idx)['collaboration']
     76         except ValueError:
     77             collaboration = {'ce:indexed-name': None}

IndexError: pop index out of range

I'm using commit 43413a9 of master.

Michael-E-Rose commented 2 years ago

Indeed, my approach didn't work. What's different here is that the authors still have an affiliation.
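
One plausible reading of the IndexError, assuming `keys` is a flattened list of the keys of all `items` (as the list comprehension in a later traceback in this thread suggests): `keys.index("collaboration")` returns a position in the flattened key list, which can exceed the number of items when the item holding the collaboration also has affiliation and author keys. A toy reproduction with made-up data:

```python
# Toy data: two affiliation groups; the second also holds the collaboration.
items = [
    {"affiliation": {"country": "Germany"}, "author": [{"@auid": "1"}]},
    {
        "affiliation": {"country": "Germany"},
        "author": [{"@auid": "2"}],
        "collaboration": {"ce:indexed-name": "the W7-X Team"},
    },
]

# Flattened keys of all items (assumed to mirror the library code).
keys = [k for x in items for k in x.keys()]
collab_idx = keys.index("collaboration")  # a position among keys, not items

try:
    items.pop(collab_idx)
    error = None
except IndexError as exc:
    error = exc

print(collab_idx, len(items))  # 4 2
print(repr(error))

# Searching item by item finds the entry that actually holds the key.
item_idx = next(i for i, x in enumerate(items) if "collaboration" in x)
print(item_idx)  # 1
```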

astrochun commented 2 years ago

Thanks. Can confirm that the latest commit is a cleaner solution (works). When is the next PyPI release?

astrochun commented 2 years ago

Question: Has this issue been resolved? I ask because I am attempting to use pybliometrics on the same DOI as above and the collaboration field is simply None.

Michael-E-Rose commented 2 years ago

It isn't None for me:

>>> from pybliometrics.scopus import AbstractRetrieval
>>> sc_abs = AbstractRetrieval('10.1038/s41586-021-04023-y', id_type='doi', view='FULL')
>>> import pandas as pd
>>> df = pd.DataFrame(sc_abs.authorgroup)
>>> df.collaboration
0      the W7-X Team
1      the W7-X Team
2      the W7-X Team
3      the W7-X Team
4      the W7-X Team
           ...      
449    the W7-X Team
450    the W7-X Team
451    the W7-X Team
452    the W7-X Team
453    the W7-X Team
Name: collaboration, Length: 454, dtype: object

astrochun commented 2 years ago

Not sure what happened, but I'm getting the same output now.

silas-blomqvist commented 1 year ago

I presume this issue has been solved. However, I get a KeyError on 'collaboration':

File ~\Anaconda3\envs\leadership_review\lib\site-packages\pybliometrics\scopus\abstract_retrieval.py:75, in AbstractRetrieval.authorgroup(self)
     73 keys = [k for x in items for k in list(x.keys())]
     74 if "collaboration" in keys:
---> 75     collaboration = items.pop(-1)['collaboration']
     76 else:
     77     collaboration = {'ce:indexed-name': None}

KeyError: 'collaboration'

I am using another package, 'litstudy', which depends on pybliometrics.

Can you help me find the issue? It seems to revolve around the author group.
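
The KeyError is consistent with `items.pop(-1)` assuming that the collaboration sits in the last item: if a different item holds it, the last item has no `'collaboration'` key. Below is a sketch with made-up data. `pop_collaboration` is a hypothetical helper, not the library's fix, and unlike the code in the traceback it removes only the key and leaves the item in the list; whether that suits the rest of `authorgroup` is an assumption.

```python
def pop_collaboration(items: list) -> dict:
    """Remove and return the collaboration from whichever item holds it.

    Hypothetical helper; falls back to the placeholder seen in the traceback.
    """
    for item in items:
        if "collaboration" in item:
            return item.pop("collaboration")
    return {"ce:indexed-name": None}


# Made-up data: the collaboration is NOT in the last item.
items = [
    {"collaboration": {"ce:indexed-name": "Example Team"}},
    {"affiliation": {"country": "Sweden"}, "author": [{"@auid": "1"}]},
]

first = pop_collaboration(items)
second = pop_collaboration(items)  # nothing left to pop, so the placeholder

print(first["ce:indexed-name"])  # Example Team
print(second)  # {'ce:indexed-name': None}
```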

Michael-E-Rose commented 1 year ago

@silas-blomqvist: sorry, it seems I missed your request! If your problem is still present, please provide more information (ideally in a new issue): code to reproduce the error and the pybliometrics version you're using.