Closed raffaem closed 2 years ago
The bug also happens with papers with one citation.
MWE:
from pybliometrics.scopus import CitationOverview
res = CitationOverview(identifier=["85100910856"], start=2020)
print(res.grandTotal)
print(res.cc)
It's not true that all papers with no citations will throw this error either.
This paper has no citations and yet it doesn't throw the error:
>>> res = CitationOverview(identifier=["28844466437"], start=2005)
>>> print(res.grandTotal)
0
>>> res.cc
[[(2005, 0), (2006, 0), (2007, 0), (2008, 0), (2009, 0), (2010, 0), (2011, 0), (2012, 0), (2013, 0), (2014, 0), (2015, 0), (2016, 0), (2017, 0), (2018, 0), (2019, 0), (2020, 0), (2021, 0)]]
The first two examples work fine for me. Which pybliometrics version are you using, and which OS are you running on?
I confirm the exception on the MWE of the first post:
$ python3 205_1.py
Traceback (most recent call last):
  File "205_1.py", line 5, in <module>
    print(res.cc)
  File "/home/raffaele/.local/lib/python3.9/site-packages/pybliometrics/scopus/abstract_citation.py", line 44, in cc
    cites = [int(d['$']) for d in doc['cc']]
  File "/home/raffaele/.local/lib/python3.9/site-packages/pybliometrics/scopus/abstract_citation.py", line 44, in <listcomp>
    cites = [int(d['$']) for d in doc['cc']]
TypeError: string indices must be integers
This is my pybliometrics version:
$ python3
Python 3.9.7 (default, Aug 30 2021, 00:00:00)
[GCC 11.2.1 20210728 (Red Hat 11.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pybliometrics
>>> print(pybliometrics.__version__)
3.0.1
and I am running on Fedora 34 Workstation.
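For context on the error message itself: the list comprehension in the traceback expects doc['cc'] to be a list of dicts such as {'$': '0'}. Below is a minimal sketch of how the same TypeError can arise, assuming (this is a guess, not something confirmed in this thread) that the cached response held a single dict or a plain string instead of a list. The names parse_cc and fails_with_type_error are hypothetical helpers, not part of pybliometrics.

```python
# Hypothetical payload shapes; only `good` matches what the thread shows.
good = {"cc": [{"$": "0"}, {"$": "1"}]}
single_dict = {"cc": {"$": "0"}}   # assumed malformed shape: dict, not list
plain_string = {"cc": "0"}         # assumed malformed shape: bare string


def parse_cc(doc):
    """Same expression as the line quoted in the traceback."""
    return [int(d['$']) for d in doc['cc']]


def fails_with_type_error(doc):
    """Return True if parsing this document raises TypeError."""
    try:
        parse_cc(doc)
    except TypeError:
        return True
    return False


print(parse_cc(good))
print(fails_with_type_error(single_dict))   # iterating a dict yields str keys
print(fails_with_type_error(plain_string))  # iterating a str yields characters
```

In both assumed cases the loop variable d ends up being a string, so d['$'] indexes a string with a string, which is what the error message describes.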
Still no error on my side.
Could you try to locate the cache file for case 1 and post its content here, please?
How can I find this file?
In ~/.scopus/citation_overview/STANDARD
I have different files all named in what seems random alphanumeric strings.
Yes, these are hashed versions of the EIDs and the years used for the retrieval. This prevents many problems with filenames. The "Notes" section of the class's docs tells you more.
Do the following to obtain the file name:
from hashlib import md5
from pybliometrics.scopus import CitationOverview
identifier = ["28844466437"]
citation = None
co = CitationOverview(identifier=identifier, start=2005, citation=citation)
# The file name is the MD5 hash of the joined identifiers
stem = md5("_".join(identifier).encode('utf8')).hexdigest()
if citation:
    stem += "-" + citation
print(stem)
So the filename is 65637bbaf0de11228e62380ee583e744.
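The same file name can be computed without querying Scopus at all. This is a minimal sketch, assuming the cache lives in the default directory mentioned in this thread (it may differ depending on your configuration); cache_path is a hypothetical helper, not a pybliometrics function.

```python
from hashlib import md5
from pathlib import Path


def cache_path(identifier, citation=None,
               cache_dir="~/.scopus/citation_overview/STANDARD"):
    """Return the expected cache file path for a list of Scopus IDs.

    cache_dir is an assumption taken from the directory named in this thread.
    """
    stem = md5("_".join(identifier).encode("utf8")).hexdigest()
    if citation:
        stem += "-" + citation
    return Path(cache_dir).expanduser() / stem


path = cache_path(["28844466437"])
print(path)
print(path.exists())  # False if nothing has been cached for this ID yet
```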
Here is the content of that file:
{"abstract-citations-response":{"h-index":"0","identifier-legend":{"identifier":[{"@_fa":"true","dc:identifier":"SCOPUS_ID:28844466437","prism:doi":"10.1103/PhysRevE.72.059902","pii":null,"scopus_id":"28844466437"}]},"citeInfoMatrix":{"citeInfoMatrixXML":{"citationMatrix":{"citeInfo":[{"@_fa":"true","dc:identifier":"SCOPUS_ID:28844466437","prism:url":"https://api.elsevier.com/content/abstract/scopus_id/28844466437","dc:title":"Erratum: Reexamination of the Helfrich-Hurault effect in smectic-A liquid crystals (Physical Review E (2005) 72 (041708))","author":[{"@_fa":"true","initials":"G.","index-name":"Bevilacqua G.","surname":"Bevilacqua","authid":"57190392218","author-url":"https://api.elsevier.com/content/author/author_id/57190392218"},{"@_fa":"true","initials":"G.","index-name":"Napoli G.","surname":"Napoli","authid":"10042600200","author-url":"https://api.elsevier.com/content/author/author_id/10042600200"}],"citationType":{"@code":"er","$":"Erratum"},"sort-year":"2005","prism:publicationName":"Physical Review E - Statistical, Nonlinear, and Soft Matter Physics","prism:volume":"72","prism:issueIdentifier":"5","prism:issn":"1539-3755","pcc":"0","cc":[{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"}],"lcc":"0","rangeCount":"0","rowTotal":"0"}]}}},"citeColumnTotalXML":{"citeCountHeader":{"prevColumnHeading":"previous","columnHeading":[{"$":"2005"},{"$":"2006"},{"$":"2007"},{"$":"2008"},{"$":"2009"},{"$":"2010"},{"$":"2011"},{"$":"2012"},{"$":"2013"},{"$":"2014"},{"$":"2015"},{"$":"2016"},{"$":"2017"},{"$":"2018"},{"$":"2019"},{"$":"2020"},{"$":"2021"}],"laterColumnHeading":"later","prevColumnTotal":"0","columnTotal":[{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"}],"laterColumnTotal":"0","rangeColumnTotal":"0","grandTotal":"0"}}}}
For me it's exactly the same.
The relevant part is "cc":[{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"},{"$":"0"}], and that's all normal.
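To check a cache file by hand, one could load it as JSON and pull out the "cc" entries together with the year headings. This is a rough sketch based only on the nesting visible in the file posted in this thread; yearly_citations is a hypothetical helper and not how pybliometrics itself parses the response. The sample below is trimmed to two years.

```python
import json

# Trimmed sample with the same nesting as the cache file posted in this thread.
raw = '''{"abstract-citations-response": {
  "citeInfoMatrix": {"citeInfoMatrixXML": {"citationMatrix": {"citeInfo": [
    {"dc:identifier": "SCOPUS_ID:28844466437",
     "cc": [{"$": "0"}, {"$": "0"}]}
  ]}}},
  "citeColumnTotalXML": {"citeCountHeader": {
    "columnHeading": [{"$": "2005"}, {"$": "2006"}],
    "grandTotal": "0"}}
}}'''


def yearly_citations(payload):
    """Pair each column heading (year) with the per-document counts."""
    resp = payload["abstract-citations-response"]
    header = resp["citeColumnTotalXML"]["citeCountHeader"]
    years = [int(h["$"]) for h in header["columnHeading"]]
    matrix = resp["citeInfoMatrix"]["citeInfoMatrixXML"]["citationMatrix"]
    result = []
    for doc in matrix["citeInfo"]:
        cc = doc["cc"]
        # Fail with a readable message if "cc" is not a list of dicts.
        if not (isinstance(cc, list) and all(isinstance(d, dict) for d in cc)):
            raise ValueError(f"unexpected 'cc' shape: {cc!r}")
        result.append(list(zip(years, (int(d["$"]) for d in cc))))
    return result


print(yearly_citations(json.loads(raw)))
```

For a real cache file, replace json.loads(raw) with json.loads applied to the file's text; the explicit shape check is there so a damaged entry is reported clearly instead of surfacing as a TypeError.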
Could you please execute the following code and report back what's happening?
from pybliometrics.scopus import CitationOverview
co = CitationOverview(identifier=["28844466437"], start=2005)
print(co._citeInfoMatrix)
print(co._citeInfoMatrix[0]["cc"])
print([int(d['$']) for d in co._citeInfoMatrix[0]["cc"]])
print(co.cc)
Code:
#!/usr/bin/env python3
from pybliometrics.scopus import CitationOverview
co = CitationOverview(identifier=["28844466437"], start=2005)
print(co._citeInfoMatrix)
print(co._citeInfoMatrix[0]["cc"])
print([int(d['$']) for d in co._citeInfoMatrix[0]["cc"]])
print(co.cc)
Result:
$ python3 205_more_info.py
[{'@_fa': 'true', 'identifier': 'SCOPUS_ID:28844466437', 'url': 'https://api.elsevier.com/content/abstract/scopus_id/28844466437', 'title': 'Erratum: Reexamination of the Helfrich-Hurault effect in smectic-A liquid crystals (Physical Review E (2005) 72 (041708))', 'author': [{'@_fa': 'true', 'initials': 'G.', 'index-name': 'Bevilacqua G.', 'surname': 'Bevilacqua', 'authid': '57190392218', 'author-url': 'https://api.elsevier.com/content/author/author_id/57190392218'}, {'@_fa': 'true', 'initials': 'G.', 'index-name': 'Napoli G.', 'surname': 'Napoli', 'authid': '10042600200', 'author-url': 'https://api.elsevier.com/content/author/author_id/10042600200'}], 'citationType': {'@code': 'er', '$': 'Erratum'}, 'sort-year': '2005', 'publicationName': 'Physical Review E - Statistical, Nonlinear, and Soft Matter Physics', 'volume': '72', 'issueIdentifier': '5', 'issn': '1539-3755', 'pcc': '0', 'cc': [{'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}], 'lcc': '0', 'rangeCount': '0', 'rowTotal': '0'}]
[{'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[[(2005, 0), (2006, 0), (2007, 0), (2008, 0), (2009, 0), (2010, 0), (2011, 0), (2012, 0), (2013, 0), (2014, 0), (2015, 0), (2016, 0), (2017, 0), (2018, 0), (2019, 0), (2020, 0), (2021, 0)]]
Okay, so it all works.
No, it doesn't work.
The exception is still there.
Code:
#!/usr/bin/env python3
# https://github.com/pybliometrics-dev/pybliometrics/issues/205
from pybliometrics.scopus import CitationOverview
res = CitationOverview(identifier=["85098969104"], start=2020)
print(res.cc)
Result:
$ python3 205_1.py
Traceback (most recent call last):
  File "/run/media/raffaele/55ab61c4-83cf-4d9f-a5cd-7fcfdc14b4fb/Dropbox (DIG)/Paper_covidworking_and_productivity_RM/5_download_citations/pybliometrics_bugs/205_1.py", line 7, in <module>
    print(res.cc)
  File "/home/raffaele/.local/lib/python3.9/site-packages/pybliometrics/scopus/abstract_citation.py", line 44, in cc
    cites = [int(d['$']) for d in doc['cc']]
  File "/home/raffaele/.local/lib/python3.9/site-packages/pybliometrics/scopus/abstract_citation.py", line 44, in <listcomp>
    cites = [int(d['$']) for d in doc['cc']]
TypeError: string indices must be integers
Thanks
Could you then please print the output of the above snippet with the Scopus ID that's not working?
The snippet is:
from pybliometrics.scopus import CitationOverview
res = CitationOverview(identifier=["85098969104"], start=2020)
print(res.cc)
The Scopus ID that is not working is 85098969104
What should I do with it? I need the output of the snippet that I posted, run with your ID that's not working.
from pybliometrics.scopus import CitationOverview
co = CitationOverview(identifier=["85098969104"], start=2005, refresh=True)
print(co._citeInfoMatrix)
print(co._citeInfoMatrix[0]["cc"])
print([int(d['$']) for d in co._citeInfoMatrix[0]["cc"]])
print(co.cc)
Here is the output of the snippet:
$ python3 205_more_info_2.py
[{'@_fa': 'true', 'identifier': 'SCOPUS_ID:85098969104', 'url': 'https://api.elsevier.com/content/abstract/scopus_id/85098969104', 'title': 'Correction to: The delamination of a growing elastic sheet with adhesion (Meccanica, (2017), 52, 14, (3481-3487), 10.1007/s11012-017-0618-0)', 'author': [{'@_fa': 'true', 'initials': 'G.', 'index-name': 'Napoli G.', 'surname': 'Napoli', 'authid': '10042600200', 'author-url': 'https://api.elsevier.com/content/author/author_id/10042600200'}, {'@_fa': 'true', 'initials': 'S.', 'index-name': 'Turzi S.', 'surname': 'Turzi', 'authid': '14631459100', 'author-url': 'https://api.elsevier.com/content/author/author_id/14631459100'}], 'citationType': {'@code': 'er', '$': 'Erratum'}, 'sort-year': '2021', 'publicationName': 'Meccanica', 'volume': '56', 'issueIdentifier': '1', 'startingPage': '253', 'issn': '0025-6455', 'pcc': '0', 'cc': [{'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}], 'lcc': '0', 'rangeCount': '0', 'rowTotal': '0'}]
[{'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}, {'$': '0'}]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[[(2005, 0), (2006, 0), (2007, 0), (2008, 0), (2009, 0), (2010, 0), (2011, 0), (2012, 0), (2013, 0), (2014, 0), (2015, 0), (2016, 0), (2017, 0), (2018, 0), (2019, 0), (2020, 0), (2021, 0)]]
Now it works BTW:
$ python3 205_1.py
[[(2020, 0), (2021, 0)]]
Seems like another bug fixed by refresh=True.
But why does the download get damaged in the first place?
I have run into all kinds of weird errors. Sometimes the download gets interrupted, rarely the API returns faulty code, etc. Annoying, but rare, and it can be fixed by re-downloading.
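Since re-downloading resolves these cases, the retry can be automated. This is only a sketch of the idea: read_with_refresh and FakeOverview are hypothetical names, and FakeOverview is a stand-in so the example runs without API access. With the real class you would pass something like lambda refresh: CitationOverview(identifier=["85098969104"], start=2020, refresh=refresh).

```python
def read_with_refresh(make_overview, read=lambda co: co.cc):
    """Try the cached copy first; on a TypeError, fetch again with refresh=True.

    make_overview is a callable that accepts refresh=<bool> and returns an
    object to read from (hypothetical interface for this sketch).
    """
    try:
        return read(make_overview(refresh=False))
    except TypeError:
        return read(make_overview(refresh=True))


class FakeOverview:
    """Stand-in for CitationOverview that simulates a damaged cache entry."""

    def __init__(self, refresh):
        # Only a refreshed download yields well-formed data in this simulation.
        self._cc = [{"$": "0"}, {"$": "0"}] if refresh else "garbage"

    @property
    def cc(self):
        return [int(d["$"]) for d in self._cc]


print(read_with_refresh(FakeOverview))
```

Catching only TypeError keeps the retry narrow; other failures (for example network or quota errors) still propagate to the caller.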
The following MWE:
will throw the following exception: