pybliometrics-dev / pybliometrics

Python-based API-Wrapper to access Scopus
https://pybliometrics.readthedocs.io/en/stable/

CitationOverview's cc property unexpected Error. #301

Closed kosh-jp closed 8 months ago

kosh-jp commented 9 months ago

pybliometrics version:

3.5.2

Code to reproduce the bug:

code

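from dataclasses import dataclass

from pybliometrics.scopus import CitationOverview


@dataclass
class Article:
    """Minimal stand-in for the reporter's own Article type (assumed)."""
    doi: str
    pub_date_year: str


def fetch_citing_count(article: Article):
    # Request the citation overview by DOI, starting at the publication year
    co = CitationOverview(
        [article.doi],
        article.pub_date_year,
        id_type="doi",
        APIKey="*****",
        InstToken="*****",
    )
    return co.cc


article = Article(doi="10.1016/j.suc.2023.06.004", pub_date_year="2023")
fetch_citing_count(article)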

result

Traceback (most recent call last):
  .......
  File "/my_dir/env/lib/python3.10/site-packages/pybliometrics/scopus/abstract_citation.py", line 44, in cc
    cites = [int(d['$']) for d in doc['cc']]
  File "/my_dir/env/lib/python3.10/site-packages/pybliometrics/scopus/abstract_citation.py", line 44, in <listcomp>
    cites = [int(d['$']) for d in doc['cc']]
TypeError: string indices must be integers

Expected behavior:

code

article.doi = "10.1016/j.media.2012.02.005"
article.pub_date_year = "2012"
print(fetch_citing_count(article))

result

[[(2012, 0),
  (2013, 0),
  (2014, 0),
  (2015, 0),
  (2016, 0),
  (2017, 0),
  (2018, 0),
  (2019, 0),
  (2020, 0),
  (2021, 0),
  (2022, 0),
  (2023, 0)]]

My request:

I would like to change the code as follows.

/pybliometrics/scopus/abstract_citation.py

            try:
                cites = [int(d["$"]) for d in doc["cc"]]
            except AttributeError:  # No citations
                cites = [0] * len(_years)
            except TypeError:
                cites = [0] * len(_years)

reason

This exception occurs because '_citeInfoMatrix' contains 'cc': '0'. The Scopus API returns the string '0' where a List[Dict] is expected.

'_citeInfoMatrix': [
        {
            '@_fa': 'true',
            'author': [
                {
                    '@_fa': 'true',
                    'authid': None,
                    'author-url': 'https://api.elsevier.com/content/author/author_id/',
                    'index-name': None,
                    'initials': None,
                    'surname': None
                }
            ],
            'cc': '0',
....
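
To illustrate the reason above: iterating over the string '0' yields the character '0', and indexing that character with '$' raises the TypeError from the traceback. The sketch below reproduces this with plain dicts and shows one possible defensive parser. The name parse_cc and the choice to treat every non-list value as "no citations" are assumptions for illustration, not the library's actual code.

```python
def parse_cc(doc, n_years):
    """Return per-year citation counts, tolerating a bare '0' string."""
    cc = doc.get("cc")
    if isinstance(cc, list):
        return [int(d["$"]) for d in cc]
    # Assumption: anything else (e.g. the string '0' or None) means "no citations"
    return [0] * n_years


# Shape the cc property expects: a list of dicts keyed by '$'
normal = {"cc": [{"$": "2"}, {"$": "11"}]}
# Shape reported in this issue: a plain string
empty = {"cc": "0"}

# Iterating over '0' yields the character '0', and '0'['$'] raises TypeError
try:
    [int(d["$"]) for d in empty["cc"]]
    raised = False
except TypeError:
    raised = True

print(parse_cc(normal, 2))
print(parse_cc(empty, 3))
```

An explicit isinstance check handles the same case as the extra except clause proposed above, but it avoids hiding unrelated TypeErrors raised inside the list comprehension.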

Could you please change it? Or could I send a pull request?

Michael-E-Rose commented 9 months ago

I really wonder how you get that result. The CitationOverview() class needs the Scopus IDs (not even EIDs work). So try this please:

identifier = ["84861986826"]
co = CitationOverview(identifier, start=2012)
print(co.cc)
# [[(2012, 2), (2013, 11), (2014, 15), (2015, 15), (2016, 25), (2017, 41), (2018, 56), (2019, 53), (2020, 46), (2021, 63),
#   (2022, 56), (2023, 51)]]

If you only have DOIs, use AbstractRetrieval() first. It also delivers the publication year, which you can use for the start parameter.
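
A minimal sketch of that two-step lookup, assuming pybliometrics is installed and configured with a valid API key, and assuming coverDate comes back as a 'YYYY-MM-DD' string. The helper names year_from_cover_date and citation_overview_from_doi are hypothetical and exist only for this example.

```python
def year_from_cover_date(cover_date):
    """Extract the year from a 'YYYY-MM-DD' date string (assumed format)."""
    return int(cover_date.split("-")[0])


def citation_overview_from_doi(doi):
    """Look up the Scopus ID and year for a DOI, then request the overview."""
    # Imported lazily: needs pybliometrics installed and a configured API key
    from pybliometrics.scopus import AbstractRetrieval, CitationOverview

    ab = AbstractRetrieval(doi, id_type="doi")
    start = year_from_cover_date(ab.coverDate)
    # CitationOverview expects a list of identifiers
    return CitationOverview([str(ab.identifier)], start=start)
```

Calling citation_overview_from_doi("10.1016/j.media.2012.02.005").cc would then go through the Scopus ID rather than the DOI. Depending on the installed version, further setup may be required before the first request.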

Interestingly, when I use your DOI, I get a pybliometrics.scopus.exception.Scopus404Error: The resource specified cannot be found.

kosh-jp commented 9 months ago

Thanks for your reply!!

Behavior of CitationOverview class.

First, the Scopus Citation Overview API itself accepts DOIs.

Second, I think pybliometrics accepts DOIs as well, because the code implements an id_type parameter, which lets me select "doi": https://github.com/pybliometrics-dev/pybliometrics/blob/master/pybliometrics/scopus/abstract_citation.py#L192-L212

    def __init__(
        ...
        id_type: str = "scopus_id",
        ....
    ) -> None:
        """Interaction with the Citation Overview API.
        ...
        :param id_type: The type of the IDs provided in `identifier`.  Must be
                        one of "scopus_id", "doi", "pii", "pubmed_id".
        ...

https://github.com/pybliometrics-dev/pybliometrics/blob/master/pybliometrics/scopus/abstract_citation.py#L271-L276

        # Get file content
        ...
        kwds.update({id_type: identifier})
        ...
        Retrieval.__init__(
            self, stem, api="CitationOverview", date=date, citation=citation, **kwds
        )

Scopus404Error

Interestingly, when I use your DOI, I get a pybliometrics.scopus.exception.Scopus404Error: The resource specified cannot be found.

Thanks for checking it out. My guess is that new articles in less well-known journals may go unindexed in Scopus for a while...

Code change request

If my code change request is not accepted, no problem, don't worry about it. In that case, I will include the code to handle this specific error in my code.

Your wonderful library has helped me a lot and I am very grateful!!

Regards, and thank you.

Michael-E-Rose commented 9 months ago

Hi again! I'm always happy about code changes from the pybliometrics community, but I still fail to see the problem your proposal is meant to solve.

With

from pybliometrics.scopus import CitationOverview

identifier = ["10.1016/j.suc.2023.06.004"]
co = CitationOverview(identifier, start="2023")

I get the error I already mentioned above, pybliometrics.scopus.exception.Scopus404Error: The resource specified cannot be found. So I don't see how handling a TypeError, in addition to an AttributeError, in CitationOverview().cc would deal with this.

And sorry for the confusion with the DOIs, you're of course right that CitationOverview() accepts DOIs.

kosh-jp commented 9 months ago

Thanks for the discussion!

I misunderstood your Scopus404Error comment, sorry.

When I use a DOI as the identifier, I specify "doi" as id_type.

from pybliometrics.scopus import CitationOverview

identifier = ["10.1016/j.suc.2023.06.004"]
co = CitationOverview(identifier, start="2023", id_type="doi")

Michael-E-Rose commented 9 months ago

Alright, now I finally get the error as well:

>>> from pybliometrics.scopus import CitationOverview
>>> 
>>> identifier = ["10.1016/j.suc.2023.06.004"]
>>> co = CitationOverview(identifier, start="2023", id_type="doi")
>>> co.cc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/panther/.local/lib/python3.10/site-packages/pybliometrics/scopus/abstract_citation.py", line 44, in cc
    cites = [int(d['$']) for d in doc['cc']]
  File "/home/panther/.local/lib/python3.10/site-packages/pybliometrics/scopus/abstract_citation.py", line 44, in <listcomp>
    cites = [int(d['$']) for d in doc['cc']]
TypeError: string indices must be integers

Give me a little time to think about your proposed solution.

kosh-jp commented 8 months ago

Thank you very much for the code change!