scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Publication no longer fillable #273

Closed eitanf closed 3 years ago

eitanf commented 3 years ago

The method search_pubs used to return a Publication object, on which you could call the fill() method to get more information. It now appears to return a plain dictionary, which cannot be filled. This breaks not only the previous API but also functionality: since you can't fill a publication, you can't parse its BibTeX record, whose ID field is not currently obtainable any other way. The ID field can be used to find a specific result of the search query (if you know the ID from before).

I suggest adding a "fill" parameter to search_pubs with a default value of False. I'm happy to create a PR if you think this feature would be incorporated.

ipeirotis commented 3 years ago

No idea what this request is asking, as the objects returned by search_pubs do have a key called filled and a source key with the value PUBLICATION_SEARCH_SNIPPET.

Here is an example of the code and the result:

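from scholarly import scholarly

# Search for both exact phrases; no backslash escapes are needed inside single quotes
query = '"digital twin" "cloud"'
search_query = scholarly.search_pubs(query)
r = next(search_query)  # first result from the search generator
scholarly.pprint(r)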

returns

{'author_id': ['c0Fvx10AAAAJ', 'VcOjgngAAAAJ'],
 'bib': {'abstract': 'Cyber-physical system (CPS) is a new trend in the '
                     'Internet-of-Things related research works, where '
                     'physical systems act as the sensors to collect '
                     'real-world information and communicate them to the '
                     'computation modules (ie cyber layer), which further '
                     'analyze and',
         'author': ['KM Alam', 'A El Saddik'],
         'pub_year': '2017',
         'title': 'C2PS: A digital twin architecture reference model for the '
                  'cloud-based cyber-physical systems',
         'venue': 'IEEE access'},
 'citedby_url': '/scholar?cites=14550300503829375948&as_sdt=5,33&sciodt=0,33&hl=en',
 'eprint_url': 'https://ieeexplore.ieee.org/iel7/6287639/6514899/07829368.pdf',
 'filled': False,
 'gsrank': 1,
 'num_citations': 300,
 'pub_url': 'https://ieeexplore.ieee.org/abstract/document/7829368/',
 'source': 'PUBLICATION_SEARCH_SNIPPET',
 'url_add_sclib': '/citations?hl=en&xsrf=&continue=/scholar%3Fq%3D%2522digital%2Btwin%2522%2B%2522cloud%2522%26hl%3Den%26as_sdt%3D0,33&citilm=1&json=&update_op=library_add&info=zOvvqTcN7ckJ&ei=vNxQYLGjKPSL6rQPmrmeiAY',
 'url_related_articles': '/scholar?q=related:zOvvqTcN7ckJ:scholar.google.com/&scioq=%22digital+twin%22+%22cloud%22&hl=en&as_sdt=0,33',
 'url_scholarbib': '/scholar?q=info:zOvvqTcN7ckJ:scholar.google.com/&output=cite&scirp=0&hl=en'}

then

result = scholarly.fill(r)
scholarly.pprint(result)

returns

{'author_id': ['c0Fvx10AAAAJ', 'VcOjgngAAAAJ'],
 'bib': {'abstract': 'Cyber-physical system (CPS) is a new trend in the '
                     'Internet-of-Things related research works, where '
                     'physical systems act as the sensors to collect '
                     'real-world information and communicate them to the '
                     'computation modules (ie cyber layer), which further '
                     'analyze and',
         'author': 'Alam, Kazi Masudul and El Saddik, Abdulmotaleb',
         'bib_id': 'alam2017c2ps',
         'journal': 'IEEE access',
         'pages': '2050--2062',
         'pub_type': 'article',
         'pub_year': '2017',
         'publisher': 'IEEE',
         'title': 'C2PS: A digital twin architecture reference model for the '
                  'cloud-based cyber-physical systems',
         'venue': 'IEEE access',
         'volume': '5'},
 'citedby_url': '/scholar?cites=14550300503829375948&as_sdt=5,33&sciodt=0,33&hl=en',
 'eprint_url': 'https://ieeexplore.ieee.org/iel7/6287639/6514899/07829368.pdf',
 'filled': True,
 'gsrank': 1,
 'num_citations': 300,
 'pub_url': 'https://ieeexplore.ieee.org/abstract/document/7829368/',
 'source': 'PUBLICATION_SEARCH_SNIPPET',
 'url_add_sclib': '/citations?hl=en&xsrf=&continue=/scholar%3Fq%3D%2522digital%2Btwin%2522%2B%2522cloud%2522%26hl%3Den%26as_sdt%3D0,33&citilm=1&json=&update_op=library_add&info=zOvvqTcN7ckJ&ei=vNxQYLGjKPSL6rQPmrmeiAY',
 'url_related_articles': '/scholar?q=related:zOvvqTcN7ckJ:scholar.google.com/&scioq=%22digital+twin%22+%22cloud%22&hl=en&as_sdt=0,33',
 'url_scholarbib': '/scholar?q=info:zOvvqTcN7ckJ:scholar.google.com/&output=cite&scirp=0&hl=en'}
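
Since the original use case was finding a specific search result by its BibTeX ID, here is a minimal sketch of how that could look on top of the dictionary-based API. The helper name find_by_bib_id and its limit parameter are hypothetical and not part of scholarly; the only assumption taken from the output above is that a filled result carries the ID under bib -> bib_id.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Optional

Publication = Dict[str, Any]


def find_by_bib_id(
    results: Iterable[Publication],
    fill: Callable[[Publication], Publication],
    bib_id: str,
    limit: int = 10,
) -> Optional[Publication]:
    """Hypothetical helper: fill up to `limit` results, return the first match.

    `fill` is passed in (for example scholarly.fill) so the helper itself
    makes no network calls and can be tried out with stand-in data.
    """
    for pub in islice(results, limit):
        filled = fill(pub)
        # Assumes the filled dict stores the BibTeX key under bib -> bib_id,
        # as in the pprint output shown above.
        if filled.get("bib", {}).get("bib_id") == bib_id:
            return filled
    return None
```

With the library, this would presumably be called as find_by_bib_id(scholarly.search_pubs(query), scholarly.fill, 'alam2017c2ps'). Each fill is a separate request to Google Scholar, which is why the sketch caps the number of results it inspects.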

eitanf commented 3 years ago

You're right. The old syntax required r.fill(). That's no longer supported, but I hadn't realized that the new syntax, scholarly.fill(r), works instead. Thank you.