scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Fetching url_picture property when searching for author by id #214

Closed. sebastian-lapuschkin-sideprojects closed this issue 3 years ago

sebastian-lapuschkin-sideprojects commented 3 years ago

When searching for an author by name, the url_picture property is set.

>>> list(scholarly.scholarly.search_author('Sebastian Lapuschkin'))[0]

{'affiliation': 'Postdoctoral Research Associate, Fraunhofer Heinrich Hertz '
                'Institute',
 'citedby': 3363,
 'email': '@hhi.fraunhofer.de',
 'filled': False,
 'id': 'wpLQuroAAAAJ',
 'interests': ['Interpretability',
               'Explainable AI',
               'Machine Learning',
               'Artificial Intelligence',
               'Deep Learning'],
 'name': 'Sebastian Lapuschkin (né Bach)',
 'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=wpLQuroAAAAJ'}

However, if the same author info is requested via the author ID, the url_picture property is not set. It would be great to have it there as well, since name-based resolution is sometimes ambiguous.

>>> scholarly.scholarly.search_author_id('wpLQuroAAAAJ')

{'affiliation': 'Postdoctoral Research Associate, Fraunhofer Heinrich Hertz '
                'Institute',
 'filled': False,
 'id': 'wpLQuroAAAAJ',
 'interests': ['Interpretability',
               'Explainable AI',
               'Machine Learning',
               'Artificial Intelligence',
               'Deep Learning'],
 'name': 'Sebastian Lapuschkin (né Bach)'}

Is there any way to obtain the url_picture field when searching for authors by id?

sebastian-lapuschkin-sideprojects commented 3 years ago

This issue is probably related to #132.

ipeirotis commented 3 years ago

Issue fixed


from scholarly import scholarly

author = next(scholarly.search_author('Sebastian Lapuschkin'))

scholarly.pprint(author)

author = scholarly.search_author_id('wpLQuroAAAAJ')

scholarly.pprint(author)

returns

{'affiliation': 'Postdoctoral Research Associate, Fraunhofer Heinrich Hertz '
                'Institute',
 'citedby': 3568,
 'email_domain': '@hhi.fraunhofer.de',
 'filled': False,
 'interests': ['Interpretability',
               'Explainable AI',
               'Machine Learning',
               'Artificial Intelligence',
               'Deep Learning'],
 'name': 'Sebastian Lapuschkin (né Bach)',
 'scholar_id': 'wpLQuroAAAAJ',
 'source': 'SEARCH_AUTHOR_SNIPPETS',
 'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=wpLQuroAAAAJ'}

{'affiliation': 'Postdoctoral Research Associate, Fraunhofer Heinrich Hertz '
                'Institute',
 'filled': False,
 'interests': ['Interpretability',
               'Explainable AI',
               'Machine Learning',
               'Artificial Intelligence',
               'Deep Learning'],
 'name': 'Sebastian Lapuschkin (né Bach)',
 'scholar_id': 'wpLQuroAAAAJ',
 'source': 'AUTHOR_PROFILE_PAGE',
 'url_picture': 'https://scholar.googleusercontent.com/citations?view_op=view_photo&user=wpLQuroAAAAJ&citpid=10'}

sebastian-lapuschkin-sideprojects commented 3 years ago

Thank you for the fix, this is definitely nice! However, search_author and search_author_id still behave differently: if the sought-after author has not set a profile picture, the former returns a dict whose url_picture field points to the default author picture, while the latter omits the field entirely. I can of course create the field manually and infer its content, but the behavior is inconsistent. Several other fields are missing as well, e.g. citedby; it is odd to get more information when searching by an ambiguous name than when searching by a specific id.

Is there any chance to further align both outputs in this respect? Code to reproduce below:

from scholarly import scholarly

author = next(scholarly.search_author('Djordje Slijepcevic'))

scholarly.pprint(author)

author = scholarly.search_author_id('faWNz2YAAAAJ')

scholarly.pprint(author)

Output:

{'affiliation': 'Researcher, Institute of Creative\\Media/Technologies, St. '
                'Pölten University of Applied',
 'citedby': 66,
 'email_domain': '@fhstp.ac.at',
 'filled': False,
 'interests': ['Machine Learning',
               'Deep Learning',
               'Computer Vision',
               'Gait Analysis'],
 'name': 'Djordje Slijepcevic',
 'scholar_id': 'faWNz2YAAAAJ',
 'source': 'SEARCH_AUTHOR_SNIPPETS',
 'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=faWNz2YAAAAJ'}

{'affiliation': 'Researcher, Institute of Creative\\Media/Technologies, St. '
                'Pölten University of Applied',
 'filled': False,
 'interests': ['Machine Learning',
               'Deep Learning',
               'Computer Vision',
               'Gait Analysis'],
 'name': 'Djordje Slijepcevic',
 'scholar_id': 'faWNz2YAAAAJ',
 'source': 'AUTHOR_PROFILE_PAGE'}
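
For reference, the manual workaround mentioned above could look roughly like the sketch below. It assumes the default picture URL follows the medium_photo pattern seen in the search_author output; that pattern is only inferred from the dicts in this thread and is not a documented interface. The helper names with_picture_url and missing_keys are hypothetical and not part of scholarly, and the two dicts are trimmed copies of the output above so the snippet runs without any network access.

```python
# Assumption: this URL pattern is copied from the search_author output shown
# in this thread; it is not a documented scholarly or Google Scholar API.
DEFAULT_PICTURE_URL = (
    "https://scholar.google.com/citations?view_op=medium_photo&user={scholar_id}"
)


def with_picture_url(author):
    """Return a copy of `author` that always has a 'url_picture' key."""
    result = dict(author)  # shallow copy, the input dict is left untouched
    if "url_picture" not in result:
        result["url_picture"] = DEFAULT_PICTURE_URL.format(
            scholar_id=result["scholar_id"]
        )
    return result


def missing_keys(reference, other):
    """List the keys present in `reference` but absent from `other`, sorted."""
    return sorted(set(reference) - set(other))


# Trimmed copies of the two dicts printed above.
by_name = {
    "name": "Djordje Slijepcevic",
    "scholar_id": "faWNz2YAAAAJ",
    "citedby": 66,
    "email_domain": "@fhstp.ac.at",
    "source": "SEARCH_AUTHOR_SNIPPETS",
    "url_picture": "https://scholar.google.com/citations?view_op=medium_photo&user=faWNz2YAAAAJ",
}
by_id = {
    "name": "Djordje Slijepcevic",
    "scholar_id": "faWNz2YAAAAJ",
    "source": "AUTHOR_PROFILE_PAGE",
}

# Fields that search_author returned but search_author_id did not.
print(missing_keys(by_name, by_id))

# After patching, the by-id result carries the same picture URL.
patched = with_picture_url(by_id)
print(patched["url_picture"] == by_name["url_picture"])
```

This only papers over the url_picture difference on the caller's side; fields such as citedby cannot be inferred this way and would still have to come from the library.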