scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Improve search_author_id response to be consistent with search_author #231

Closed: ipeirotis closed this issue 3 years ago

ipeirotis commented 3 years ago

See the issue reported by @sebastian-lapuschkin-sideprojects in https://github.com/scholarly-python-package/scholarly/issues/214#issuecomment-749931259

It seems that search_author_id does not return the same fields as search_author, even though both parse the same page. We should ensure that the two functions rely on the same code and return consistent results.
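One way to get that consistency is to route both entry points through a single helper that fixes the shape of the returned dict. The following is only a minimal sketch: `build_author`, `from_search_snippet`, and `from_profile_page` are hypothetical names, not scholarly's actual internals, and the field list is taken from the output shown later in this thread.

```python
from typing import Any, Dict

# Hypothetical sketch: none of these names are scholarly's real internals.
# The field list mirrors the keys printed later in this thread.
AUTHOR_FIELDS = (
    "affiliation", "citedby", "email_domain", "filled", "interests",
    "name", "scholar_id", "source", "url_picture",
)


def build_author(raw: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Normalize parsed data into one dict shape, whichever page it came from."""
    # Every field is always present; missing values become None.
    author = {field: raw.get(field) for field in AUTHOR_FIELDS}
    author["source"] = source
    author["filled"] = False
    if author["interests"] is None:
        author["interests"] = []
    return author


def from_search_snippet(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Build an author dict from data parsed out of a search result."""
    return build_author(raw, "SEARCH_AUTHOR_SNIPPETS")


def from_profile_page(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Build an author dict from data parsed out of a profile page."""
    return build_author(raw, "AUTHOR_PROFILE_PAGE")
```

Because both wrappers call the same builder, their key sets cannot drift apart; only the values (such as `source`) can differ.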

ipeirotis commented 3 years ago

There are still some differences between the two.

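from scholarly import scholarly

# Look up the author by name and record which fields come back.
author = next(scholarly.search_author('Sebastian Lapuschkin'))
fields_name = set(author.keys())
scholarly.pprint(author)

print("=====")

# Look up the same author by Scholar ID and record the fields again.
author_id = author['scholar_id']
author = scholarly.search_author_id(author_id)
fields_id = set(author.keys())
scholarly.pprint(author)

# The symmetric difference lists fields present in only one of the two results.
print("\n\n=======\n\nDifference in fields:", fields_name ^ fields_id)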

The final line prints:

Difference in fields: {'citedby'}

scholarly-issue-tracking commented 3 years ago

Fixed in v1.1.0

The following code

from scholarly import scholarly

author = next(scholarly.search_author('Sebastian Lapuschkin'))
fields_name = set(author.keys())
scholarly.pprint(author)

print("=====")

author_id = author['scholar_id']
author = scholarly.search_author_id(author_id)
fields_id = set(author.keys())
scholarly.pprint(author)

print("\n\n=======\n\nDifference in fields:", fields_name ^ fields_id)

has the following output:

{'affiliation': 'Head of Explainable AI Group, Fraunhofer Heinrich Hertz '
                'Institute',
 'citedby': 3975,
 'email_domain': '@hhi.fraunhofer.de',
 'filled': False,
 'interests': ['Interpretability',
               'Explainable AI',
               'Machine Learning',
               'Artificial Intelligence',
               'Deep Learning'],
 'name': 'Sebastian Lapuschkin (né Bach)',
 'scholar_id': 'wpLQuroAAAAJ',
 'source': 'SEARCH_AUTHOR_SNIPPETS',
 'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=wpLQuroAAAAJ'}
=====
{'affiliation': 'Head of Explainable AI Group, Fraunhofer Heinrich Hertz '
                'Institute',
 'citedby': 3975,
 'email_domain': '@hhi.fraunhofer.de',
 'filled': False,
 'interests': ['Interpretability',
               'Explainable AI',
               'Machine Learning',
               'Artificial Intelligence',
               'Deep Learning'],
 'name': 'Sebastian Lapuschkin (né Bach)',
 'scholar_id': 'wpLQuroAAAAJ',
 'source': 'AUTHOR_PROFILE_PAGE',
 'url_picture': 'https://scholar.googleusercontent.com/citations?view_op=view_photo&user=wpLQuroAAAAJ&citpid=10'}

=======

Difference in fields: set()
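
The key sets now match, but the two results above still differ in some values. As a minimal offline sketch, the helper below (`differing_values` is a hypothetical name, not part of scholarly) compares values as well as keys, using an abridged copy of the two dicts printed above:

```python
from typing import Any, Dict, Set


def differing_values(a: Dict[str, Any], b: Dict[str, Any]) -> Set[str]:
    """Return the keys present in both dicts whose values differ."""
    return {key for key in a.keys() & b.keys() if a[key] != b[key]}


# Abridged copies of the two results printed above.
by_name = {
    "citedby": 3975,
    "filled": False,
    "scholar_id": "wpLQuroAAAAJ",
    "source": "SEARCH_AUTHOR_SNIPPETS",
    "url_picture": "https://scholar.google.com/citations?view_op=medium_photo&user=wpLQuroAAAAJ",
}
by_id = {
    "citedby": 3975,
    "filled": False,
    "scholar_id": "wpLQuroAAAAJ",
    "source": "AUTHOR_PROFILE_PAGE",
    "url_picture": "https://scholar.googleusercontent.com/citations?view_op=view_photo&user=wpLQuroAAAAJ&citpid=10",
}

print("Difference in fields:", set(by_name) ^ set(by_id))
# prints: Difference in fields: set()
print("Fields with differing values:", sorted(differing_values(by_name, by_id)))
# prints: Fields with differing values: ['source', 'url_picture']
```

A consistency test that compares whole dicts would therefore need to ignore `source` and `url_picture`, or compare key sets only.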