scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Bug when using `publication_limit` and `sortby = "year"` #255

Closed: ipeirotis closed this issue 3 years ago

ipeirotis commented 3 years ago

I tried fetching my profile with the publications sorted by year and the limit set to 5:

from scholarly import scholarly

author = scholarly.search_author_id(id='PA9La6oAAAAJ', filled=True, sortby="year", publication_limit=5)
scholarly.pprint(author)

The result I got back contained 6 publications instead of 5, and the last one was from 2010:

{'affiliation': 'New York Univesity',
 'citedby': 21086,
 'citedby5y': 10866,
 'cites_per_year': .....
 'coauthors': .....
 'email_domain': '@stern.nyu.edu',
 'filled': True,
 'hindex': 49,
 'hindex5y': 34,
 'i10index': 95,
 'i10index5y': 65,
 'interests': ['Crowdsourcing',
               'Data Quality',
               'Text Analytics using Economics'],
 'name': 'Panos Ipeirotis',
 'publications': [{'author_pub_id': 'PA9La6oAAAAJ:djcsc3XHdKAC',
                   'bib': {'pub_year': '2020',
                           'title': 'Methods, systems, and media for '
                                    'identifying errors in predictive models '
                                    'using annotators'},
                   'filled': False,
                   'num_citations': 0,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'},
                  {'author_pub_id': 'PA9La6oAAAAJ:rwEhk56xNqMC',
                   'bib': {'pub_year': '2020',
                           'title': 'Gender and Race Preferences in Hiring in '
                                    'the Age of Diversity Goals: Evidence from '
                                    'Silicon Valley Tech Firms'},
                   'filled': False,
                   'num_citations': 1,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'},
                  {'author_pub_id': 'PA9La6oAAAAJ:ddB7do2jUx8C',
                   'bib': {'pub_year': '2020',
                           'title': 'Creativity on Paid Crowdsourcing '
                                    'Platforms'},
                   'filled': False,
                   'num_citations': 6,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'},
                  {'author_pub_id': 'PA9La6oAAAAJ:TuM7UPshZo8C',
                   'bib': {'pub_year': '2020',
                           'title': 'Demand-Aware Career Path Recommendations: '
                                    'A Reinforcement Learning Approach'},
                   'filled': False,
                   'num_citations': 1,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'},
                  {'author_pub_id': 'PA9La6oAAAAJ:eJjLl3UG7CkC',
                   'bib': {'pub_year': '2019',
                           'title': 'Statistical considerations for '
                                    'crowdsourced perceptual ratings of human '
                                    'speech productions'},
                   'filled': False,
                   'num_citations': 3,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'},
                  {'author_pub_id': 'PA9La6oAAAAJ:hFOr9nPyWt4C',
                   'bib': {'pub_year': '2010',
                           'title': 'A report on the human computation '
                                    'workshop (HComp 2009)'},
                   'filled': False,
                   'num_citations': 7,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'}],
 'scholar_id': 'PA9La6oAAAAJ',
 'source': 'AUTHOR_PROFILE_PAGE',
 'url_picture': 'https://scholar.googleusercontent.com/citations?view_op=view_photo&user=PA9La6oAAAAJ&citpid=3'}
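
For readers who want to detect or work around this on the client side, here is a minimal sketch. It assumes only the dictionary shape shown in the output above; `check_publication_limit` and `truncate_publications` are hypothetical helper names, not part of scholarly, and the sample data keeps just the publication years from the report.

```python
def check_publication_limit(author, publication_limit):
    """Return (count, within_limit) for an author dict shaped like the output above."""
    count = len(author.get("publications", []))
    return count, count <= publication_limit


def truncate_publications(author, publication_limit):
    """Return a shallow copy of the author dict with at most publication_limit publications."""
    trimmed = dict(author)
    trimmed["publications"] = list(author.get("publications", []))[:publication_limit]
    return trimmed


# Publication years copied from the reported output; all other fields are omitted.
reported_years = ["2020", "2020", "2020", "2020", "2019", "2010"]
sample_author = {
    "scholar_id": "PA9La6oAAAAJ",
    "publications": [{"bib": {"pub_year": year}} for year in reported_years],
}

print(check_publication_limit(sample_author, 5))                     # (6, False)
print(len(truncate_publications(sample_author, 5)["publications"]))  # 5
```

Truncating client-side only hides the extra entry; it does not change how many publications are fetched.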

scholarly-issue-tracking commented 3 years ago

Solution: added an improved breakpoint when filling publications, and also fixed the URL.

Code:

from scholarly import scholarly

author = scholarly.search_author_id(id='PA9La6oAAAAJ', filled=True, sortby="year", publication_limit=5)
scholarly.pprint(author)

Output:

{'affiliation': 'New York Univesity',
 'citedby': 21269,
 'citedby5y': 10993,
 'cites_per_year': {2003: 97,
                    2004: 145,
                    2005: 130,
                    2006: 187,
                    2007: 260,
                    2008: 303,
                    2009: 476,
                    2010: 630,
                    2011: 960,
                    2012: 1213,
                    2013: 1579,
                    2014: 1833,
                    2015: 2074,
                    2016: 2211,
                    2017: 2203,
                    2018: 2151,
                    2019: 2109,
                    2020: 2096,
                    2021: 216},
 'coauthors': [{'affiliation': 'Heinz Riehl Chair Professor of Business, NYU '
                               'Stern',
                'filled': False,
                'name': 'Anindya Ghose',
                'scholar_id': 'oQHsB5kAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Professor of Computer Science, Columbia '
                               'University',
                'filled': False,
                'name': 'Luis Gravano',
                'scholar_id': 'Ff6era8AAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Professor, New York University',
                'filled': False,
                'name': 'Foster Provost',
                'scholar_id': '-Km63D4AAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Carnegie Mellon University',
                'filled': False,
                'name': 'Beibei Li',
                'scholar_id': 'XcRBC7gAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Hong Kong University of Science and Technology',
                'filled': False,
                'name': 'Jing Wang',
                'scholar_id': '1F0bQi0AAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Professor of Computer Science, University of '
                               'Toronto',
                'filled': False,
                'name': 'Nick Koudas',
                'scholar_id': 'f0Wc8tkAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'AT&T Labs-Research',
                'filled': False,
                'name': 'Divesh Srivastava',
                'scholar_id': 'kGKlHp0AAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Adjunct Faculty, University of Michigan',
                'filled': False,
                'name': 'Jesse Chandler',
                'scholar_id': '8JkiWl0AAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'NYU Stern, Harold Price Professor of '
                               'Entrepreneurship and Technology',
                'filled': False,
                'name': 'Arun Sundararajan',
                'scholar_id': 'M0OB5XQAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Associate Professor, Erasmus University '
                               'Rotterdam',
                'filled': False,
                'name': 'Gabriele Paolacci',
                'scholar_id': 'Vq2ccE4AAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Professor of Computer Science, Stanford '
                               'University',
                'filled': False,
                'name': 'Mehran Sahami',
                'scholar_id': 'ZasL8IoAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': '',
                'filled': False,
                'name': 'H. V. Jagadish',
                'scholar_id': 'SKVnHakAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Rutgers Univ',
                'filled': False,
                'name': 'S Muthukrishnan',
                'scholar_id': 'MvyO9jAAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Professor of Computer Science, Texas Tech '
                               'University',
                'filled': False,
                'name': 'Victor S. Sheng',
                'scholar_id': '0epc43IAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Hellenic Open University, GREECE',
                'filled': False,
                'name': 'Vassilios Verykios',
                'scholar_id': 'Md2HV1cAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Executive Director, Qatar Computing Research '
                               'Institute',
                'filled': False,
                'name': 'Ahmed Elmagarmid',
                'scholar_id': 'Cb9BDJkAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Assistant Professor, Computer Science, NYU Abu '
                               'Dhabi',
                'filled': False,
                'name': 'Djellel Difallah',
                'scholar_id': 'K5d-OHEAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Assistant Professor of Information Systems, '
                               'Boston College',
                'filled': False,
                'name': 'Marios Kokkodis',
                'scholar_id': 'NvknD8EAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Principal Research Scientist / Research '
                               'Director, Google Health',
                'filled': False,
                'name': 'Evgeniy Gabrilovich',
                'scholar_id': 'DKCx8hcAAAAJ',
                'source': 'CO_AUTHORS_LIST'},
               {'affiliation': 'Professor of Computer Science, Emory '
                               'University',
                'filled': False,
                'name': 'Eugene Agichtein',
                'scholar_id': '3BX3vWcAAAAJ',
                'source': 'CO_AUTHORS_LIST'}],
 'email_domain': '@stern.nyu.edu',
 'filled': True,
 'hindex': 50,
 'hindex5y': 35,
 'i10index': 98,
 'i10index5y': 66,
 'interests': ['Crowdsourcing',
               'Data Quality',
               'Text Analytics using Economics'],
 'name': 'Panos Ipeirotis',
 'publications': [{'author_pub_id': 'PA9La6oAAAAJ:djcsc3XHdKAC',
                   'bib': {'pub_year': '2020',
                           'title': 'Methods, systems, and media for '
                                    'identifying errors in predictive models '
                                    'using annotators'},
                   'filled': False,
                   'num_citations': 0,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'},
                  {'author_pub_id': 'PA9La6oAAAAJ:rwEhk56xNqMC',
                   'bib': {'pub_year': '2020',
                           'title': 'Gender and Race Preferences in Hiring in '
                                    'the Age of Diversity Goals: Evidence from '
                                    'Silicon Valley Tech Firms'},
                   'filled': False,
                   'num_citations': 1,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'},
                  {'author_pub_id': 'PA9La6oAAAAJ:ddB7do2jUx8C',
                   'bib': {'pub_year': '2020',
                           'title': 'Creativity on Paid Crowdsourcing '
                                    'Platforms'},
                   'filled': False,
                   'num_citations': 6,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'},
                  {'author_pub_id': 'PA9La6oAAAAJ:TuM7UPshZo8C',
                   'bib': {'pub_year': '2020',
                           'title': 'Demand-Aware Career Path Recommendations: '
                                    'A Reinforcement Learning Approach'},
                   'filled': False,
                   'num_citations': 1,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'},
                  {'author_pub_id': 'PA9La6oAAAAJ:eJjLl3UG7CkC',
                   'bib': {'pub_year': '2019',
                           'title': 'Statistical considerations for '
                                    'crowdsourced perceptual ratings of human '
                                    'speech productions'},
                   'filled': False,
                   'num_citations': 3,
                   'source': 'AUTHOR_PUBLICATION_ENTRY'}],
 'scholar_id': 'PA9La6oAAAAJ',
 'source': 'AUTHOR_PROFILE_PAGE',
 'url_picture': 'https://scholar.googleusercontent.com/citations?view_op=view_photo&user=PA9La6oAAAAJ&citpid=3'}
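
A quick sanity check for output like the above could confirm both the count and the year ordering. This is a sketch under assumptions: `years_descending` is a hypothetical helper, not a scholarly function, and it skips entries without a `pub_year` key. The sample data keeps only the publication years from the fixed output.

```python
def years_descending(publications):
    """Return True if the pub_year values appear in non-increasing order.

    Entries without a 'pub_year' key are skipped (an assumption of this sketch).
    """
    years = [
        int(pub["bib"]["pub_year"])
        for pub in publications
        if "pub_year" in pub.get("bib", {})
    ]
    return all(earlier >= later for earlier, later in zip(years, years[1:]))


# Publication years copied from the fixed output; all other fields are omitted.
fixed_years = ["2020", "2020", "2020", "2020", "2019"]
fixed_publications = [{"bib": {"pub_year": year}} for year in fixed_years]

publication_limit = 5
print(len(fixed_publications) <= publication_limit)  # True
print(years_descending(fixed_publications))          # True
```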