scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Fetch journal metrics #359

Closed: arunkannawadi closed this issue 2 years ago

arunkannawadi commented 2 years ago

We should implement APIs to fetch the journal metrics listed at https://scholar.google.com/citations?view_op=top_venues&hl=en

giswqs commented 2 years ago

This would be a great feature to add. I'm very interested in this.

arunkannawadi commented 2 years ago

Good to know you're interested. The hardest part so far has been designing the API, given that there is an overall leaderboard for each language and, for English, there are multiple categories and subcategories.

My current idea is to have one API that lists all the possible categories, and another that accepts one of those categories and returns a list of journals with their metrics. Any thoughts on what would be intuitive or useful are welcome.
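A minimal sketch of that two-step design, assuming the categories are held in a nested mapping of category name to a (category code, subcategories) pair. The function names `list_categories` and `venues_url`, the `vq` query parameter used to select a (sub)category, and the example category data are all hypothetical placeholders, not scholarly's actual API or real Google Scholar categories.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/citations"

# Placeholder data: real names and codes would have to be scraped from Google Scholar.
# Shape: category name -> (category code, {subcategory name: subcategory code})
CATEGORIES = {
    "Example Category": ("ex", {"Example Subcategory": "ex_sub"}),
}


def list_categories(categories):
    """Step 1: return every (category, subcategory) pair a caller may pass to step 2.

    A subcategory of None stands for the category-level leaderboard.
    """
    pairs = []
    for category, (_code, subcategories) in categories.items():
        pairs.append((category, None))
        pairs.extend((category, sub) for sub in subcategories)
    return pairs


def venues_url(categories, category=None, subcategory=None, hl="en"):
    """Step 2: map a pair from list_categories to a top-venues page URL.

    With no category, this returns the overall leaderboard for language `hl`.
    Assumes (hypothetically) that the page selects a (sub)category via `vq`.
    """
    params = {"view_op": "top_venues", "hl": hl}
    if category is not None:
        code, subcategories = categories[category]  # KeyError for an unknown category
        params["vq"] = subcategories[subcategory] if subcategory is not None else code
    return BASE_URL + "?" + urlencode(params)
```

For example, `venues_url(CATEGORIES, "Example Category", "Example Subcategory")` builds `https://scholar.google.com/citations?view_op=top_venues&hl=en&vq=ex_sub`. Keeping the listing separate from the fetch means callers can only request combinations that step 1 actually reported.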

giswqs commented 2 years ago

Here is a list of 160 journals under the 8 categories. I used to copy and paste them into a spreadsheet by hand. It would be nice to have an API that retrieves the list of journals under each category and subcategory.

gregmoille commented 2 years ago

Not sure if this is useful, but in my workflow I use the impact_factor package:

from scholarly import scholarly
from impact_factor import ImpactFactor

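# Look up the author by Google Scholar ID and fill in their publication list
author = scholarly.search_author_id('6fBdsmYAAAAJ')
author = scholarly.fill(author, sections=['publications'])

# Fill the first publication to get its full bibliographic entry
pub = scholarly.fill(author['publications'][0])
journal = pub['bib']['journal']

# Look up the journal's impact factor with the impact_factor package
IF = ImpactFactor()
IF.search(journal)

arunkannawadi commented 2 years ago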

Thanks for the snippet. It isn't quite what we want, though: as a Google Scholar scraper, scholarly should get its information from Google Scholar and only from Google Scholar. This package appears to download the data from elsewhere into a local database and query it, which is a bit more complex than scraping the Google Scholar page on demand.

Moreover, Google Scholar reports metrics other than the impact factor (h5-index and h5-median), along with a journal ranking, and that is what I am after here.
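For reference, Google Scholar defines the h5-index as the h-index of the articles a venue published in the last five complete years, and the h5-median as the median citation count of the articles that make up that h5-index. A small sketch of both computations, assuming you already have the citation counts for a venue's articles from that window (the function name `h5_metrics` is illustrative):

```python
from statistics import median


def h5_metrics(citations):
    """Return (h5_index, h5_median) for a venue.

    `citations` holds the citation counts of the articles the venue
    published in the last five complete years.
    """
    ranked = sorted(citations, reverse=True)

    # h5-index: the largest h such that h articles each have at least h citations
    h5_index = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h5_index = rank
        else:
            break

    # h5-median: the median citation count within the h5 core
    core = ranked[:h5_index]
    h5_median = median(core) if core else 0
    return h5_index, h5_median
```

With citation counts `[25, 8, 5, 3, 3, 1]`, three articles have at least 3 citations but not four with at least 4, so `h5_metrics` returns `(3, 8)`: the h5 core is `[25, 8, 5]` and its median is 8.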

arunkannawadi commented 2 years ago

@giswqs This is now available in v1.6 through the get_journals and save_journals_csv methods in scholarly, for each category and subcategory. You can list the (sub)categories themselves with the get_journal_categories method. Give it a try, and please open an issue if you find any bugs.

giswqs commented 2 years ago

Fantastic! Will try it out. Thank you