scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Journal name in `scholarly.fill()` method #423

Closed eurunuela closed 2 years ago

eurunuela commented 2 years ago

What feature would you like to request?

I would like to get the journal name as part of the bib key when using the scholarly.fill() method.

Describe the solution you'd like

The journal name should appear next to the pub_year and title keys in json["publications"]["bib"].

Do you plan on contributing? Your response below will clarify if this is something that the maintainers can expect you to work on or not.

I do not plan on contributing at the moment.

Additional context

I am using scholarly to generate a JSON file that feeds my personal website with my recent publications. I love how I can get the title and year of the publications, but I cannot use the journal even though this information is very relevant.
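A minimal sketch of that workflow, assuming each publication is a dictionary with a `bib` sub-dictionary holding `title` and `pub_year` as described above. The function name `publications_to_json` and the sample entry are hypothetical; `journal` is read with `.get` because it may be absent.

```python
import json

def publications_to_json(publications: list) -> str:
    """Serialize the fields a website needs from a list of publication dicts."""
    records = [
        {
            "title": pub["bib"].get("title"),
            "year": pub["bib"].get("pub_year"),
            "journal": pub["bib"].get("journal"),  # None when not available
        }
        for pub in publications
    ]
    return json.dumps(records, indent=2)

# Placeholder data standing in for the real publication list
sample = [{"bib": {"title": "Placeholder title", "pub_year": "2021"}}]
print(publications_to_json(sample))
```

With the data as it stands today, the `journal` entry in the output would be `null` for every publication, which is the gap this request is about.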

arunkannawadi commented 2 years ago

Thanks for raising this issue @eurunuela. I am trying to think if there was a valid reason for not including this feature so far. We will try to have this in the next release, but in the meantime, you could obtain the journal by calling the scholarly.fill method on each publication. For example:

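from scholarly import scholarly

# Replace the placeholder with your Google Scholar author ID
author = scholarly.search_author_id("<your GS id>")
scholarly.fill(author)
# Filling each publication sends one extra request per publication
for pub in author["publications"]:
    scholarly.fill(pub)

# Fetch the journal name
print(author["publications"][0]["bib"]["journal"])

arunkannawadi commented 2 years ago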

I think I remembered why we don't have that feature. On my profile page, the journal names appear as

"Astronomy & Astrophysics 640, L14", "Monthly Notices of the Royal Astronomical Society 477 (4), 4285-4307", "Publications of the Astronomical Society of the Pacific 128 (968), 104001", "arXiv preprint arXiv:2010.16416", etc. It is not clear what general rule would extract the journal name alone while dropping the numbers for the edition number, page number, arXiv ID, etc. I thought maybe we could stop at the first number, but in your case that would fail for "2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), 1726-1729". So I'm now inclined to think we may not incorporate this feature because of this ambiguity.
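To make the ambiguity concrete, here is a minimal sketch of the "stop at the first number" heuristic applied to some of the strings quoted above. The function name `journal_before_first_digit` is hypothetical and not part of scholarly; it only illustrates why the rule is unreliable.

```python
import re

def journal_before_first_digit(citation: str) -> str:
    """Hypothetical heuristic: keep only the text before the first digit."""
    match = re.search(r"\d", citation)
    prefix = citation if match is None else citation[: match.start()]
    # Drop the separator left behind at the cut point
    return prefix.strip(" ,")

examples = [
    "Astronomy & Astrophysics 640, L14",
    "Monthly Notices of the Royal Astronomical Society 477 (4), 4285-4307",
    "arXiv preprint arXiv:2010.16416",
    "2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), 1726-1729",
]

for citation in examples:
    print(repr(journal_before_first_digit(citation)))
```

The first two strings come out clean, but the arXiv entry keeps a dangling `arXiv:` and the ISBI entry collapses to an empty string because it starts with a year.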

The extra fill calls I mentioned above take additional time because each one sends another request, but those calls are perfectly in accordance with Google Scholar's scraping policies.

arunkannawadi commented 2 years ago

Would it help if we had that entire string with the page numbers included? I can't imagine why that would be helpful though, since you can't aggregate by journal names anyway.

arunkannawadi commented 2 years ago

Resolution: Create a new field called citation within the bib dictionary that contains the full string. The journal field will be filled only upon calling fill on the publication object, to keep the journal name free of the volume, page numbers, etc.
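A minimal sketch of how calling code might use the two fields, assuming `bib` is a plain dictionary in which `journal` is present only after fill has been called on the publication and `citation` holds the raw string. The helper name `venue_label` and the sample dictionaries are hypothetical and not part of scholarly.

```python
def venue_label(bib: dict) -> str:
    """Prefer the clean journal name; fall back to the raw citation string."""
    return bib.get("journal") or bib.get("citation") or ""

# Illustrative dictionaries, not real scholarly output
unfilled = {"title": "Placeholder title",
            "citation": "Astronomy & Astrophysics 640, L14"}
filled = dict(unfilled, journal="Astronomy & Astrophysics")

print(venue_label(unfilled))
print(venue_label(filled))
```

This lets a caller show something for every publication without filling each one, and switch to the clean name wherever the extra request has already been made.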

eurunuela commented 2 years ago

This is all very helpful @arunkannawadi , thank you!

I will use the for loop you suggested, but I like the idea of having the citation key too. Thanks!

arunkannawadi commented 2 years ago

@eurunuela This feature is now in v1.7.0, available via pip. It should be installable via conda as well in about a day.

eurunuela commented 2 years ago

That's great, thank you!