scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense
1.36k stars 298 forks

[Feature Request] Support for legal opinions #505

Open pablogranolabar opened 1 year ago

pablogranolabar commented 1 year ago

What feature would you like to request?

Half of Google Scholar's functionality is related to searching and retrieving legal opinions, but scholarly doesn't seem to have any support for retrieving court opinions.

Describe the solution you'd like

The ability to select a court or jurisdictional region, perform a keyphrase search, and then retrieve n results, each of which is a link to the actual opinion. There should also be a method to download each opinion.

Do you plan on contributing? Your response below will clarify if this is something that the maintainers can expect you to work on or not.

Additional context

I can submit a PR, given some advice on where this would be best implemented. Additional query params need to be passed, such as as_sdt, which selects the courts/jurisdictions being searched. I have developed an undetected_chromedriver scraper for this that recursively retrieves each opinion/result, but it leads to IP address bans after a few hours of bulk searching. I have not been using proxies for retrieval, and have used only limited delay loops to avoid detection, so perhaps scholarly would be a better home for the code given its evasion methods.
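To make the query-parameter point concrete, here is a minimal sketch of how a search URL carrying as_sdt might be assembled. It is not part of scholarly: the function name build_case_law_url, the start offset parameter, and the as_sdt value in the usage note are all assumptions for illustration, and the real court/jurisdiction codes would have to be read off the URLs that Google Scholar itself generates.

```python
from urllib.parse import urlencode

# Base search endpoint; only the query string varies between searches.
SCHOLAR_SEARCH_URL = "https://scholar.google.com/scholar"


def build_case_law_url(query: str, as_sdt: str, start: int = 0) -> str:
    """Build a search URL that passes the as_sdt court/jurisdiction selector.

    Hypothetical helper: `as_sdt` is forwarded unchanged, and `start` is an
    assumed result offset for paging through results.
    """
    params = {"q": query, "as_sdt": as_sdt, "start": start}
    # Keep commas literal, since selector values may be comma-separated.
    return f"{SCHOLAR_SEARCH_URL}?{urlencode(params, safe=',')}"
```

For example, build_case_law_url("fair use", "2006", start=10) builds a URL for the second page of ten results, with "2006" standing in as a placeholder selector value.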

arunkannawadi commented 1 year ago

This would be a great feature to have - one that I have thought about but never too seriously because I didn't have a use case for it.

The best place would be a separate file, similar to publication_parser.py, that does much of the heavy lifting in parsing the legal documents. In _scholarly.py, you can then add the user-facing API methods that you expect people to use.

You could start small, by not having an option to select which courts you want to limit your results to.
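
A rough sketch of what such a separate parsing module could look like, starting small with no court selection. Everything here is hypothetical rather than existing scholarly code: the names LegalOpinion, OpinionResultParser and parse_opinions are made up, and the assumption that each result title sits in an h3 element with class gs_rt would need to be checked against the actual result pages.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class LegalOpinion:
    """One search result: the case title and a link to the opinion."""
    title: str
    url: str


class OpinionResultParser(HTMLParser):
    """Collect the first link inside each <h3 class="gs_rt"> element.

    The gs_rt class name is an assumption about the result markup.
    """

    def __init__(self) -> None:
        super().__init__()
        self.opinions: List[LegalOpinion] = []
        self._in_title = False
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and "gs_rt" in (attrs.get("class") or "").split():
            self._in_title = True
        elif tag == "a" and self._in_title and self._href is None:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that follows a link inside a title heading.
        if self._in_title and self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            if self._href is not None:
                title = "".join(self._text).strip()
                self.opinions.append(LegalOpinion(title, self._href))
            self._in_title = False
            self._href = None
            self._text = []


def parse_opinions(html: str, limit: Optional[int] = None) -> List[LegalOpinion]:
    """Return up to `limit` opinions found in one page of results."""
    parser = OpinionResultParser()
    parser.feed(html)
    parser.close()
    return parser.opinions[:limit]
```

A user-facing method in _scholarly.py could then fetch a results page through the existing navigation layer and hand the HTML to parse_opinions, with court filtering added later as an optional argument.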