Closed achouhan93 closed 1 year ago
Hi @achouhan93, super nice, I appreciate the initiative! It's certainly a very good idea, so let's integrate this.
Before we merge, let's adapt the docstrings as well. Can you do that, @achouhan93? Just follow the convention for the other arguments. For example, for the start_date string, it should be noted which format it has to be in (YYYY-MM-DD etc.).
Meanwhile, I will check whether something similar can be achieved for chemrxiv as well.
Hi @jannisborn, thank you for considering the changes. Sure, I will update the docstrings, following the existing comments.
I have updated the docstring for the biorxiv and medrxiv functions.
Thank you, @jannisborn, for merging the request. For sure, I will give credit to paperscraper.
During bulk extraction of bioRxiv and medRxiv articles with the biorxiv and medrxiv functions, the current code always starts from the launch_date of the respective server and runs until today's date. If the script fails partway through because of a connection error, re-executing it overwrites the existing file with records starting from the launch_date instead of resuming from the last checkpoint.

This pull request adds time-range functionality to biorxiv and medrxiv through two optional parameters, begin_date and end_date. If the user specifies them, only articles from that time frame are extracted; otherwise the functions fall back to launch_date and today's date as the begin and end dates.
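The fallback behaviour described above could look roughly like the following. This is a minimal, self-contained sketch and not the code from this pull request: the helper name `resolve_date_range`, the `today` argument, and the placeholder `LAUNCH_DATE` value are assumptions made for illustration, and dates are assumed to be given as YYYY-MM-DD strings.

```python
from datetime import date, datetime
from typing import Optional, Tuple

# Placeholder value for illustration only; the real launch dates are
# defined per server inside the library.
LAUNCH_DATE = "2013-01-01"

DATE_FORMAT = "%Y-%m-%d"  # YYYY-MM-DD


def resolve_date_range(
    begin_date: Optional[str] = None,
    end_date: Optional[str] = None,
    launch_date: str = LAUNCH_DATE,
    today: Optional[date] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: return the (begin, end) dates to extract.

    Missing values fall back to the launch date and to today's date.
    """
    today = today or date.today()
    # strptime raises ValueError if a string does not match YYYY-MM-DD.
    begin = datetime.strptime(begin_date or launch_date, DATE_FORMAT).date()
    end = datetime.strptime(end_date, DATE_FORMAT).date() if end_date else today
    if begin > end:
        raise ValueError(f"begin_date {begin} is after end_date {end}")
    return begin.isoformat(), end.isoformat()
```

With a helper like this, a run that failed partway through could be restarted by passing the date of the last successfully stored record as begin_date and writing to a new file, rather than starting again from the launch date.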