Closed LabAsim closed 10 months ago
Hello, thank you for your contribution. I think we can find an easier way to tackle this problem. I don't want to add a new requirement to the package, and we should only modify the contents of the related function. May you have a solution of that kind?
Requests is a package from the Python Foundation. I don't know if the same thing can be achieved with urllib3 (standard library), i have never used it directly. I think requests is an easy way to handle connection to the DOI site. If you prefer not to use requests, i will try to revise my commit.
Okay, then, I would like to keep this issue on hold. I would be happy to see your pull request again if you have anything new in your mind.
Sorry, I just saw your edit. I am opening the pull request again.
I have replaced requests with urllib3. It was fairly easy All the contributions use only standard libraries.
Edit: Urllib3 is another package from github. I am sorry for the inconvience. I have just discovered that (i always thought that it was in standard library..)
There should be a way to solve the problem just by editing the contents of the function. I don't plan to create new Python modules inside the package unless it is unavoidable. The solution should be very short and minimal, too.
I kept it as minimal as i could. I have added a list of 2 headers because the Request object needs to have headers in order not to have a 40X response from the DOI site.
Hello, I need the DOI that caused you problems to test this and the original code.
One example is the following DOI: /10.3390/brainsci13030448
This is a very good find. However, I want to avoid including headers or using any random function inside the code. RenderCV should be deterministic.
Therefore, my solution proposition is this: Let's only raise exceptions if the exception code received by urllib
is 404
. I don't see any other edge cases that would cause a bug with this solution. 404
definitely means the DOI is wrong, and any code other than 404
definitely means DOI exists.
So we should add an if statement: if the exception code is 404, raise an error; otherwise, don't do anything. Something like this:
try:
urllib.request.urlopen(doi_url)
except urllib.request.HTTPError as e:
if e.code == 404:
raise ValueError(f"{doi} cannot be found in the DOI System.")
Besides the 404 status that you mentioned, i have received a 403 Forbidden status, whenever i had sent a request without a header. What do you propose for this status?
Exactly, since it's not a 404, it should accept the DOI. 403 is fine.
Although, i would prefer to be sure that the doi exists by getting a 200 status, this could also work.
I don't think there is any DOI that returns 403 and doesn't exist. This will be the minimalist working solution. If we encounter other DOIs with problems that we can't think of now, we can think about your solution.
For now, I will be happy to merge your pull request if you could modify the last changes we talked about.
Ok, i will modify the function and re commit.
The existance of the doi was being checked with urllib, which was constantly returned a 40X status, I have replaced urllib with Requests library, which i think uses urllib3 under the hood.
I have also commited a new folder with a file containing a list with random headers to be included in the request to the DOI site.