sinaatalay / rendercv

The engine of the RenderCV App
http://docs.rendercv.com
MIT License
1.75k stars 116 forks source link

Requests instead of urllib, commit random pick of headers #1

Closed LabAsim closed 10 months ago

LabAsim commented 10 months ago

The existance of the doi was being checked with urllib, which was constantly returned a 40X status, I have replaced urllib with Requests library, which i think uses urllib3 under the hood.

I have also commited a new folder with a file containing a list with random headers to be included in the request to the DOI site.

sinaatalay commented 10 months ago

Hello, thank you for your contribution. I think we can find an easier way to tackle this problem. I don't want to add a new requirement to the package, and we should only modify the contents of the related function. May you have a solution of that kind?

LabAsim commented 10 months ago

Requests is a package from the Python Foundation. I don't know if the same thing can be achieved with urllib3 (standard library), i have never used it directly. I think requests is an easy way to handle connection to the DOI site. If you prefer not to use requests, i will try to revise my commit.

sinaatalay commented 10 months ago

Okay, then, I would like to keep this issue on hold. I would be happy to see your pull request again if you have anything new in your mind.

sinaatalay commented 10 months ago

Sorry, I just saw your edit. I am opening the pull request again.

LabAsim commented 10 months ago

I have replaced requests with urllib3. It was fairly easy All the contributions use only standard libraries.

Edit: Urllib3 is another package from github. I am sorry for the inconvience. I have just discovered that (i always thought that it was in standard library..)

sinaatalay commented 10 months ago

There should be a way to solve the problem just by editing the contents of the function. I don't plan to create new Python modules inside the package unless it is unavoidable. The solution should be very short and minimal, too.

LabAsim commented 10 months ago

I kept it as minimal as i could. I have added a list of 2 headers because the Request object needs to have headers in order not to have a 40X response from the DOI site.

sinaatalay commented 10 months ago

Hello, I need the DOI that caused you problems to test this and the original code.

LabAsim commented 10 months ago

One example is the following DOI: /10.3390/brainsci13030448

sinaatalay commented 10 months ago

This is a very good find. However, I want to avoid including headers or using any random function inside the code. RenderCV should be deterministic.

Therefore, my solution proposition is this: Let's only raise exceptions if the exception code received by urllib is 404. I don't see any other edge cases that would cause a bug with this solution. 404 definitely means the DOI is wrong, and any code other than 404 definitely means DOI exists.

So we should add an if statement: if the exception code is 404, raise an error; otherwise, don't do anything. Something like this:

try:
    urllib.request.urlopen(doi_url)
except urllib.request.HTTPError as e:
    if e.code == 404:
        raise ValueError(f"{doi} cannot be found in the DOI System.")
LabAsim commented 10 months ago

Besides the 404 status that you mentioned, i have received a 403 Forbidden status, whenever i had sent a request without a header. What do you propose for this status?

sinaatalay commented 10 months ago

Exactly, since it's not a 404, it should accept the DOI. 403 is fine.

LabAsim commented 10 months ago

Although, i would prefer to be sure that the doi exists by getting a 200 status, this could also work.

sinaatalay commented 10 months ago

I don't think there is any DOI that returns 403 and doesn't exist. This will be the minimalist working solution. If we encounter other DOIs with problems that we can't think of now, we can think about your solution.

For now, I will be happy to merge your pull request if you could modify the last changes we talked about.

LabAsim commented 10 months ago

Ok, i will modify the function and re commit.