relaton / relaton-data-bipm

Relaton bibliographic data for BIPM

BIPM Metrologia references cannot be fetched due to "no access to online site" #23

Closed: anermina closed this issue 1 year ago

anermina commented 1 year ago

Several Metrologia references cannot be fetched due to an error "no access to online site".

Examples:

[relaton-bipm] ("BIPM Metrologia 50 4 385") fetching...
Bibliography: Could not retrieve BIPM Metrologia 50 4 385: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 50 4 395: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 47 5 R15: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 43 2 S78: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 53 6 1354: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 50 1 20: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 46 1 62: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 43 2 S22: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 42 2 89: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 46 5 534: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 43 1A 02003: no access to online site
Bibliography: Could not retrieve BIPM Metrologia 41 1 41: no access to online site

Output of the relaton fetch command:

bundle exec relaton fetch "BIPM Metrologia 47 5 R15"
[relaton-bipm] ("BIPM Metrologia 47 5 R15") fetching...
403 => Net::HTTPForbidden for https://iopscience.iop.org/article/10.1088/0026-1394/47/5/R01 -- unhandled response

Note: I can open https://iopscience.iop.org/article/10.1088/0026-1394/47/5/R01 normally in my browser.

ronaldtse commented 1 year ago

@anermina can you please update relaton-bipm (e.g. gem install relaton-bipm or bundle update)?

IOP has recently implemented anti-DDoS measures that block machine access.

The gem now detects when IOP blocks bulk access and asks you to solve the CAPTCHA online.
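As a rough illustration of this kind of detection, the sketch below checks a fetched page for a CAPTCHA marker and builds a hint like the one the gem prints. The marker string and the helper names `captcha_page?` and `captcha_hint` are hypothetical; this is not relaton-bipm's actual implementation, and IOP's real challenge page may look different.

```ruby
# Hypothetical sketch, not relaton-bipm's actual code.
# Assumption: IOP's anti-DDoS challenge page mentions "captcha" somewhere in its HTML.
CAPTCHA_MARKER = "captcha".freeze

# Returns true when the response body looks like a CAPTCHA challenge page.
def captcha_page?(body)
  body.to_s.downcase.include?(CAPTCHA_MARKER)
end

# Builds a hint asking the user to clear the CAPTCHA in a browser, then retry.
def captcha_hint(url)
  "[relaton-bipm] Please visit this link in your browser to resolve the CAPTCHA, then retry: #{url}"
end

# Usage (hypothetical):
#   warn captcha_hint(url) if captcha_page?(response_body)
```

A plain substring check is the simplest heuristic; a real implementation would need to match whatever IOP's challenge page actually contains.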

ronaldtse commented 1 year ago

@andrew2net can you also document the anti-DDoS detection in the README? Thanks.

anermina commented 1 year ago

I actually get different output when machine access is blocked due to anti-DDoS measures:

[relaton-bipm] This source employs anti-DDoS measures that unfortunately affects automated requests.
[relaton-bipm] Please visit this link in your browser to resolve the CAPTCHA, then retry: https://iopscience.iop.org/article/10.1088/0026-1394/6/3/001

But for these references, I still get this kind of output after running gem install relaton-bipm and bundle update:

[relaton-bipm] ("BIPM Metrologia 43 2 S78") fetching...
403 => Net::HTTPForbidden for https://iopscience.iop.org/article/10.1088/0026-1394/43/2/S16 -- unhandled response

ronaldtse commented 1 year ago

@anermina you're right, I get the same problem!

[relaton-bipm] ("BIPM Metrologia 43 2 S78") fetching...
403 => Net::HTTPForbidden for https://iopscience.iop.org/article/10.1088/0026-1394/43/2/S16 -- unhandled response

I am able to open the link in my browser as well.

andrew2net commented 1 year ago

@ronald In a browser, the HTTP request also returns a 403 error; however, the page is displayed correctly. I've made 403 an allowed HTTP code in relaton-bipm. Maybe we should let BIPM know that some pages return this error.
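A minimal sketch of what "making 403 an allowed HTTP code" can look like with Ruby's standard Net::HTTP, assuming a 403 that still carries a non-empty page body should be handled like a 200. The names `ALLOWED_CODES`, `usable_response?`, and `fetch_page` are hypothetical and not taken from relaton-bipm.

```ruby
require "net/http"
require "uri"

# Hypothetical sketch, not relaton-bipm's actual code.
# IOP sometimes answers with 403 Forbidden while still sending the full article
# page, so 403 is accepted alongside 200 as long as the body is not empty.
ALLOWED_CODES = %w[200 403].freeze

# code: the HTTP status as a String or Integer; body: the response body.
def usable_response?(code, body)
  ALLOWED_CODES.include?(code.to_s) && !body.to_s.strip.empty?
end

# Fetches a URL and returns its body if the response is usable, else nil.
# (Performs a real network request.)
def fetch_page(url)
  res = Net::HTTP.get_response(URI(url))
  usable_response?(res.code, res.body) ? res.body : nil
end
```

Requiring a non-empty body keeps a genuine "Access forbidden" response with no content from being parsed as an article page.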

andrew2net commented 1 year ago

Fixed in version 1.13.6

ronaldtse commented 1 year ago

Thanks @andrew2net, that is clearly broken on the IOP side. I will let BIPM know.

ronaldtse commented 1 year ago

I've reported to BIPM.

This is a bug report for IOP’s publication of Metrologia citation information. When accessing some Metrologia articles via the browser, the IOP site occasionally returns the HTTP code “403”, which means “Forbidden”.

For example, this link to an article in Metrologia Vol 47 No 5, https://iopscience.iop.org/article/10.1088/0026-1394/47/5/012, gives me this response (refresh a few times and it will show up). As you can see, the status code is “403”. A “403” code means that the server has rejected the request, and a properly operating server would return an “Access forbidden” page. Somehow, the IOP site still returns content, even though the response has technically failed.

This issue does not happen every time, but it happens quite frequently. I suspect it started after IOP added a DDoS protection service in front of their servers, which blocks any sort of machine access (which we need in order to render citations…).

In any case, this issue is not caused by our machine access; it already happens with browser access.

[Screenshot taken 2022-09-21 at 12:04:08 PM]