hangnoh / flybaseR

FlyBase rejects requests from html_session("http://flybase.org") in 2023? #5

Open rmd13 opened 1 year ago

rmd13 commented 1 year ago

I ran this code without problems in 2022. Today I re-ran it for the first time in three months, and it gave this warning:

  session <- html_session("http://flybase.org")
  Warning message:
  In request_GET(session, url) : Forbidden (HTTP 403).

But other websites work fine:

session <- html_session("http://www.google.com")

My browser can open http://flybase.org, but the same request from rvest is forbidden.

Can you reproduce this problem?

Thanks

vficarrotta commented 1 year ago

I'm receiving a 403 forbidden error too.

vficarrotta commented 1 year ago

I tried increasing the time between retrievals, but I still get a 403. I'm invested in this being a working package, so if I can help, I will.

hangnoh commented 1 year ago

Nope, this seems to have something to do with FlyBase, because rvest itself works fine. I am tied up this week, but I will look into this next week :) Thank you.

vficarrotta commented 1 year ago

No worries then. I'll post anything I get from interactions with flybase.

hangnoh commented 1 year ago

Hi, all

I found that the error occurs because RCurl sends no user agent information to FlyBase by default (= NULL). I set a user agent so that the requests look like they come from Chrome or Safari, which resolved the 403 error.

I also found that rvest has changed a lot since I last used it, so I updated the function accordingly.

It seems to work on my end, but I will keep checking over the next couple of days. FlyBase has also changed its HTML forms, so I need to make additional adjustments.
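
For anyone who wants to try the user agent idea outside the package, here is a minimal sketch. It is not necessarily what flybaseR does internally. It assumes rvest 1.0.0 or later (where `session()` replaces `html_session()`) and httr; `ua_string`, `ua_config`, and `open_flybase()` are hypothetical names chosen for illustration.

```r
library(rvest)
library(httr)

# Hypothetical browser-like user agent string; any current browser string
# should behave similarly.
ua_string <- paste(
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
  "AppleWebKit/537.36 (KHTML, like Gecko)",
  "Chrome/110.0.0.0 Safari/537.36"
)

# Build the httr config once so it can be reused for every request.
ua_config <- user_agent(ua_string)

# Open a session that sends the user agent with every request.
# Extra arguments to session() are passed on as httr config.
open_flybase <- function(url = "http://flybase.org") {
  session(url, ua_config)
}

# Usage (needs network access):
# s <- open_flybase()
# s$response$status_code
```

Whether FlyBase accepts a given string is up to the server, so treat the value above as a placeholder.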

rmd13 commented 1 year ago

Hi hangnoh,

Thanks a lot. Do we need to reinstall the latest version of rvest? I found that html_session() was renamed to session().

hangnoh commented 1 year ago

Yes, I used the version that comes with Tidyverse! Some functions changed their names.
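
If a script has to run on both old and new rvest installations, one option is a small wrapper that picks whichever constructor is exported. This is only a sketch under that assumption; `pick_session_fun()` and `open_session()` are hypothetical helpers, not part of rvest or flybaseR.

```r
# Hypothetical helper: choose the constructor name from a vector of
# exported function names.
pick_session_fun <- function(exports) {
  if ("session" %in% exports) "session" else "html_session"
}

# Hypothetical wrapper: call session() when the installed rvest exports it,
# otherwise fall back to html_session().
open_session <- function(url, ...) {
  fun_name <- pick_session_fun(getNamespaceExports("rvest"))
  # Look the function up in the rvest namespace and call it.
  getExportedValue("rvest", fun_name)(url, ...)
}

# Usage (needs rvest, httr, and network access):
# s <- open_session("http://flybase.org", httr::user_agent("Mozilla/5.0"))
```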

vficarrotta commented 1 year ago

thanks a bunch!

rmd13 commented 1 year ago

I tried to update but failed because my R version (3.5.1) is too old. The current problem is that html_session works once a user_agent is supplied, but read_html still gets a 403 even with a user_agent:

  library(rvest)
  library(httr)
  uastring <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
  # assumed: url is the FlyBase home page from the first post
  url <- "http://flybase.org"
  session <- html_session(url, user_agent(uastring))

  aStock_Http <- "http://flybase.org/reports/FBst0007568"
  # does not work (403): no user agent is sent
  aStock_Html <- read_html(aStock_Http)
  # still does not work (403): the second positional argument of read_html
  # is the encoding, so the user agent is probably not applied to the request
  aStock_Html <- read_html(aStock_Http, user_agent(uastring))

Does read_html work in the new rvest package?

rmd13 commented 1 year ago

Oh, I solved it by using html_session instead:

session_stock <- html_session(aStock_Http,user_agent(uastring))
aStock_Html <- html(session_stock)
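
For readers on a newer setup, a roughly equivalent approach is to make the request with httr and then parse the response. This is a sketch, not the package's method: it assumes httr and xml2 (loaded with rvest) are installed, and `fetch_report()` is a hypothetical helper name.

```r
library(rvest)
library(httr)

# Hypothetical helper: fetch a FlyBase report page with an explicit user
# agent and return the parsed HTML document.
fetch_report <- function(report_url, ua_string) {
  # Send the request with the user agent attached as httr config.
  resp <- GET(report_url, user_agent(ua_string))
  # Turn HTTP errors such as 403 into R errors instead of parsing an error page.
  stop_for_status(resp)
  # Parse the body of the httr response as HTML.
  read_html(resp)
}

# Usage (needs network access; uastring as defined earlier in the thread):
# aStock_Html <- fetch_report("http://flybase.org/reports/FBst0007568", uastring)
```

The point of the design is that the user agent goes to the HTTP request itself, not to the parser.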