jimmyday12 / fitzroy_data

Timeouts in fetch_player_stats_afltables #9

Open jimmyday12 opened 1 month ago

jimmyday12 commented 1 month ago

This is probably more of a fitzRoy issue, but noting it here: the weekly AFLTables GitHub Action regularly fails when a timeout occurs.

That function needs to handle timeouts better, either by retrying or by quietly recording the error and moving on to the next URL.

I've done this in other contexts using req_retry from httr2. The function that we need to update uses rvest instead so will need to figure out something similar (or replace rvest with httr2).

https://httr2.r-lib.org/reference/req_retry.html
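A rough sketch of what the "replace rvest's fetching with httr2" option could look like, not a tested patch for fitzRoy. The helper names `fetch_html()` and `fetch_all()` are made up for illustration, and `retry_on_failure = TRUE` assumes httr2 1.1.0 or later. Without that argument `req_retry()` only retries transient HTTP statuses (429 and 503 by default), so a plain connection timeout would not be retried.

```r
library(httr2)
library(rvest)

# Hypothetical helper (not part of fitzRoy): GET one page with a timeout
# and retries, then parse the body so rvest can work on it as usual.
fetch_html <- function(url, max_tries = 3, timeout_sec = 30) {
  request(url) |>
    req_timeout(timeout_sec) |>
    # retry_on_failure is assumed to be available (httr2 >= 1.1.0)
    req_retry(max_tries = max_tries, retry_on_failure = TRUE) |>
    req_perform() |>
    resp_body_html()
}

# Hypothetical helper: try every URL, log the ones that still fail after
# retrying, and carry on with the rest instead of aborting the whole run.
fetch_all <- function(urls, fetch = fetch_html) {
  pages <- lapply(urls, function(url) {
    tryCatch(
      fetch(url),
      error = function(e) {
        message("Skipping ", url, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(pages) <- urls
  # Drop the failed URLs so downstream code only sees parsed pages
  Filter(Negate(is.null), pages)
}
```

The parsed pages can then go through the existing rvest code, for example `lapply(pages, html_element, "table")`. Passing `fetch` as an argument keeps the skip-on-error logic testable without hitting the network.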

peteowen1 commented 1 month ago

Yeahhh it's not loving the full rescrapes I don't think hey. Looks like it's getting rate limited, which is fair. For AFLtables maybe we can just do 4 pushes each with 32 years to get a full rescrape - and then we should be fine to go back to just scraping the past year or two. Trying to make sure I'm not creating too much work for you as well!

Will send across 5 pull requests - 1 for each of the 32 year rescrapes - and a final one (weekly-rescrape) that you can merge last just to get back to the normal schedule. Can just merge these one by one to make sure everything works fine?
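For reference, the 4 x 32 split works out as below. This is only a sketch of the arithmetic: the 1897 start comes from the rescrape mentioned later in the thread, and the 2024 end year is inferred from 4 contiguous chunks of 32 seasons rather than stated anywhere.

```r
first_season <- 1897  # first rescrape mentioned in the thread
chunk_size   <- 32    # seasons per pull request
n_chunks     <- 4     # full-rescrape pull requests (the 5th restores the weekly job)

# Start of each chunk, then the inclusive end 31 seasons later
starts <- first_season + chunk_size * (seq_len(n_chunks) - 1)
chunks <- data.frame(start = starts, end = starts + chunk_size - 1)

print(chunks)
#   start  end
# 1  1897 1928
# 2  1929 1960
# 3  1961 1992
# 4  1993 2024
```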

Haven't dug into the footywire data too much - but there aren't many issues raised with it so don't think we need the full rescrape? Could just change the weekly script to do last season or two? (think that's what it was before splitting into two scripts?) Have changed this to just rescrape this year and last year in the pull requests

jimmyday12 commented 1 month ago

Thanks mate, appreciate it. Have approved the first one. I'm assuming we'll need the rescrape to run and finish before then doing the next one so I'll wait for that before approving the 2nd etc

jimmyday12 commented 1 month ago

In terms of footywire - I suspect we can just reduce it right down as well

peteowen1 commented 1 month ago

sweet - yep wait 'til the end of each run before merging next. Then I'm happy to check 'em once they've all run to make sure all the issues have been fixed.

peteowen1 commented 1 month ago

lovely - looks like first one ran all good! Quick data check all seems fine as well

peteowen1 commented 1 month ago

Actually gonna be really annoying and ask if you can merge fitzRoy first. Realised there's an issue with jumper.no. being integer not character. Should fix that first in fitzRoy and then rescrape. Have tested this locally and changing to as.character() fixes things. Will pull request the 1897 rescrape again
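A toy illustration of the type fix, using made-up rows rather than real AFLTables data. The thread doesn't say exactly where the mismatch bites; one common place is combining old and new scrapes, since `dplyr::bind_rows()` refuses to combine an integer column with a character one.

```r
# Hypothetical rows: stored data has jumper.no. as character,
# while a fresh scrape parses it as integer.
stored <- data.frame(player = "Player A", jumper.no. = "12")
fresh  <- data.frame(player = "Player B", jumper.no. = 7L)

# Coerce before combining so both sides agree on the column type
fresh$jumper.no. <- as.character(fresh$jumper.no.)

combined <- rbind(stored, fresh)
str(combined$jumper.no.)
#  chr [1:2] "12" "7"
```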