jimmyday12 / fitzroy_data


Afltables fix #5

Closed peteowen1 closed 2 months ago

peteowen1 commented 2 months ago

When combined with the fitzRoy merge https://github.com/jimmyday12/fitzRoy/pull/223, this fixes the following issues:

https://github.com/jimmyday12/fitzRoy/issues/78
https://github.com/jimmyday12/fitzRoy/issues/82
https://github.com/jimmyday12/fitzRoy/issues/136
https://github.com/jimmyday12/fitzRoy/issues/155
https://github.com/jimmyday12/fitzRoy/issues/171 (this is the same request as 155)
https://github.com/jimmyday12/fitzRoy/issues/170
https://github.com/jimmyday12/fitzRoy/issues/182
https://github.com/jimmyday12/fitzRoy/issues/214

jimmyday12 commented 2 months ago

Having trouble merging this one. I suspect we may need to get https://github.com/jimmyday12/fitzRoy/pull/223 merged first, then potentially run the re-scrape on the main branch. GitHub isn't liking the conflicts between the RDA and RDS files.

peteowen1 commented 2 months ago

Yeah was curious how it would handle merging the datasets. But yeah I like the idea of just getting https://github.com/jimmyday12/fitzRoy/pull/223 through and then you can run the re-scrape on main!

jimmyday12 commented 2 months ago

I've just pushed an update with 'rescrape' set to TRUE and it's running as a GitHub Action, so we'll see how it goes.

jimmyday12 commented 2 months ago

I've just split the weekly script into two and set up two different GitHub Actions to run through them.

peteowen1 commented 2 months ago

Sweet - I can have a look into the footywire stuff to see if it can be sped up if you want? First thing I noticed straight away is that it takes 3 seconds to scrape a match and 2 seconds of that is Sys.sleep(2), so a full run could be down to about an hour without it. I assume that's there for polite scraping and/or to avoid rate limits? In which case fair enough, but then maybe we don't need to do a full rescrape every time?
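
For what it's worth, here is a rough sketch of the "don't do a full rescrape every time" idea: only scrape match IDs that aren't already in the stored data, and make the pause optional. Everything here is hypothetical. `scrape_missing`, `fake_scrape`, and the `match_id` column are made up for illustration and are not the actual fitzRoy functions or data layout.

```r
# Hypothetical helper: scrape only the matches missing from `existing`.
# `scrape_one` stands in for whatever function fetches a single match
# and returns a one-row data frame with a `match_id` column.
scrape_missing <- function(all_ids, existing, scrape_one, delay = 0) {
  new_ids <- setdiff(all_ids, existing$match_id)
  if (length(new_ids) == 0) {
    return(existing)  # nothing new, so no requests are made
  }
  new_rows <- lapply(new_ids, function(id) {
    if (delay > 0) Sys.sleep(delay)  # optional pause between requests
    scrape_one(id)
  })
  rbind(existing, do.call(rbind, new_rows))
}

# Dummy scraper so the example runs offline (assumption, not real data).
fake_scrape <- function(id) data.frame(match_id = id, score = id * 10)

existing <- data.frame(match_id = c(1, 2), score = c(10, 20))
updated <- scrape_missing(
  all_ids = 1:4,
  existing = existing,
  scrape_one = fake_scrape
)
```

With this shape, a weekly run would only pay the per-match cost for new matches, so keeping a small `delay` for politeness would matter much less.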

jimmyday12 commented 2 months ago

That'd be awesome. I'll be super honest, I don't remember why that was put in. This would have been one of the first functions I wrote for fitzRoy while still a pretty raw R developer, so I suspect it can be heavily optimised in many ways.

I suspect it was me being pretty conservative early days and wanting to be polite. I think we could likely remove it and just add a user_agent reference to github.com/jimmyday12
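
A minimal sketch of what the user agent change could look like, assuming the scraper makes its requests with the httr package (the thread doesn't say which HTTP client the footywire functions use). `fetch_page` is a hypothetical wrapper, not an existing fitzRoy function, and the exact user agent string is only an example.

```r
library(httr)

# Identify the scraper so site owners can see where requests come from.
# The string itself is an example, not an agreed value.
ua <- user_agent("github.com/jimmyday12")

# Hypothetical wrapper: fetch a page with the user agent attached and
# an optional pause (default is no pause).
fetch_page <- function(url, delay = 0) {
  if (delay > 0) Sys.sleep(delay)
  resp <- GET(url, ua)
  stop_for_status(resp)  # fail loudly on HTTP errors
  content(resp, as = "text", encoding = "UTF-8")
}
```

Keeping `delay` as an argument means the pause can be dropped by default but switched back on if rate limiting turns out to be a problem.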

jimmyday12 commented 2 months ago

Just also noting here - I've made a commit with a new folder called data-raw-2. I'm planning to just use the new scripts to write only the data files we need in fitzRoy to this folder and then update fitzRoy functions to use those files rather than the existing ones. I haven't updated fitzRoy yet, so you'll see the new scripts write to both locations.

No changes needed, just pointing out the plan of attack here.