jimmyday12 / fitzRoy

A set of functions to easily access AFL data
https://jimmyday12.github.io/fitzRoy
Other
125 stars 27 forks source link

Afltables fix #223

Closed peteowen1 closed 2 weeks ago

peteowen1 commented 3 weeks ago

when combined with the fitzroy_data rescrape https://github.com/jimmyday12/fitzroy_data/pull/5 This fixes the following issues:

https://github.com/jimmyday12/fitzRoy/issues/78 https://github.com/jimmyday12/fitzRoy/issues/82 https://github.com/jimmyday12/fitzRoy/issues/136 https://github.com/jimmyday12/fitzRoy/issues/155 https://github.com/jimmyday12/fitzRoy/issues/171 (this is same request as 155) https://github.com/jimmyday12/fitzRoy/issues/170 https://github.com/jimmyday12/fitzRoy/issues/182 https://github.com/jimmyday12/fitzRoy/issues/214

peteowen1 commented 3 weeks ago

will be adding two columns to the afldata with this fix 'AQET' and 'HQET' for ET values what's the best approach to document for this?

peteowen1 commented 3 weeks ago

Also just noting with the script changes it only take about 15 mins to re-scrape and clean 1897 to 2024 (this could be even quicker if we change from purrr to furrr) - so could set up a monthly script or something to re-scrape everything just in-case any historical data is changed.

Also I think ideally you should store the ID script received from AFL Tables as ID_file.rds or something - and never overwrite it. Then have a separate file as ID_scraped that gets overwritten with every weekly re-scrape, and combine the two. The current system of overwriting the whole ID file with every scrape could lead to losing some info from the initial file you received if the data gets overwritten. Can raise an issue in fitzroy_data for this if needed.

jimmyday12 commented 2 weeks ago

Also just noting with the script changes it only take about 15 mins to re-scrape and clean 1897 to 2024 (this could be even quicker if we change from purrr to furrr) - so could set up a monthly script or something to re-scrape everything just in-case any historical data is changed.

Also I think ideally you should store the ID script received from AFL Tables as ID_file.rds or something - and never overwrite it. Then have a separate file as ID_scraped that gets overwritten with every weekly re-scrape, and combine the two. The current system of overwriting the whole ID file with every scrape could lead to losing some info from the initial file you received if the data gets overwritten. Can raise an issue in fitzroy_data for this if needed.

@peteowen1 thanks for all the work on this - I've not had time to look at this old code for a while so I'm sure you found a bunch of really great optimisations. In terms of your suggestions - agree on seperating the ID files and open to exploring furrr to speed things up.

Feel free to make a ticket over on fitzRoy_data and if I get some time in the next few weeks I'll look into it (but also always welcome PRs!)

jimmyday12 commented 2 weeks ago

will be adding two columns to the afldata with this fix 'AQET' and 'HQET' for ET values what's the best approach to document for this?

@peteowen1 I don't document all of the columns, so I think in terms of specifically documenting these columns we don't need anything.

What I would ask is if you could just add a line item in NEWS.md for this change that would be great.

peteowen1 commented 2 weeks ago

Will add an updated to NEWS now as well - Happy for me to increase version to 1.4.1?

Maybe you can try running the weekly script for fitzroy_data now post merge (with rescrape season from 1897) to update all the historical afltables data (if you haven't kicked this off already!)