robjhyndman / cricketdata

International cricket data for men and women, Tests, ODIs and T20s
http://pkg.robjhyndman.com/cricketdata/
80 stars 21 forks

Solution for cleaning raw T20 cricsheet data #23

Closed (Dazzalytics closed 2 years ago)

Dazzalytics commented 2 years ago

Hello. The T20 ball-by-ball data from Cricsheet is a little raw. Right now, the ball-by-ball data doesn't contain, among other things,

I have attached a file containing a cleaning function that addresses the above-mentioned points, and a bit more. Note that this is only for T20 data. However, it can easily be updated for ODIs and Tests.

I am not sure whether to add this cleaning function to the fetch_cricsheet() function or to keep it separate. What do you folks think?

Cleaning Cricsheet T20 Data.zip

robjhyndman commented 2 years ago

There are a few things to consider here:

If we can sort out these issues, I'd be happy to add this as an optional cleaning step in the fetch_cricsheet() function. Perhaps clean=FALSE by default so existing code doesn't break.

Dazzalytics commented 2 years ago

Thank you for the feedback.

Dazzalytics commented 2 years ago

Please find the updated code file attached, addressing the issues discussed above. Please note that it is only for cleaning T20 data from Cricsheet (all T20 competitions).

Cleaning Cricsheet T20 Data.zip

robjhyndman commented 2 years ago

Thanks. I've made a few changes and added it. I decided not to add the clean argument as this clearly fixes some errors with the data. Let me know if there are any problems with the updated function.

Also, please add yourself as a contributor to the package. I didn't know what name to include.