ropensci / stats19

R package for working with open road traffic casualty data from Great Britain
https://docs.ropensci.org/stats19
GNU General Public License v3.0

Python version of py-stats19 #250

Open UCLWilson opened 3 months ago

UCLWilson commented 3 months ago

Hi Robin (@Robinlovelace),

I hope this message finds you well.

We’ve likely met at GISRUK, GIScience, and other data science conferences. I am Xiaowei, a final-year PhD student at UCL, supervised by James Haworth. My research focuses on using graph deep learning for traffic crash prediction, specifically for the UK.

I am grateful for your work on the R package for STATS19. To further support deep learning and machine learning analyses for road safety in the UK, my colleague Jinshuai, from the Data Science Institute at LSE, and I have developed a Python version of this package, named py-stats19. The project is supervised by Dr. James Haworth and Prof. Tao Cheng (Director of SpaceTimeLab).

Our Python package extends the R version by providing access to data from 1979 onward, with features for easily referencing specific years. It also incorporates temporal information and geometry to support spatiotemporal analysis. We are currently in the early stages of development and aim to add LLM and visualization tools to make the package more accessible and interpretable for members of the public, policymakers, and researchers. We are targeting completion by the end of this year and have already purchased a domain name for the project.
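To make the year-selection and "temporal information and geometry" features more concrete, here is a minimal standard-library sketch. It is not py-stats19's actual code: the helper names `select_years` and `to_spatiotemporal` are hypothetical, and the column names and formats (`collision_index`, `date` as dd/mm/yyyy, `time` as HH:MM, WGS84 `longitude`/`latitude`) are assumptions to check against the real DfT CSV headers.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

EARLIEST_YEAR = 1979  # first year of the STATS19 series, per this thread


def select_years(first: int, last: int) -> List[int]:
    """Hypothetical helper: expand an inclusive year range into a list,
    rejecting years before the start of the series."""
    if first < EARLIEST_YEAR or last < first:
        raise ValueError(f"expected {EARLIEST_YEAR} <= first <= last, got {first}..{last}")
    return list(range(first, last + 1))


@dataclass
class Collision:
    collision_index: str
    when: datetime  # date and time combined into one timestamp
    wkt: str        # point geometry as Well-Known Text


def to_spatiotemporal(row: Dict[str, str]) -> Collision:
    """Hypothetical helper: merge the separate date and time fields and
    build a point geometry from the coordinate fields.

    Assumed formats: date "dd/mm/yyyy", time "HH:MM", WGS84 lon/lat."""
    when = datetime.strptime(f"{row['date']} {row['time']}", "%d/%m/%Y %H:%M")
    lon, lat = float(row["longitude"]), float(row["latitude"])
    # WKT points are written "x y", so longitude comes first.
    return Collision(row["collision_index"], when, f"POINT ({lon} {lat})")


# Made-up example row, not real STATS19 data.
row = {"collision_index": "example-1", "date": "01/01/2023",
       "time": "01:24", "longitude": "-0.1276", "latitude": "51.5072"}
print(select_years(2020, 2022))    # [2020, 2021, 2022]
print(to_spatiotemporal(row).wkt)  # POINT (-0.1276 51.5072)
```

A real package would more likely return geometry objects from a GIS library than WKT strings; WKT is used here only to keep the sketch dependency-free.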

As this is our first open-source package, we would greatly appreciate any insights or support you can offer.

Thank you for your time and consideration.

Best regards, Xiaowei

Robinlovelace commented 3 months ago

Hi @UCLWilson, thanks for your interest. We have an issue tracking the development of a Python version: #230. It's great to think about which features to support; note that the R version already provides access to the 1979-present dataset. I look forward to giving your package a go, but I cannot see any code here: https://github.com/Mayazure/py-stats19

Robinlovelace commented 3 months ago

This is what I currently see. Do you have a different link for the source code?

UCLWilson commented 3 months ago


Sorry, Robin. We have just made it public: https://github.com/Mayazure/py-stats19

Please let us know if there is anything more we can do to help.

UCLWilson commented 3 months ago

We are fixing some data-fetching bugs now and will push an updated version later today. Sorry about that.

layik commented 3 months ago

Great! I am away at the moment, but when I have time I will try to contribute, as my Python is a little sharper than my R. As suggested in #230, it would be great to have some common code shared between the two packages.

I will keep watching your work.

Robinlovelace commented 3 months ago

Just took a look. It's great to see more open code for working with road collision data, and the fact that it's a Python package should make it accessible to many people. It's also great that it allows users to set a default directory, like the R package does.
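One common way to implement a configurable default download directory is sketched below in plain Python. It is an illustration only, not necessarily how either package does it: the function name `get_data_directory` and the environment variable `PY_STATS19_DATA_DIR` are hypothetical. The variable takes precedence when set; otherwise files go to a `stats19` folder inside the system temporary directory.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical environment variable name, chosen for illustration only.
ENV_VAR = "PY_STATS19_DATA_DIR"


def get_data_directory(create: bool = True) -> Path:
    """Return the directory where downloaded files should be cached.

    Precedence: the environment variable if it is set and non-empty,
    otherwise a 'stats19' folder in the system temp directory.
    With create=True the directory is made if it does not exist."""
    configured = os.environ.get(ENV_VAR, "").strip()
    if configured:
        path = Path(configured).expanduser()
    else:
        path = Path(tempfile.gettempdir()) / "stats19"
    if create:
        path.mkdir(parents=True, exist_ok=True)
    return path


# Usage: a user could set the variable once (e.g. in a shell profile)
# so that every session reuses the same download cache.
print(get_data_directory(create=False))
```

Falling back to a temporary directory avoids writing into the user's home folder without consent, at the cost of re-downloading files after the temp directory is cleared.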

One question: have you thought about using duckdb or polars in addition or as an alternative to pandas?

UCLWilson commented 3 months ago

Hi @layik and @Robinlovelace, thank you so much for your encouraging words. As noted in the README for the Python version, your R package provided a solid foundation for our development of py-stats19. We look forward to collaborating in the future to contribute to open code and enhance road safety in the UK. Please feel free to share this information, as promoting the Python version could help others who are interested in analysis and modelling.

Regarding data processing, we initially chose to use pandas for compatibility. However, we are considering switching to Polars for improved performance after I complete my PhD thesis, which is expected around September or October. If you believe this could benefit further research, we would be happy to contribute to the STATS19 project.

If you’d like to discuss this further, please feel free to reach out to me via email.

We greatly appreciate your inspiring work and ongoing contributions to open code, which have been a great motivation for our project.