slerch / scoringRules2Py_W2W

Python interface for the scoringRules R package for use within the W2W project

Benchmarking results #1

Closed shoyer closed 6 years ago

shoyer commented 6 years ago

Out of curiosity, how did properscoring compare to scoringtools?

slerch commented 6 years ago

I only ran a few non-systematic tests (examples_speed-comparisons.py) some time ago. When computing the CRPS of an ensemble in a for loop, using scoringtools (lines 42--49) gives only a negligible improvement of roughly 1-5 % over properscoring (lines 51--58). Vectorizing the score computation in R (lines 35--40), which minimizes Python-R interfacing, leads to a 25-30 % lower computation time when using scoringtools. However, there might be more efficient ways to vectorize the score computation with properscoring in Python that I am not aware of.

Is the properscoring package being actively developed? We are currently considering extending the scoringRules2Py package towards a full Python implementation which would provide functionality complementing the properscoring package for evaluating probabilistic forecasts in Python.

shoyer commented 6 years ago

Everything in properscoring should work with vectors, so you should be able to simply write psr.crps_ensemble(y[0,:], dat[0,:,:], axis=1) to do the calculation all at once for a significant speed-up. If you have Numba installed, properscoring should be significantly faster still, especially for large ensemble sizes (>10 samples).
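
To illustrate the difference between the per-case loop and a single vectorized call, here is a minimal NumPy-only sketch. The function crps_ensemble_numpy is a hypothetical stand-in written for this example, not properscoring's implementation; it uses the standard ensemble CRPS estimator, mean|x_i - y| - 0.5 * mean|x_i - x_j|. The array names y and dat, their shapes, and the sample sizes are assumptions, since the benchmark script itself is not shown in this thread.

```python
import numpy as np


def crps_ensemble_numpy(observations, forecasts, axis=-1):
    """Ensemble CRPS via mean|x_i - y| - 0.5 * mean|x_i - x_j|.

    Hypothetical helper for illustration only; `axis` is the ensemble axis.
    """
    observations = np.asarray(observations, dtype=float)
    # Move the ensemble axis to the end so the rest of the code is shape-agnostic
    forecasts = np.moveaxis(np.asarray(forecasts, dtype=float), axis, -1)
    # Mean absolute error of each member against the observation
    term_obs = np.abs(forecasts - observations[..., np.newaxis]).mean(axis=-1)
    # Mean absolute difference over all pairs of ensemble members
    pairwise = np.abs(forecasts[..., :, np.newaxis] - forecasts[..., np.newaxis, :])
    term_spread = pairwise.mean(axis=(-2, -1))
    return term_obs - 0.5 * term_spread


rng = np.random.default_rng(0)
n_cases, n_members = 200, 20          # assumed sizes, chosen for illustration
y = rng.normal(size=n_cases)          # one observation per forecast case
dat = rng.normal(size=(n_cases, n_members))  # one ensemble per forecast case

# Loop version: one function call per forecast case
crps_loop = np.array([crps_ensemble_numpy(y[i], dat[i]) for i in range(n_cases)])

# Vectorized version: a single call over all cases, ensemble members on axis 1
crps_vec = crps_ensemble_numpy(y, dat, axis=1)
```

Both versions return the same scores; the vectorized call avoids the per-case Python overhead, which is the effect the loop-based benchmark could not capture. Note that the pairwise term in this sketch needs memory proportional to the square of the ensemble size, so it is meant for clarity rather than speed.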

I moved on from Climate Corp and as far as I know the project is not being actively developed. But if you're interested in working on it to add more functionality, I'm sure we can bring it back to life. Some of the people I worked with on the project at Climate Corp are still on that team.

We also have a pretty extensive test suite that might be worth borrowing even if you write new code from scratch.

slerch commented 6 years ago

Thanks! Using the vectorized version + Numba indeed speeds up computations considerably, and is much faster than the rpy2-based implementation in scoringtools. I have updated the repository and uploaded the plots here.

Our implementation is part of a software development effort in a trans-regional research project. Because of issues with the rpy2-based implementation regarding ease of installation and computation time, we will likely switch to a Python or C++ implementation of the forecast evaluation module, which is currently based on the R package scoringRules. I will be in touch if we end up with functionality that might be interesting to integrate into properscoring.