graphnet-team / graphnet

A Deep learning library for neutrino telescopes
https://graphnet-team.github.io/graphnet/
Apache License 2.0
90 stars, 92 forks

Bugfix to #685 and #683 #686

Closed RasmusOrsoe closed 5 months ago

RasmusOrsoe commented 6 months ago

Closes #683, closes #685.

#683 was solved by switching a single line from polars df.explode(columns), which utilizes multiprocessing, to a pure numpy-based solution. This numpy-based solution gave a significant speed-up to ParquetDataset.__getitem__. In #677 I found ParquetDataset.__getitem__ to be ~1.8 times slower than its SQLite counterpart on a 1 million event sample. Following this PR, ParquetDataset.__getitem__ is ~1.2 times slower than its SQLite counterpart on the same sample.
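For readers unfamiliar with the operation, here is a minimal sketch of what a numpy-based "explode" of list-valued columns could look like. It is not the code changed in this PR: the function name explode_columns, the dict-of-lists input layout, and the column names are hypothetical and chosen only for illustration.

```python
import numpy as np


def explode_columns(rows, columns):
    """Flatten per-event list columns into one entry per list element.

    Hypothetical helper (not graphnet's API). `rows` maps column name to a
    list with one entry per event. Columns named in `columns` hold
    equal-length sequences per event; all other columns hold one scalar
    per event.
    """
    # Number of elements (e.g. pulses) per event, from the first list column.
    lengths = np.array([len(v) for v in rows[columns[0]]])
    out = {}
    for name, values in rows.items():
        if name in columns:
            # Concatenate the per-event sequences into one flat array.
            out[name] = np.concatenate([np.asarray(v) for v in values])
        else:
            # Repeat each per-event scalar once per element of that event.
            out[name] = np.repeat(np.asarray(values), lengths)
    return out


# Two events: the first with three pulses, the second with one.
events = {
    "event_no": [1, 2],
    "charge": [[0.5, 1.5, 2.0], [3.0]],
    "dom_time": [[10.0, 11.0, 12.5], [20.0]],
}
flat = explode_columns(events, ["charge", "dom_time"])
print(flat["event_no"])  # [1 1 1 2]
```

The sketch runs in a single thread and relies only on np.concatenate and np.repeat, which is the general idea behind replacing a multiprocessing-backed explode inside a per-item dataset lookup.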