graphnet-team / graphnet

A Deep learning library for neutrino telescopes
https://graphnet-team.github.io/graphnet/
Apache License 2.0

Implemented MinkowskiKNNEdges #630

Closed AMHermansen closed 6 months ago

AMHermansen commented 8 months ago

Implementation of MinkowskiKNNEdges.

The current implementation is written purely in Python and has a larger memory overhead than necessary: memory consumption is O(n^2), where n is the largest number of pulses in a single event. A custom implementation in cpp/cuda would reduce this to O(n*k), where k is the number of neighbours per node (so the total number of edges is n*k).

The code is currently working, but I'll try to construct some sort of unit test to verify it.
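
For illustration, here is a minimal pure-PyTorch sketch of the O(n^2) approach described above. The function name, the choice of spacetime signature, and the use of the absolute interval for ranking neighbours are assumptions for the sketch, not necessarily what this PR implements:

```python
import torch


def minkowski_knn_edges(coords: torch.Tensor, k: int, c: float = 1.0) -> torch.Tensor:
    """Connect each pulse to its k nearest neighbours under the Minkowski interval.

    coords: (n, 4) tensor of (x, y, z, t) per pulse; `c` converts time to a length.
    Returns a (2, n * k) edge_index. Memory is O(n^2) because the full pairwise
    interval matrix is materialised, which is the overhead described above.
    """
    xyz, t = coords[:, :3], coords[:, 3] * c
    d_space = torch.cdist(xyz, xyz) ** 2             # (n, n) squared spatial distances
    d_time = (t.unsqueeze(0) - t.unsqueeze(1)) ** 2  # (n, n) squared time differences
    interval = d_space - d_time                      # spacetime interval (one sign convention)
    score = interval.abs()                           # rank by |interval|; an assumption
    score.fill_diagonal_(float("inf"))               # exclude self-loops
    nbrs = score.topk(k, dim=1, largest=False).indices  # (n, k) neighbour indices per pulse
    # Whether centres go in row 0 or row 1 is a convention choice; see the
    # row/col ordering discussion further down in this thread.
    row = torch.arange(coords.size(0), device=coords.device).repeat_interleave(k)
    col = nbrs.reshape(-1)
    return torch.stack([row, col], dim=0)
```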

RasmusOrsoe commented 7 months ago

@AMHermansen I think this could be a nice addition to the repo!

It would be fantastic if you could add a unit test that asserts that the code works, and some kind of quantification of computation speed for future reference (see the attached plot as an example).

The overhead of measuring the execution speed as shown above is quite small, and I'd be happy to share the code I used to make the plot via slack.

AMHermansen commented 7 months ago

> @AMHermansen I think this could be a nice addition to the repo!
>
> It would be fantastic if you could add a unit test that asserts that the code works, and some kind of quantification of computation speed for future reference (see the attached plot as an example).
>
> The overhead of measuring the execution speed as shown above is quite small, and I'd be happy to share the code I used to make the plot via slack.

If my memory of algorithmic complexity serves me right, the time complexity should be O(B*n^2), where B is the number of batches and n is the largest number of pulses in a batch, but I'll make a concrete benchmark that shows speeds on the hardware I'm using. (It didn't feel any slower than the current KNNEdges, but this was done during CPU preprocessing, which I have never experienced being a bottleneck during training.)

Do you have a suggestion for values of k and n that I should try (I believe only n should matter, but I can probably test both), and should I also benchmark on CPU vs GPU?

I'm assuming that randomly generated data suffices for the computational benchmark?

RasmusOrsoe commented 7 months ago

> Do you have a suggestion for values of k and n that I should try (I believe only n should matter, but I can probably test both), and should I also benchmark on CPU vs GPU?

I think it will be sufficient to probe k = [4, 8, 16] and n in a range similar to the x-axis of the plot above, so ~10 to ~10,000. The idea here is not to probe the phase space completely, but to give some rough idea of how the execution time scales with the number of pulses and k.

> I'm assuming that randomly generated data suffices for the computational benchmark?

Yes, that is perfectly fine.
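
A minimal timing loop along those lines might look as follows. This is a sketch under stated assumptions: `minkowski_knn_edges` refers to the illustrative sketch earlier in this thread, and the actual benchmark/plotting script shared via slack is not reproduced here:

```python
import statistics
import time

import torch


def time_edge_definition(build_edges, n_pulses, k, n_repeats=10, device="cpu"):
    """Median wall-clock time of one edge construction on random pulses."""
    coords = torch.rand(n_pulses, 4, device=device)  # random (x, y, z, t), as agreed above
    timings = []
    for _ in range(n_repeats):
        if device == "cuda":
            torch.cuda.synchronize()  # start from an idle GPU
        start = time.perf_counter()
        build_edges(coords, k)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for kernels before stopping the clock
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Sweep the suggested ranges: k in {4, 8, 16} and n from ~10 to ~10,000.
results = {
    (k, n): time_edge_definition(minkowski_knn_edges, n, k)
    for k in (4, 8, 16)
    for n in (10, 100, 1_000, 10_000)
}
```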

AMHermansen commented 7 months ago

Hello @RasmusOrsoe, I've now done the benchmark, and the normal KNN is definitely faster. I've also run a training script with this MKNN, and there the overall training pipeline didn't seem to be much slower. (My usual experience with CPU load during runs is that I/O and the forward/backward pass on the GPU are far more dominant factors for overall training time.)

For a run where I was cropping graphs to at most 256 nodes, I didn't notice any particular slowdown from this edge definition, so the increased computation time seems acceptable at least up to 256 nodes.

I would imagine that a custom cpp/cu implementation would reach speeds similar to the knn benchmark. Should we continue with this implementation, or do you find the slowdown to be too great? [attached plot: edge_definition_benchmark]

RasmusOrsoe commented 7 months ago

@AMHermansen Thank you for this very thorough test! I think the execution times here are well within the margins of acceptance - great job! I don't think it's necessary to optimize this further.

AMHermansen commented 7 months ago

@RasmusOrsoe Could you have a look at this PR? I'm still not entirely sure that the row/col order is correct. I think the easiest way for you to verify this is in the unit test, where I include expected, which shows how I expect the edge_index to look (i.e. verify that I haven't ordered rows/cols the wrong way).
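
For reference, a toy example of how such an ordering check could be pinned down, assuming torch_cluster's knn_graph is used as the reference convention (this is a sketch, not the test included in the PR):

```python
import torch
from torch_cluster import knn_graph

# Three pulses on a line with k=1, so each nearest neighbour is obvious by hand:
# nodes 0 and 1 pick each other, and node 2's nearest neighbour is node 1.
coords = torch.tensor([[0.0], [1.0], [5.0]])

edge_index = knn_graph(coords, k=1, loop=False)
print(edge_index)
# My understanding (worth double-checking against the torch_cluster docs) is that
# with the default flow="source_to_target", edge_index[0] holds the neighbour
# (source) indices and edge_index[1] the centre (target) indices, e.g. roughly
#   tensor([[1, 0, 1],
#           [0, 1, 2]])
# Comparing this against the hand-written `expected` tensor in the unit test
# makes it easy to see whether rows and columns are in the intended order.
```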