Closed AMHermansen closed 6 months ago
@AMHermansen I think this could be a nice addition to the repo!
It would be fantastic if you could add a unit test that asserts that the code works, and some kind of quantification of computation speed for future reference (see the attached plot as an example).
The overhead of measuring the execution speed as shown above is quite small, and I'd be happy to share the code I used to make the plot via Slack.
If my memory of algorithmic complexity serves me right, the time complexity should be O(B*n^2), where B is the number of batches and n is the largest number of pulses in a batch, but I'll put together a concrete benchmark that shows speeds on the hardware I'm using. (It didn't feel any slower than the current KNNEdges, but this was during CPU preprocessing, which I have never experienced being a bottleneck during training.)
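To make the O(n^2) claim concrete, here is a minimal sketch of what a brute-force Minkowski-style neighbour search per event might look like. This is illustrative only, not the PR's actual implementation: the function name, the `[x, y, z, t]` column layout, and the use of |s^2| with a (+,+,+,-) signature for ranking are all assumptions.

```python
import numpy as np

def minkowski_knn_edges(coords, k, c=1.0):
    """Brute-force k-nearest-neighbour edges under an assumed
    Minkowski-like interval; columns assumed to be [x, y, z, t].

    The pairwise interval matrix is O(n^2) in both time and memory,
    matching the complexity discussed above.
    """
    xyz = coords[:, :3]
    t = coords[:, 3] * c
    # squared spatial separation minus squared time separation (signature +,+,+,-)
    d_space = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    d_time = (t[:, None] - t[None, :]) ** 2
    interval = np.abs(d_space - d_time)          # rank by |s^2| (an assumption)
    np.fill_diagonal(interval, np.inf)           # exclude self-edges
    nbrs = np.argsort(interval, axis=1)[:, :k]   # k smallest intervals per node
    src = np.repeat(np.arange(len(coords)), k)
    dst = nbrs.ravel()
    return np.stack([src, dst])                  # shape (2, n*k), edge_index-like

coords = np.random.default_rng(0).normal(size=(10, 4))
edges = minkowski_knn_edges(coords, k=3)
print(edges.shape)  # (2, 30)
```

A cpp/cu implementation could avoid materialising the full n-by-n interval matrix, which is where the O(n*k) memory figure mentioned later in the thread would come from.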
Do you have a suggestion for values of k and n that I should try it on (I believe only n should matter, but I can probably test both), and should I also benchmark on CPU vs. GPU?
I'm assuming that randomly generated data suffices for the computational benchmark?
> Do you have a suggestion for values of k and n that I should try it on (I believe only n should matter, but I can probably test both), and should I also benchmark on CPU vs. GPU?
I think it will be sufficient to probe k = [4, 8, 16] and n in a similar range to the x-axis of the plot above, so ~10 to ~10,000. The idea here is not to probe the phase space completely, but to give a rough idea of how the execution time scales with the number of pulses and k.
> I'm assuming that randomly generated data suffices for the computational benchmark?
Yes, that is perfectly fine.
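The agreed-upon benchmark could be harnessed along these lines. This is a hypothetical sketch, not the script actually used in the thread: it times a simple stand-in O(n^2) neighbour search on random data over the suggested grid of k and n (the upper n is kept small here; extending toward ~10,000 matches the suggestion above).

```python
import time
import numpy as np

def brute_force_knn(coords, k):
    # O(n^2) pairwise squared Euclidean distances;
    # a stand-in for the edge definition under test
    d = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

rng = np.random.default_rng(42)
results = {}
for n in (10, 100, 1000):            # extend toward ~10_000 for the real benchmark
    coords = rng.normal(size=(n, 4))
    for k in (4, 8, 16):
        t0 = time.perf_counter()
        brute_force_knn(coords, k)
        results[(n, k)] = time.perf_counter() - t0

for (n, k), dt in sorted(results.items()):
    print(f"n={n:5d} k={k:2d}  {dt * 1e3:8.3f} ms")
```

Since the distance matrix does not depend on k, timings should be driven almost entirely by n, consistent with the expectation voiced above.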
Hello @RasmusOrsoe, I've now done the benchmark, and the normal KNN is definitely faster. I've also run a training script with this MKNN, and there the entire training pipeline didn't seem to be much slower. (My usual experience with CPU load during runs is that I/O and the forward/backward pass on the GPU are far more dominant factors for overall training time.)
For a run where I was cropping graphs to at most 256 nodes I didn't notice any particular slowdown from this edge definition, so the increased computation time seems acceptable at least up to 256 nodes.
I would imagine that a custom cpp/cu implementation would reach speeds similar to the KNN benchmark. Should we continue with this implementation, or do you find the slowdown too great?
@AMHermansen Thank you for this very thorough test! I think the execution times here are well within the margins of acceptance - great job! I don't think it's necessary to optimize this further.
@RasmusOrsoe Could you have a look at this PR? I'm still not entirely sure that the row/column order is correct. I think the easiest way for you to verify this is in the unit test, where I include `expected`, which contains how I expect the `edge_index` to look (i.e. verify that I haven't ordered rows/columns the wrong way).
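For reference, the kind of unit test described here might look like the sketch below. Everything in it is illustrative: the test name, the tiny 1-D example, and the hard-coded `expected` array are assumptions standing in for the PR's actual test, and the row convention (row 0 = source, row 1 = neighbour) is exactly the point being asked about.

```python
import numpy as np

def test_minkowski_knn_edges_ordering():
    """Compare a produced edge_index against a hand-computed `expected`
    array, as described in the thread. Values here are illustrative."""
    # three points on a line; with k=1 each node's nearest neighbour is obvious
    coords = np.array([[0.0], [1.0], [3.0]])
    d = np.abs(coords - coords.T)
    np.fill_diagonal(d, np.inf)
    nearest = d.argmin(axis=1)
    # convention assumed here: row 0 = source node, row 1 = nearest neighbour
    edge_index = np.stack([np.arange(3), nearest])
    expected = np.array([[0, 1, 2],
                         [1, 0, 1]])
    np.testing.assert_array_equal(edge_index, expected)

test_minkowski_knn_edges_ordering()
print("ok")
```

Hand-computable inputs like this make it easy for a reviewer to confirm the row/column convention at a glance, which is the verification requested above.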
Implementation of MinkowskiKNNEdges.
The current implementation is written purely in Python and has a larger-than-necessary memory overhead: O(n^2), where n is the largest number of pulses in a single event. A custom cpp/cuda implementation would reduce this to O(n*k), where k is the number of edges per node.
The code is currently working, but I'll try to construct some sort of unit test to verify it.