Open YoelPH opened 1 week ago
Ok, in my testing any performance issues come down to `multiprocess` and the scope of the `model` variable. But I don't think these are the issues in your case, because `model` appears to be a globally shared variable.
Anyways, the number of patients did not seem to make any difference in my testing: whether I used 50 or 5000 patients, the 10,000 steps always took around half an hour (my office PC is slow).
I've linked a slightly modified version of your script, adapted so I could toggle `multiprocess` and the global/local scoping on and off. Try to replicate your issue with that:
https://gist.github.com/rmnldwg/ea790ac9fa469a6cd51613c94aa005a9
Oh, and caching does not seem to be an issue. The caches of the data and diagnose matrices don't change at all, as long as nobody changes either the data or the modalities, so the cache limit isn't hit.
Problem has been solved :) The issue was the numpy version. While lymph 1.2.2.dev0 still required numpy < 2.0, the newest version allows numpy > 2.0. With numpy > 2.0, matrix multiplication does not seem to slow down beyond a certain size, so the explosion in computation time disappears.
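For anyone who wants to sanity-check this on their own machine: a quick, illustrative way to see whether matrix multiplication scales smoothly with size under your installed numpy is to time a product at increasing row counts. The shapes below are made up for illustration, not taken from the lymph model's actual matrices:

```python
import time
import numpy as np

print(f"numpy version: {np.__version__}")

def time_matmul(n_rows: int, n_cols: int = 256, repeats: int = 5) -> float:
    """Average wall time of one (n_rows x n_cols) @ (n_cols x n_cols) product."""
    rng = np.random.default_rng(42)
    a = rng.random((n_rows, n_cols))
    b = rng.random((n_cols, n_cols))
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    return (time.perf_counter() - start) / repeats

# If the runtime jumps disproportionately at some size, you may be
# hitting the same slowdown as in this issue.
for n in (50, 500, 5000):
    print(f"{n:>5} rows: {time_matmul(n) * 1e3:.3f} ms")
```

If the per-call time grows roughly linearly with the number of rows, the matmul itself is fine and the slowdown lies elsewhere.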
I noticed an interesting issue where increasing the training dataset results in a massive increase in computation time. Example code below:
The provided code runs in roughly 15 min on my laptop. If I increase `number_of_pats` to 600, for example, the computation time explodes (2 h). At some number of patients the code seems to have a problem with saving former results.