Closed RuABraun closed 8 months ago
I remember it used to be split in the past, but there was some issue with it; I just can't remember what it was. I think the best approach might be to split it first, run a SLURM job array processing each chunk separately, and then re-combine.
Yeah, fair. It's nicer to run a job array; I'll do that.
One issue I noticed with split() is that it will crash if you have more jobs than files, whereas LazySlicer didn't have that issue.
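To illustrate the difference (a minimal sketch under stated assumptions, not lhotse's actual `split()`/`LazySlicer` code): an eager splitter has to hand every worker a non-empty chunk and so errors out when there are fewer items than splits, while a lazy modulo slicer just gives the extra workers empty shards.

```python
def eager_split(items, num_splits):
    # Contiguous chunks, erroring out when there are fewer items than
    # requested splits -- a sketch of the crash described above, not
    # lhotse's actual implementation.
    if len(items) < num_splits:
        raise ValueError(f"cannot split {len(items)} items into {num_splits} chunks")
    size = -(-len(items) // num_splits)  # ceiling division
    return [items[i * size:(i + 1) * size] for i in range(num_splits)]

def lazy_slice(items, rank, world_size):
    # Each worker takes every world_size-th item starting at its rank;
    # extra workers simply receive empty slices instead of crashing.
    return items[rank::world_size]

data = ["a", "b", "c"]
shards = [lazy_slice(data, r, 5) for r in range(5)]
# shards == [['a'], ['b'], ['c'], [], []] -- two harmless empty shards
```

With 5 jobs and 3 files, the lazy version leaves two array tasks with nothing to do, which is cheap, whereas the eager version has no valid chunking to produce.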
I'm calling
compute_and_store_features()
with a slurm executor, and by default it runs very slowly because the jobs take a long time (many minutes) to get submitted. If I change this line to
The job submission is instant.
I would expect the existing implementation to be slower, since it's iterating across the entire original cutset num_job times (rather than just once), but not orders of magnitude slower. Wondering if there's something I'm missing, and whether we could update the code to make it faster (I'm willing to make a PR, and I'm open to another approach).
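To make the "iterating num_jobs times vs. once" cost concrete, here is a hedged sketch (hypothetical names, not lhotse's code): slicing a lazily-read manifest per job re-reads the whole thing once per job at submission time, while splitting it once up front reads it a single time.

```python
class CountingManifest:
    # Stand-in for a lazily-read cutset: counts how many times
    # underlying items are visited during iteration.
    def __init__(self, n):
        self.items = list(range(n))
        self.reads = 0

    def __iter__(self):
        for x in self.items:
            self.reads += 1
            yield x

def shards_by_slicing(manifest, num_jobs):
    # One full pass over the manifest per job: O(num_jobs * N) reads.
    return [
        [x for i, x in enumerate(manifest) if i % num_jobs == rank]
        for rank in range(num_jobs)
    ]

def shards_by_splitting(manifest, num_jobs):
    # Materialize once, then hand out index-based views: O(N) reads.
    items = list(manifest)
    return [items[rank::num_jobs] for rank in range(num_jobs)]

m1, m2 = CountingManifest(1000), CountingManifest(1000)
a = shards_by_slicing(m1, 8)    # m1.reads == 8000
b = shards_by_splitting(m2, 8)  # m2.reads == 1000
```

Both strategies produce identical shards; only the number of passes over the source differs, which would explain slower submission but not, by itself, an orders-of-magnitude gap.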