Hi, @pconesa @DaniDelHoyo
I have fixed some issues related to the GPU parameter in the Topaz command line. First, the protocol was passing the whole gpuList, which is not valid in the Topaz command line. Moreover, when running with several threads/MPI processes, the exact same command was passed to all of them. My fix is to use the %(GPU) placeholder, which the executor replaces with the GPU assigned to each step.
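To make the idea concrete, here is a minimal sketch of the substitution, not the plugin's actual code. The template string, the `--device` flag, the `build_commands` helper, and the round-robin assignment are all assumptions for illustration; the placeholder is written as `%(GPU)s` only because the sketch uses Python's `%` formatting.

```python
# Hypothetical command template; the real one lives in the plugin.
# The subcommand and flag shown here are illustrative assumptions.
TEMPLATE = "topaz extract --device %(GPU)s"


def build_commands(template, gpu_list, n_workers):
    """Return one command per worker, each with a single GPU id.

    GPUs are assigned round-robin here, which is an assumption about
    how an executor might distribute them, not a description of Scipion.
    """
    commands = []
    for worker in range(n_workers):
        gpu = gpu_list[worker % len(gpu_list)]
        # Substitute only the GPU assigned to this worker,
        # instead of pasting the whole GPU list into the command.
        commands.append(template % {"GPU": gpu})
    return commands


commands = build_commands(TEMPLATE, [0, 1], 2)
```

With this, each worker gets its own command with exactly one GPU id, rather than every worker receiving the same command with the full list.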
However, there is still an issue in readSetOfCoordinates when parsing the coordinates. Imagine that you have two GPUs (0 and 1) and you launch the protocol with threads=3, gpus="0 1" and a batch size of 16. GPU 0 will process micrographs 1-16 and GPU 1 micrographs 17-32. But when reading the coordinates, if all 32 micrographs are discovered as new, it expects a file "topaz_coordinates1-32" which, of course, does not exist.
I think this issue is important since it prevents one from fully using the available computational resources (e.g. a machine with 2 or more GPUs), and so one loses the benefits of using Scipion with its built-in parallelization mechanism.
Here is the line that parses the coordinates: https://github.com/scipion-em/scipion-em-topaz/blob/4e9ec7af0c590424fa73de22d77ad986f714a67a/topaz/protocols/protocol_topaz_picking.py#L120
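The mismatch described above can be reproduced with a small standalone sketch. It does not use the plugin's code: `batch_ranges` and `coords_name` are hypothetical helpers, and the file naming pattern is taken only from the "topaz_coordinates1-32" example in this comment.

```python
def batch_ranges(n_mics, batch_size):
    """Split 1-based micrograph indices into inclusive (first, last) batches."""
    return [(start, min(start + batch_size - 1, n_mics))
            for start in range(1, n_mics + 1, batch_size)]


def coords_name(first, last):
    # Naming pattern assumed from the example in this issue.
    return "topaz_coordinates%d-%d" % (first, last)


# Files written by the picking steps: one per batch of 16 micrographs.
written = [coords_name(first, last) for first, last in batch_ranges(32, 16)]

# File the reader looks for when all 32 micrographs are seen as new.
expected_by_reader = coords_name(1, 32)

# True: the reader asks for a file that no batch ever produced.
missing = expected_by_reader not in written
```

One possible direction, under the same assumptions, would be for the reader to iterate over the per-batch names in `written` instead of building a single name from the first and last new micrograph.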