WHAT/WHY
Introduces new optional multiprocessing parameters to several methods:
num_enc_workers - Sets the number of processes used to generate encodings (Addresses #156)
num_sim_workers/num_dist_workers - Sets the number of processes used to compute similarities (CNN) or distances (hashing) (Addresses #95, #113)
HOW
APIs impacted
For both the CNN and hashing methods, the following user-facing API calls gain the new parameters:
encode_images
find_duplicates
find_duplicates_to_remove
Choice of default values
num_enc_workers: For CNN methods, this defaults to 0 (0 = disabled). Parallel CNN encoding generation is supported only on Linux. On non-Linux platforms, PyTorch requires the code that uses the DataLoader to be wrapped in an if __name__ == '__main__' construct to enable multiprocessing, because worker processes are started there with spawn() instead of the fork() used on Linux. Such a construct does not fit well with the current code structure.
For hashing methods, this parameter defaults to the CPU count of the system. The default values preserve backward compatibility.
num_dist_workers/num_sim_workers: Defaults to the CPU count of the system. The default values preserve backward compatibility.
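To illustrate the fork() vs spawn() point, here is a minimal sketch using only the Python standard library. It is not the code from this PR: encode_all is a hypothetical helper, zlib.crc32 stands in for a real image encoder, and only the parameter name num_enc_workers is borrowed from the description above. With the spawn start method, each worker re-imports the main module, so the code that creates the pool has to sit behind the if __name__ == '__main__' guard.

```python
import multiprocessing as mp
import zlib


def encode_all(chunks, num_enc_workers=0):
    """Hypothetical helper: compute one checksum per byte string.

    num_enc_workers=0 mirrors the "disabled" default described for CNN
    methods and runs everything in the calling process.
    """
    if num_enc_workers == 0:
        # Serial path: no worker processes are started.
        return [zlib.crc32(chunk) for chunk in chunks]

    # Force the "spawn" start method to mimic non-Linux behaviour.
    # Spawned workers re-import the main module on start-up.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=num_enc_workers) as pool:
        return pool.map(zlib.crc32, chunks)


if __name__ == "__main__":
    # Without this guard, every spawned worker would execute the call
    # below again while re-importing the module.
    data = [b"image-one", b"image-two", b"image-three"]
    print(encode_all(data, num_enc_workers=2))
```

A library cannot add this guard on behalf of the caller's script, which is one way to read the remark that the construct does not fit the current code structure.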