The Validator parent class (in IVT) now accepts a list of data_paths rather than a single path at a time. Files are then collected and processed in parallel in the various plugin subclasses. Testing indicates that this cuts processing time of fastq files in half (at least for a <10GB upload with fastq files). EDIT: or maybe way less!
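For reviewers who want a picture of the collect-then-parallelize pattern, here is a minimal sketch. It is not the plugin code: `collect_files`, `validate_in_parallel`, and the `check_file` callback are hypothetical names, and a thread pool from `concurrent.futures` stands in for whatever the plugins actually use.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def collect_files(data_paths: Iterable[Path], pattern: str) -> List[Path]:
    """Gather matching files from every data path into one sorted list."""
    files: List[Path] = []
    for data_path in data_paths:
        files.extend(p for p in Path(data_path).rglob(pattern) if p.is_file())
    return sorted(files)


def validate_in_parallel(
    data_paths: Iterable[Path],
    pattern: str,
    check_file: Callable[[Path], List[str]],  # hypothetical per-file check
    threads: int = 4,
) -> List[str]:
    """Run check_file on every collected file and flatten the error lists."""
    files = collect_files(data_paths, pattern)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so errors come back in file order.
        per_file_errors = list(pool.map(check_file, files))
    return [error for errors in per_file_errors for error in errors]
```

The point of collecting first is that the pool sees one flat list of files across all data_paths, instead of being started once per path.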
CODEX and Publication plugins accept data_path lists but are not currently parallelized, though they can be updated to be. These plugins already run quite fast in testing, so I focused on large-file plugins (fastq, gz, ome.tiff, tiff).
Parallelizing made redundancy checking in fastq_validator_logic unreliable. I moved this logic outside of the Engine call that processes files in parallel.
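A rough sketch of the idea, under assumptions: I am treating "redundancy" here as the same file name showing up more than once, and the function names (`check_one_file`, `find_duplicate_names`, `validate`) are made up for illustration, with a thread pool standing in for the Engine call. Any check that needs to see the whole file list runs once in the caller, and only the independent per-file work goes to the pool.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def check_one_file(path: Path) -> List[str]:
    """Per-file check with no shared state (hypothetical placeholder)."""
    return [] if path.suffix == ".fastq" else [f"{path.name}: unexpected extension"]


def find_duplicate_names(files: List[Path]) -> List[str]:
    """Report file names that occur more than once across all directories."""
    counts = Counter(path.name for path in files)
    return [
        f"{name} appears {count} times"
        for name, count in sorted(counts.items())
        if count > 1
    ]


def validate(files: List[Path], threads: int = 4) -> List[str]:
    # Duplicate detection needs the whole file list, so it runs once here,
    # in the calling code, before any work is handed to the pool.
    errors = find_duplicate_names(files)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for file_errors in pool.map(check_one_file, files):
            errors.extend(file_errors)
    return errors
```

Keeping the cross-file check out of the workers means its result does not depend on how files are split between them or on the order in which they finish.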
Tested:
Tests in the repo pass (locally).
Tests include TIFFs, which take the most time. Previous testing time: 22.38s. New testing time: 14.11s.