FastQC is started with the full number of threads given when launching the pipeline. This might be counter-productive: FastQC is essentially single-threaded per file, so you might want to implement a hard upper limit equal to the number of files processed in parallel.
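A minimal sketch of such a cap, assuming the pipeline assembles the call in Python. The function name `fastqc_command` and its parameters are hypothetical; only FastQC's `-t` (threads) and `-o` (output directory) options are taken from the tool itself.

```python
from typing import List, Sequence


def fastqc_command(files: Sequence[str], requested_threads: int, outdir: str) -> List[str]:
    """Build a FastQC call whose thread count never exceeds the number of input files.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    if not files:
        raise ValueError("no input files given")
    # FastQC handles each file on one thread, so threads beyond len(files) stay idle.
    threads = max(1, min(requested_threads, len(files)))
    return ["fastqc", "-t", str(threads), "-o", outdir, *files]


# Two input files with 16 requested threads: the call is capped at 2 threads.
print(fastqc_command(["a_R1.fastq.gz", "a_R2.fastq.gz"], 16, "qc"))
```

The returned list can be handed to `subprocess.run` as is; the threads that are not passed to FastQC stay available for other pipeline steps.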
Falco might be an alternative: it supports multithreading for single files and is reported to be faster while providing similar output.
Alternatively, splitting files into multiple input files, running them in parallel with the -t option, and summarizing the results with MultiQC might provide the best speed-up.
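The splitting step could look roughly like the sketch below. It assumes uncompressed FASTQ input with exactly four lines per record; `split_fastq` and the `.partN.fastq` naming are hypothetical choices, not something the pipeline already provides. How the per-chunk reports are then merged or displayed by MultiQC is not covered here.

```python
from itertools import islice
from pathlib import Path
from typing import List


def split_fastq(path: str, n_chunks: int, outdir: str) -> List[Path]:
    """Distribute 4-line FASTQ records round-robin over n_chunks output files.

    Hypothetical helper; assumes plain-text (uncompressed) FASTQ.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    chunk_paths = [out / f"{stem}.part{i}.fastq" for i in range(n_chunks)]
    handles = [p.open("w") for p in chunk_paths]
    try:
        with open(path) as src:
            index = 0
            while True:
                # One FASTQ record: header, sequence, separator, qualities.
                record = list(islice(src, 4))
                if not record:
                    break
                handles[index % n_chunks].writelines(record)
                index += 1
    finally:
        for handle in handles:
            handle.close()
    return chunk_paths
```

The resulting chunk files can then be passed to a single FastQC call with `-t` set to the number of chunks, so that each chunk is processed on its own thread.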
Dear Martin
Kind regards, Andreas