nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

Too many input files for MultiQC #100

Open orzechoj opened 6 years ago

orzechoj commented 6 years ago

I ran the RNA-seq pipeline on 360 samples, and the SLURM submission of the MultiQC job failed with "Pathname of a file, directory or other parameter too long":

ERROR ~ Error executing process > 'multiqc'
Caused by:
  Failed to submit process to grid scheduler for execution
Command executed:
  sbatch .command.run
Command exit status:
  1
Command output:
  sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long

The files .command.stub and .command.sh look normal, but .command.run is 11 MB, with many ln commands and so on. So it might be related to this SLURM bug: https://bugs.schedmd.com/show_bug.cgi?id=2198

ewels commented 6 years ago

@pditommaso - have you come across problems like this before? I guess this is because the MultiQC process soft-links in a lot of files, which makes .command.run massive, so SLURM rejects it.

pditommaso commented 6 years ago

Ouch, 11 MB of input files! You can mitigate this problem by using a directory as output instead of individual files. I mean, instead of having

   output:
    file "*_fastqc.{zip,html}" into fastqc_results

let the process save the files into a directory, e.g. reports, then

   output:
    file "reports" into fastqc_results
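For reference, a fuller version of this pattern might look like the following (a hypothetical sketch in the same DSL1 syntax as the snippets above; the process body, channel names, and tool flags are illustrative, not taken from the pipeline):

   process fastqc {
       input:
       set val(name), file(reads) from read_files_fastqc

       output:
       // a single directory is emitted instead of many report files
       file "fastqc_reports" into fastqc_results

       script:
       """
       mkdir fastqc_reports
       fastqc --outdir fastqc_reports --quiet $reads
       """
   }

This way MultiQC stages one directory symlink per upstream task instead of one symlink per report file, which keeps the nxf_stage() section of .command.run small.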

ewels commented 6 years ago

Yes, maybe we should profile how many files each channel going into MultiQC carries. I suspect that quite a few of them aren't needed. For example, MultiQC only needs the zip file here, not the html. So we could make new MultiQC-specific channels containing just these files to cut down the number.
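One hedged way to implement that in DSL1 (channel names here are illustrative) is to derive a zip-only channel from the FastQC output before it reaches MultiQC:

   // keep only the .zip reports; MultiQC does not parse the .html files
   fastqc_results
       .flatten()
       .filter { it.name.endsWith('.zip') }
       .set { fastqc_zip_for_multiqc }

This roughly halves the number of paths staged into the MultiQC work directory for this channel.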

apeltzer commented 5 years ago

I'm wondering whether @olgabot had issues with this when doing her large-scale nf-core/rnaseq experiments on AWS - any ideas?

ojziff commented 4 years ago

I ran the RNA-seq pipeline on 576 FASTQ files and the SLURM submission also failed on the multiqc process with the same error: sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long. There is no .command.out in work/. Is there any update on a workaround for this? Thank you

jfy133 commented 4 years ago

FYI: a user just encountered the same error in nf-core/eager when trying to run a 1000-sample job. If I understand the solution proposed above correctly, I don't think the directory output would necessarily work in this case, as most of the log files here are standalone outputs from separate processes (rather than lots of logs from a single process).
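Where logs come from many separate processes, one possible (untested) workaround is an intermediate bundling step that runs on the local executor, so the long staging list never goes through sbatch (process and channel names below are hypothetical):

   process bundle_logs {
       // run locally: the grid scheduler never sees the huge .command.run
       executor 'local'

       input:
       file logs from per_sample_logs.collect()

       output:
       file "bundled_logs" into bundled_logs_for_multiqc

       script:
       """
       mkdir bundled_logs
       cp -L $logs bundled_logs/
       """
   }

MultiQC then stages just the single bundled_logs directory. Note that cp -L would clobber identically named logs from different samples, so unique filenames are assumed.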

apeltzer commented 3 years ago

I hit this a few days ago and opened https://github.com/nextflow-io/nextflow/issues/2118 to cover some of these points.

ggabernet commented 1 year ago

Just for the record, we've also had this issue now with nf-core/airrflow

ssnn-airr commented 1 year ago

Re the nf-core/airrflow issue @ggabernet just mentioned: I can confirm that the .command.run file size exceeds the SLURM max_script_size reported by scontrol show config. There are many rm and ln lines in the nxf_stage() section.

apeltzer commented 1 year ago

The Nextflow issue is still open, and the small-scale mitigation attempts haven't helped us permanently either. Maybe also comment there to make sure this gets addressed soon 👉🏻 https://github.com/nextflow-io/nextflow/issues/2852

m3hdad commented 1 year ago

Same issue on nf-core/proteinfold: softlinking mmcif_files produces about 210,342 lines of softlink commands.

apeltzer commented 1 year ago

Should be better once https://github.com/nextflow-io/nextflow/issues/2852 is resolved.