nf-core / methylseq

Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
https://nf-co.re/methylseq
MIT License

Latest template merge completely broke dev branch (on AWS + Fusion) #392

Closed FelixKrueger closed 2 weeks ago

FelixKrueger commented 6 months ago

Description of the bug

I am trying to launch a methylseq run using the latest dev branch where some (but not all) samples require merging of technical replicates before launching. If I understand it correctly, the latest template changes were merged into dev earlier this month, but something seems to have gone awry:

Within seconds of launching the run, I observe the following errors:

  1. The samples are not getting merged, despite technical replicates having identical IDs (which no longer get truncated by 1 element, which is good!)
  2. Trim Galore fails straight away as the system tries to create the same symbolic link several times (details below)
  3. As one of the first processes, bismark2summary is run, and obviously fails...

Screenshot 2024-03-27 at 11 15 21

Obviously, the ln -s command attempts to use the very same filename 6 times over, which doesn't work. But something also screwed up the entire workflow logic, i.e. not starting with merging, and instead running post-run QC right at the start.

Here is an example samplesheet:

sample,fastq_1,fastq_2,genome
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7431885_NB18_32F_TNTtoKSR_553_rep1,s3://filebucket/SRR24757836_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,s3://filebucket/SRR24757836_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz,
GSM7431885_NB18_32F_TNTtoKSR_553_rep1,s3://filebucket/SRR24757837_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,s3://filebucket/SRR24757837_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz,
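
To check which samplesheet entries share an ID, and would therefore be expected to be merged, a quick sketch like the one below may help. It is not part of the pipeline: the file name samplesheet.csv and the shortened sample and run names are placeholders, and only the column layout is taken from the samplesheet above.

```shell
#!/bin/sh
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical, shortened stand-in for the samplesheet (same four columns).
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,genome
sampleA,s3://filebucket/runA1_R1.fastq.gz,,
sampleA,s3://filebucket/runA2_R1.fastq.gz,,
sampleB,s3://filebucket/runB1_R1.fastq.gz,s3://filebucket/runB1_R2.fastq.gz,
EOF

# Count rows per sample ID (skipping the header line) and report every ID
# that occurs more than once, i.e. every sample with technical replicates.
needs_merge=$(awk -F, 'NR > 1 { n[$1]++ }
    END { for (s in n) if (n[s] > 1) print s, n[s] }' samplesheet.csv | sort)

echo "$needs_merge"
```

Each output line is a sample ID followed by the number of rows that carry it.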

Command used and terminal output

This is the command it attempts to run:

Command

[ ! -f  GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz ] && ln -s SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz
trim_galore \
    --fastqc \
    --cores 8 \
    --gzip \
    GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz

Terminal output of Trim Galore process:
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
11:11AM INF shutdown filesystem start
11:11AM INF shutdown filesystem done
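
The failure above can likely be reproduced outside the pipeline. The following is a minimal sketch with placeholder file names (run1 to run3, sample.fastq.gz), assuming that technical replicates are meant to be merged by plain concatenation of the gzipped FASTQ files. It does not show the pipeline's own code, and the exact ln error message depends on the ln implementation in use.

```shell
#!/bin/sh
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for three technical replicates of one sample (hypothetical names),
# each holding a single four-line FASTQ record.
for run in run1 run2 run3; do
    printf '@%s\nACGT\n+\nIIII\n' "$run" | gzip -c > "$run.fastq.gz"
done

# Several sources plus a final argument that is not a directory:
# ln cannot link all of them to one name, so it exits with an error.
ln_status=0
ln -s run1.fastq.gz run2.fastq.gz run3.fastq.gz sample.fastq.gz 2>ln_errors.txt || ln_status=$?
echo "ln exit status: $ln_status"

# Remove whatever ln may have created, then merge by concatenation instead.
rm -f sample.fastq.gz
cat run1.fastq.gz run2.fastq.gz run3.fastq.gz > sample.fastq.gz

# Concatenated gzip members decompress as one stream: 3 records x 4 lines.
merged_lines=$(gzip -dc sample.fastq.gz | wc -l | tr -d ' ')
echo "lines in merged file: $merged_lines"
```

The ln call has the same shape as the generated command: many inputs, one output name. Some ln implementations try to link each source in turn, which may explain why the log shows one "File exists" error per additional input.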

Relevant files

No response

System information

Screenshot 2024-03-27 at 11 06 51

I am running this on Seqera platform on AWS, using Fusion. Nextflow v23.10.1 build 5891. nf-core/methylseq version: dev

edmundmiller commented 6 months ago

Have you tried simplifying the names to just GSM7431885 and GSM7506206?

The sample name doesn't have to match the original input, you can name it something more descriptive than an ID as well.

FelixKrueger commented 6 months ago

Simplifying the name has no effect (other than a different file name...):

Screenshot 2024-03-28 at 11 02 08
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists

But this command can never work:

ln -s SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz \
SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz \
SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz \
SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz \
SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz \
SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz \
GSM7506206.fastq.gz

edmundmiller commented 6 months ago

> using Fusion

I have a feeling this might be it, with the soft links for whatever weird reason.

A few new experiments:

  1. Can you run the methylseq test profile in the environment?
  2. Can you run the rnaseq test profile in the environment? (It has trimgalore)
  3. If the above two work, what about a rnaseq test full?

Also, any previous versions confirmed? Because the trimgalore module hasn't been updated in 11 months.

FelixKrueger commented 6 months ago

It also fails with 2.6.0:

ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists

and the process doesn't start at all with 2.5.0 (as it expected filenames to contain at least one _ underscore back then):

Execution completed unsuccessfully!

The full error message was:

fromIndex = -1