Hey! I believe the samplesheet error is a bit of a false flag, and the actual error is reported further down the log:
nextflow.exception.WorkflowScriptErrorException: Base quality score recalibration requires at least one resource file. Please provide at least one of `--dbsnp` or `--known_indels`
You can skip this step in the workflow by adding `--skip_tools baserecalibrator` to the command.
The genome you picked, `hg38`, does not have all the reference files configured; in this case `dbsnp` and `known_indels` are missing. Can you try to either provide them or skip the base recalibration step?
I tried the skip option, but the problem remains, with this message: Pipeline completed with errors- "The sample-sheet only contains normal-samples, but the following tools, which were requested with "--tools", expect at least one tumor-sample : mutect2"
I also get this one: "Missing or unknown field in csv file header. Please check your samplesheet"
Ah, I think you are missing the column `lane`. If you only have one lane per sample, you can put 1 for each. I also noticed that your delimiter is `;`. This may work, but I haven't tested it. If the above doesn't work, I would try changing the delimiter in your samplesheet to `,`.
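For reference, both suggestions (adding a `lane` column and switching the delimiter) can be sketched in one step. This is only a minimal sketch: the function name `fix_samplesheet` is hypothetical, and it assumes a `;`-delimited samplesheet with a header row that contains a `sample` column but no `lane` column, with exactly one lane per sample.

```shell
# Hypothetical helper, not part of sarek.
# Assumes: ';'-delimited input, a header row with a 'sample' column,
# no existing 'lane' column, and one lane per sample.
fix_samplesheet() {
    # $1: input samplesheet; the converted sheet is written to stdout
    awk -F';' -v OFS=',' '
        NR == 1 {
            # Locate the "sample" column in the header
            for (i = 1; i <= NF; i++) if ($i == "sample") col = i
            if (!col) { print "no sample column found" > "/dev/stderr"; exit 1 }
        }
        {
            # Re-emit every field with "," and add lane right after sample
            out = ""
            for (i = 1; i <= NF; i++) {
                out = out (i > 1 ? OFS : "") $i
                if (i == col) out = out OFS (NR == 1 ? "lane" : "1")
            }
            print out
        }
    ' "$1"
}
```

It could be used as `fix_samplesheet samplesheet_prova.csv > samplesheet_fixed.csv` (the output name is just an example). Writing to a new file keeps the original sheet intact for comparison.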
Thank you. It seems the problem was the lane column (I had already replaced ";" with "," using vim before running the pipeline). Unfortunately, I still get an error with the --intervals parameter in my WES analysis. I used the Illumina BED file named hg38_Twist_ILMN_Exome_2.5_Panel_annotated.bed.
This file produces an error, probably because of its 4th column (I attached a preview of the file; can you please confirm this?), so I removed that column and named the new file "prova.bed". However, I then got this error: ERROR ~ Error executing process > 'NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (prova)'
Caused by:
Process NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (prova)
terminated with an error exit status (1)
Command executed:
bgzip --threads 1 -c prova.bed > prova.bed.gz
tabix prova.bed.gz
cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED":
    tabix: $(echo $(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*$//')
END_VERSIONS
Command exit status: 1
Command output: (empty)
Command error:
INFO: Converting SIF file to temporary sandbox...
[E::hts_idx_push] Unsorted positions on sequence #23: 24364365 followed by 24362683
tbx_index_build failed: prova.bed.gz
INFO: Cleaning up image...
Work dir: /share/project3/home/perciostefano/sts/Sarcomics/WES/sequenced/work/a1/44c13a5e6af6f42ef5153394a61255
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
Sorry, I closed my issue by mistake. I have reopened it. I hope this is not a problem.
[E::hts_idx_push] Unsorted positions on sequence #23: 24364365 followed by 24362683
Yes, you will need to sort the BED file. You can use bedtools for this. Something else I have done in the past, depending on the analysis, is padding the BED file on each side (by 50-100 bp or so, depending on your read length). This can cause adjacent regions to overlap, in which case you also need to merge them.
Here are the commands I used in the past to prepare the bed file:
#!/bin/bash
# Usage: <this script> <intervals.bed> <genome>.fasta.fai
bed="$1"
fai="$2"
# sort coordinates: by chromosome, then numerically by start position
sort -k1,1V -k2,2n "$bed" > "${bed/.bed/.sorted.bed}"
# GATK recommends padding by 100bp: https://gatk.broadinstitute.org/hc/en-us/articles/360035889551-When-should-I-restrict-my-analysis-to-specific-intervals-
bedtools slop -i "${bed/.bed/.sorted.bed}" -b 100 -g "$fai" > "${bed/.bed/.sorted.padded.bed}"
# Merge overlapping or neighboring regions
bedtools merge -i "${bed/.bed/.sorted.padded.bed}" > "${bed/.bed/.sorted.padded.merged.bed}"
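Before handing a BED file to the pipeline, it can help to confirm that it no longer triggers the "Unsorted positions" condition. The following is a minimal sketch, not something sarek or tabix provides: the function name `check_bed` is hypothetical, and it only checks that start positions never decrease within a contiguous run of the same chromosome in a tab-delimited file.

```shell
# Hypothetical helper: report the first line whose start coordinate is lower
# than the previous start on the same chromosome.
# Assumes a tab-delimited BED file; header-like lines are skipped.
check_bed() {
    # $1: BED file; exit status 0 if sorted, 1 otherwise
    awk -F'\t' '
        /^(#|track|browser)/ { next }
        $1 == chrom && $2 + 0 < start {
            printf "unsorted at line %d: %s %d after %d\n", NR, $1, $2, start
            bad = 1
            exit
        }
        { chrom = $1; start = $2 + 0 }
        END { exit bad ? 1 : 0 }
    ' "$1"
}
```

For example, `check_bed prova.bed` would print the first offending line, while running it on the sorted, padded and merged output should print nothing and return 0. It does not detect a chromosome that reappears later in the file, so it is a quick sanity check rather than a full validation.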
Thank you very much! Following your suggestion, I also changed the genome reference to the standard GRCh38, and now all the warnings about the panel of normals, dbsnp and known_indels are resolved. I also replaced --intervals with --target_bed, since I want to analyze SNVs and CNVs in my WES data, and I modified my BED file as you suggested. I will let you know if everything works!
`--target_bed` is not a valid parameter in 3.4.2. You need to supply the BED file through the parameter `--intervals`.
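Put together, a WES run that passes the BED file through `--intervals` might look like the sketch below. The wrapper name `run_sarek`, the `DRY_RUN` switch and the argument order are assumptions made for illustration, and `GATK.GRCh38` is assumed to be what "standard GRCh38" refers to. Site-specific settings (for example a Slurm configuration) are left out.

```shell
# Hypothetical wrapper around the sarek command line discussed in this thread.
# Set DRY_RUN=1 to print the command instead of running it.
run_sarek() {
    # $1: samplesheet, $2: output directory, $3: sorted BED file
    ${DRY_RUN:+echo} nextflow run nf-core/sarek -r 3.4.2 -profile singularity \
        --input "$1" \
        --outdir "$2" \
        --genome GATK.GRCh38 \
        --wes \
        --tools mutect2 \
        --intervals "$3"
}
```

Calling `DRY_RUN=1 run_sarek samplesheet.csv results prova.sorted.padded.merged.bed` (file names are examples) prints the full command so it can be reviewed before it is submitted.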
@Poocee has your issue been solved? In that case I would close the issue
Dear Friederike, Yes, my code is running without any problems (at least at the moment ☺). I want to thank you again for your promptness in replying and your availability.
Best regards, Stefano
From: Friederike Hanssen. Sent: Thursday, 8 August 2024 12:20. To: nf-core/sarek. Cc: Percio Stefano; Mention. Subject: Re: [nf-core/sarek] Samplesheet of sarek (Issue #1605)
Dear Friederike, unfortunately the sarek pipeline failed with an error that I don't know how to handle. Can you help me, please?
Pipeline completed with errors- ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:BWAMEM1_MEM (KQ91-1)'
Caused by:
Process NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:BWAMEM1_MEM (KQ91-1)
terminated with an error exit status (1)
[mem_sam_pe] paired reads have different names: "LH00193:5:22CNKWLT3:7:1101:24587:1048", "LH00193:16:222TJ2LT4:7:1101208:1056"
I have attached the log file file_log.txt
Looks like there is an issue with your fastq file. I found this: https://www.biostars.org/p/254155/ Maybe it helps.
Unfortunately, I received these files as they are, so I cannot go back to the source. I checked your link, but I didn't find any solution to my problem. In addition, I don't understand why the problem affects only one sample and not all my fastq files. Thanks in advance.
Unfortunately, I received these files as they are, so I cannot go back to the source. I checked your link, but I didn't find any solution to my problem.
In that case I would probably try to inspect the files with tools like `seqkit` to make sure they are valid fastq files.
In addition, I don't understand why the problem affects only one sample and not all my fastq files.
All samples are run in parallel and by distinct jobs. It discovered the issue for one sample (KQ25-1) and reported it.
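Since the bwa error complains that paired reads have different names, one way to narrow it down is to compare the read names of the two files record by record. The sketch below uses only standard tools; the function names `fastq_names` and `first_name_mismatch` are hypothetical, and it assumes gzip-compressed FASTQ files with exactly four lines per record.

```shell
# Hypothetical helpers. Assume gzipped FASTQ with 4 lines per record.
fastq_names() {
    # Print each read name: the header up to the first space,
    # without a trailing /1 or /2 mate suffix.
    gzip -cd "$1" | awk 'NR % 4 == 1 { name = $1; sub(/\/[12]$/, "", name); print name }'
}

first_name_mismatch() {
    # $1: R1 file, $2: R2 file; prints the first differing record, if any.
    # Exit status 0 if all names agree, 1 otherwise.
    tmp1=$(mktemp)
    tmp2=$(mktemp)
    fastq_names "$1" > "$tmp1"
    fastq_names "$2" > "$tmp2"
    paste "$tmp1" "$tmp2" | awk -F'\t' '
        $1 != $2 { printf "record %d: %s vs %s\n", NR, $1, $2; found = 1; exit }
        END { exit found ? 1 : 0 }
    '
    status=$?
    rm -f "$tmp1" "$tmp2"
    return $status
}
```

Running `first_name_mismatch sample_R1.fastq.gz sample_R2.fastq.gz` (file names are examples) on the failing sample shows where the two files first fall out of step; files of different lengths are also reported as a mismatch. As far as I know, seqkit has a `pair` subcommand intended for re-matching paired-end reads, which may be a more convenient fix once the problem is confirmed.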
Dear Friederike, I followed your suggestion and used seqkit to modify my data and resolve the problem. Unfortunately, I now have another problem, affecting only one sample, that I do not understand. I attach the report with the error description. In addition, I don't understand why the pipeline doesn't proceed when only one sample gives an error (only the fastp and fastqc reports are present). Could you please help me? Thanks in advance.
Best, Stefano
From: Friederike Hanssen. Sent: Monday, 12 August 2024 16:42. To: nf-core/sarek. Cc: Percio Stefano; Mention. Subject: Re: [nf-core/sarek] Samplesheet of sarek (Issue #1605)
Description of the bug
I am getting an error about the samplesheet composition when running the sarek pipeline. I have read many posts about this kind of error occurring with different pipelines, but I didn't find a possible solution for my issue. Specifically, the error reports this message: "The sample-sheet only contains normal-samples, but the following tools, which were requested with "--tools", expect at least one tumor-sample : mutect2" Unfortunately, I don't understand what I missed, since my samplesheet has both normal and tumor samples (also relapse tumors for some samples), as specified in the usage page.
Command used and terminal output
Relevant files
samplesheet_prova.csv nextflow.log
System information
Nextflow version 24.04.3, HPC, Slurm, Singularity, nf-core/sarek v3.4.2-gb5b766d