Open ybdong919 opened 2 months ago
I was able to reproduce the error.
The error is due to the fact that you've got --genome null --igenomes_ignore. Then, AFAICT, --snpeff_genome and --snpeff_db no longer get set through the igenomes.config file. (If that is indeed the case, then I think Sarek should issue a more informative error message.)
Could you try adding --snpeff_genome GRCh38 --snpeff_db 105 or whichever version of snpeff you want to use in your NF command?
If you can't find any info on this in the docs for Sarek, then we might have to add some info there.
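As a rough sketch of that suggestion, the reported command could be extended as below. The values GRCh38 and 105 are only the examples given in this comment; since the reference in the report is hs37d5 (a GRCh37-based assembly), a different genome name and database version may be the right match, which is an assumption to verify. The snippet only prints the command so it can be reviewed first.

```shell
# Example values taken from the reply above; replace them with the snpEff
# genome name and database version that match your own reference.
snpeff_genome="GRCh38"
snpeff_db="105"

# Rebuild the reported command with the two snpEff parameters appended.
set -- nextflow run ./sarek \
  -profile singularity \
  --input samplesheet.csv \
  --outdir ./ \
  --tools freebayes,snpeff \
  --genome null \
  --igenomes_ignore \
  --fasta ./ref/hs37d5.fa.gz \
  --skip_tools baserecalibrator \
  --snpeff_genome "$snpeff_genome" \
  --snpeff_db "$snpeff_db"

# Dry run: print the command for review. To launch it, run "$@" instead.
cmd="$*"
printf '%s\n' "$cmd"
```

Keeping the two values in variables makes it easy to swap in another genome or database version without touching the rest of the command.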
How can I check/list all snpeff db or genome?
I'd check the https://pcingola.github.io/SnpEff/ and https://www.ensembl.org/info/docs/tools/vep/index.html websites for it; they have tons of genomes and lots of different versions. We also mirror some of them at https://annotation-cache.github.io/
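If snpEff is installed locally, its own database listing can also be searched from the command line. The sketch below assumes a wrapper script named snpEff is on PATH (with a plain download the equivalent call would be java -jar snpEff.jar databases); filter_dbs is a hypothetical helper name and the search pattern is just an example.

```shell
# filter_dbs PATTERN: keep the lines of a database listing that mention
# PATTERN (case-insensitive). Hypothetical helper name.
filter_dbs() {
  grep -i -- "$1"
}

pattern="GRCh37"   # assumption: adjust to the genome you are looking for

# "snpEff databases" prints the databases known to the installed snpEff
# version. The wrapper name "snpEff" is an assumption about the install.
if command -v snpEff >/dev/null 2>&1; then
  snpEff databases | filter_dbs "$pattern"
else
  echo "snpEff not found on PATH; skipping listing" >&2
fi
```

The listing reflects whatever the installed snpEff version knows about, so it may differ from what annotation-cache mirrors.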
Why is only chr21 analyzed by freebayes? When I checked the VCF generated by freebayes, I found that only chr21 was analyzed, and the VCF contains the line ##commandline="freebayes -f genome.fa --target chr21_1-46709983.bed --min-alternate-fraction 0.1 --min-mapping-quality 1 S1.md.cram". Does freebayes only analyze chr21 by default in Sarek? How can I make it analyze all chromosomes?
I had a look at the freebayes VCF here:
s3://nf-core-awsmegatests/sarek/results-5cc30494a6b8e7e53be64d308b582190ca7d2585/test_full_germline_aws/variant_calling/freebayes/NA12878/NA12878.freebayes.vcf.gz
which is from test_full_germline executed on Sarek v3.4.4 over awsbatch.
The freebayes VCF contains one ##commandline tagged line, and it is the following:
##commandline="freebayes -f Homo_sapiens_assembly38.fasta --target chr6_95070791-167591393.bed --min-alternate-fraction 0.1 --min-mapping-quality 1 NA12878.recal.cram"
The pipeline runs freebayes for a bunch of intervals, and the resulting VCF files then get merged by the following command:
gatk --java-options "-Xmx3276M -XX:-UsePerfData" \
MergeVcfs \
--INPUT NA12878.chrY_9055175-9057608.gz.sort.vcf.gz --INPUT NA12878.chr12_37235253-37240944.gz.sort.vcf.gz --INPUT NA12878.chr6_95070791-167591393.gz.sort.vcf.gz --INPUT NA12878.chr13_86252980-111703855.gz.sort.vcf.gz --INPUT NA12878.chrX_37285838-49348394.gz.sort.vcf.gz --INPUT NA12878.chr18_47019913-54536574.gz.sort.vcf.gz --INPUT NA12878.chr9_41229379-41237752.gz.sort.vcf.gz --INPUT NA12878.chr2_238904048-242183529.gz.sort.vcf.gz --INPUT NA12878.chr10_39590436-39593013.gz.sort.vcf.gz --INPUT NA12878.chr11_51078349-54425074.gz.sort.vcf.gz --INPUT NA12878.chr1_10001-207666.gz.sort.vcf.gz --INPUT NA12878.chr2_16146120-32867130.gz.sort.vcf.gz --INPUT NA12878.chr4_10001-1429358.gz.sort.vcf.gz --INPUT NA12878.chr17_60001-448188.gz.sort.vcf.gz --INPUT NA12878.chr5_139453660-155760324.gz.sort.vcf.gz --INPUT NA12878.chr20_36314720-64334167.gz.sort.vcf.gz --INPUT NA12878.chr8_44033745-45877265.gz.sort.vcf.gz --INPUT NA12878.chr1_122026460-124977944.gz.sort.vcf.gz --INPUT NA12878.chr4_190173122-190204555.gz.sort.vcf.gz --INPUT NA12878.chr15_20729747-21193490.gz.sort.vcf.gz --INPUT NA12878.chr7_58169654-60828234.gz.sort.vcf.gz \
--OUTPUT NA12878.freebayes.vcf.gz \
--SEQUENCE_DICTIONARY Homo_sapiens_assembly38.dict \
--TMP_DIR . \
The merged vcf-file NA12878.freebayes.vcf.gz only contains one ##commandline tagged line, and it is the one mentioned above, but still the merged vcf-file contains variants from all the chromosomes, so I guess MergeVcfs just includes the ##commandline from one of the input vcf-files.
Does your published vcf-file from freebayes only contain variants from within the region chr21:1-46709983?
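One way to answer that question is to count the records per contig in the published file rather than relying on the ##commandline header. The sketch below uses only gzip, awk and sort (bgzip output is gzip-compatible); the function names and the example file name are hypothetical.

```shell
# count_contigs FILE.vcf.gz: print "contig<TAB>record count" for every
# contig that has at least one variant record. Hypothetical helper name.
count_contigs() {
  gzip -dc "$1" |
    awk -F '\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' |
    sort
}

# show_commandline FILE.vcf.gz: print the ##commandline header line(s).
show_commandline() {
  gzip -dc "$1" | grep '^##commandline'
}

# Usage (file name is an assumption):
#   count_contigs S1.freebayes.vcf.gz
#   show_commandline S1.freebayes.vcf.gz
```

If count_contigs lists only chr21, the file really is restricted to that contig; if it lists many contigs, the single ##commandline line is just the header carried over from one of the merged inputs.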
Could you give me more detailed information about where to find a list of genomes?
Yes, only chr21:1-46709983
Could you paste the contents of .command.sh for the MergeVcfs job for FREEBAYES here?
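In case it helps to locate that file: Nextflow writes a .command.sh into each task's work directory, so a search along the following lines should find it. This is a sketch that assumes the default work directory (./work) and that the merge script mentions both MergeVcfs and freebayes, as in the command quoted earlier in this thread; the function name is hypothetical.

```shell
# find_freebayes_merge_scripts [WORKDIR]: list .command.sh files whose
# content mentions both MergeVcfs and freebayes. Hypothetical helper name.
find_freebayes_merge_scripts() {
  # Each -exec acts as a test; a file is printed only if both greps match.
  find "${1:-work}" -type f -name '.command.sh' \
    -exec grep -q 'MergeVcfs' {} \; \
    -exec grep -q 'freebayes' {} \; \
    -print
}

# Usage (work directory location is an assumption):
#   find_freebayes_merge_scripts ./work
#   cat "$(find_freebayes_merge_scripts ./work | head -n 1)"
```

If the run was resumed several times there may be more than one match, so it is worth checking that the script belongs to the sample in question.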
Description of the bug
When I use my custom reference, this error always shows: This path is not available within annotation-cache. Please check https://annotation-cache.github.io/ to create a request for it.
My command is: nextflow run ./sarek -profile singularity --input samplesheet.csv --outdir ./ --tools 'freebayes,snpeff' --genome null --igenomes_ignore --fasta ./ref/hs37d5.fa.gz --skip_tools baserecalibrator
The log:
N E X T F L O W ~ version 24.04.4
Launching ./sarek/main.nf [distraught_edison] DSL2 - revision: e3d6110e17
WARN: Access to undefined parameter monochromeLogs -- Initialise it to a default value eg. params.monochromeLogs = some_value
nf-core/sarek v3.4.4 ....
Software dependencies https://github.com/nf-core/sarek/blob/master/CITATIONS.md
[- ] NFC…EPARE_GENOME:BWAMEM1_INDEX -
[- ] NFC…EPARE_GENOME:BWAMEM2_INDEX -
[- ] NFC…E_GENOME:DRAGMAP_HASHTABLE -
[- ] NFC…4_CREATESEQUENCEDICTIONARY -
[- ] NFC…E_GENOME:MSISENSORPRO_SCAN -
[- ] NFC…PARE_GENOME:SAMTOOLS_FAIDX -
[- ] NFC…TABIX_BCFTOOLS_ANNOTATIONS -
[- ] NFC…PREPARE_GENOME:TABIX_DBSNP -
[- ] NFC…ME:TABIX_GERMLINE_RESOURCE -
[- ] NFC…RE_GENOME:TABIX_KNOWN_SNPS -
[- ] NFC…_GENOME:TABIX_KNOWN_INDELS -
[- ] NFC…K:PREPARE_GENOME:TABIX_PON -
[- ] NFC…_INTERVALS:BUILD_INTERVALS -
[- ] NFC…RVALS:CREATE_INTERVALS_BED -
[- ] NFC…_BGZIPTABIX_INTERVAL_SPLIT -
[- ] NFC…ZIPTABIX_INTERVAL_COMBINED -
This path is not available within annotation-cache. Please check https://annotation-cache.github.io/ to create a request for it.