Open joybio opened 2 years ago
It's a tab-delimited file, as described here: https://github.com/iqbal-lab-org/minos/wiki/Minos-for-joint-genotyping
thanks for the links! but I got another error as followed:
Command error:
Traceback (most recent call last):
File "/share/data3//miniconda3/envs/minos/bin/minos", line 8, in
name reads vcf 201550 /home/NGS/201550.dup.bam /home/NGS/201550.gvcf.gz ... ###############################################
Is it a lot of samples? If possible could you share the VCF files with me please and I can debug
it's about 1000 vcf files. the following text is the head of vcf files (all like this). #################################################################
...
NZ_CP042314.1 1 . C
i'm reading this function (find_parallel_cluster_split_points) , what information are we processing in this function.
Does the str "
thanks!
I've just tested and having <NON_REF>
for an allele will break the pipeline later on (the gramtools_build
stage). I expect bad things will also be happening internally before the actual failure.
There's also two nextflow parameters that may be causing your VCFs to be excluded. Their defaults made sense for TB, but maybe not for your data.
--max_variants_per_sample
. Default is 5000. If a VCF has more records than this, then it is excluded (on the basis that it must be too far away from the reference genome)--max_genome_proportion_per_sample
. Default is 0.05. This is saying (approximately) that if the sample is more than 5% away from the reference, then it is excluded. The actual test is: (total length of REF alleles) / (genome size) > 0.05.Sorry, these should be documented - is on my to do list.
thank you very much!
what should i do with "
I'll add in a step that removes all non-ACGT alleles. You could wait for that fix, or remove the <NON_REF>
alleles from your VCF files.
can i reset these two params? or just add these two params in the nextflow command lines?
I've just tested and having
<NON_REF>
for an allele will break the pipeline later on (thegramtools_build
stage). I expect bad things will also be happening internally before the actual failure.There's also two nextflow parameters that may be causing your VCFs to be excluded. Their defaults made sense for TB, but maybe not for your data.
--max_variants_per_sample
. Default is 5000. If a VCF has more records than this, then it is excluded (on the basis that it must be too far away from the reference genome)--max_genome_proportion_per_sample
. Default is 0.05. This is saying (approximately) that if the sample is more than 5% away from the reference, then it is excluded. The actual test is: (total length of REF alleles) / (genome size) > 0.05.Sorry, these should be documented - is on my to do list.
thanks, i'll wait for the update.
I'll add in a step that removes all non-ACGT alleles. You could wait for that fix, or remove the
<NON_REF>
alleles from your VCF files.
can i reset these two params? or just add these two params in the nextflow command lines?
Add them to the nextflow run
command, like nextflow run --max_variants_per_sample 1000000 ...etc
.
ok.
can i reset these two params? or just add these two params in the nextflow command lines?
Add them to the
nextflow run
command, likenextflow run --max_variants_per_sample 1000000 ...etc
.
hi. i used 1 vcf file as input, replace("
can you share the vcf file?
#
NZ_CP042314.1 1 . C . . . END=392 GT:DP:GQ:MIN_DP:PL 0:0:0:0:0,0 NZ_CP042314.1 393 . T . . . END=399 GT:DP:GQ:MIN_DP:PL 0:4:99:4:0,135 NZ_CP042314.1 400 . C . . . END=403 GT:DP:GQ:MIN_DP:PL 0:4:90:4:0,90 NZ_CP042314.1 404 . G . . . END=404 GT:DP:GQ:MIN_DP:PL 0:4:45:4:0,45 NZ_CP042314.1 405 . A . . . END=1199 GT:DP:GQ:MIN_DP:PL 0:0:0:0:0,0 NZ_CP042314.1 1200 . A . . . END=1209 GT:DP:GQ:MIN_DP:PL 0:1:42:1:0,42 NZ_CP042314.1 1210 . C . . . END=1228 GT:DP:GQ:MIN_DP:PL 0:7:45:7:0,45 NZ_CP042314.1 1229 . G . . . END=1231 GT:DP:GQ:MIN_DP:PL 0:7:0:7:0,0 NZ_CP042314.1 1232 . G . . . END=1233 GT:DP:GQ:MIN_DP:PL 0:7:45:7:0,45 NZ_CP042314.1 1234 . G . . . END=1237 GT:DP:GQ:MIN_DP:PL 0:7:0:7:0,0 NZ_CP042314.1 1238 . G . . . END=1239 GT:DP:GQ:MIN_DP:PL 0:7:45:7:0,45 NZ_CP042314.1 1240 . C . . . END=1243 GT:DP:GQ:MIN_DP:PL 0:7:0:7:0,0 NZ_CP042314.1 1244 . G . . . END=1244 GT:DP:GQ:MIN_DP:PL 0:7:45:7:0,45 NZ_CP042314.1 1245 . G . . . END=1245 GT:DP:GQ:MIN_DP:PL 0:7:0:7:0,0 NZ_CP042314.1 1246 . G . . . END=1248 GT:DP:GQ:MIN_DP:PL 0:7:45:7:0,45 NZ_CP042314.1 1249 . T . . . END=1250 GT:DP:GQ:MIN_DP:PL 0:7:0:7:0,0 NZ_CP042314.1 1251 . A . . . END=1253 GT:DP:GQ:MIN_DP:PL 0:7:45:7:0,45 NZ_CP042314.1 1254 . C . . . END=1255 GT:DP:GQ:MIN_DP:PL 0:7:0:7:0,0 NZ_CP042314.1 1256 . C . . . END=1256 GT:DP:GQ:MIN_DP:PL 0:7:45:7:0,45 NZ_CP042314.1 1257 . A . . . END=1257 GT:DP:GQ:MIN_DP:PL 0:7:0:7:0,0 NZ_CP042314.1 7892 . C G,. 0.01 . BaseQRankSum=0.000;DP=2;MLEAC=0,0;MLEAF=0.00,0.00;MQRankSum=-0.674;RAW_MQandDP=5200,2;ReadPosRankSu m=0.674 GT:AD:DP:GQ:PL:SB 0:1,1,0:2:0:0,0,45:1,0,0,1 NZ_CP042314.1 7893 . C . . . END=7893 GT:DP:GQ:MIN_DP:PL 0:2:72:2:0,72 NZ_CP042314.1 7894 . G GT,. 0 . BaseQRankSum=0.000;DP=2;MLEAC=0,0;MLEAF=0.00,0.00;MQRankSum=-0.674;RAW_MQandDP=5200,2;ReadPosRankSu m=0.674 GT:AD:DP:GQ:PL:SB 0:1,1,0:2:0:0,0,45:1,0,0,1
############################################
there is only 1 difference: the character "
I think the cause is that all of the records are getting ignored either because there's no GT
field, or because the GT
calls are reference, not alt. I've added a description of which alleles in the input get used here:
https://github.com/iqbal-lab-org/minos/wiki/Minos-in-single-sample-mode#input-vcf-files
I've also made a toy data set so you can test the installation. Instructions are here: https://github.com/iqbal-lab-org/minos/wiki/Minos-test-data
You'll need to update to the new release 0.12.4 to get the toy data, and that version will also handle the <xyz>
style alleles.
error messages (test data): 1, gramtools build: error: the following arguments are required: -o/--gram_dir
i have figured it out.
line 113 in nextflow/regenotype.nf.
--gram-dir ==> --gram_dir
--kmer-size ==> --kmer_size
#############################################################
2, [4f/75d978] process > parse_manifest [100%] 1 of 1 ✔
[c2/b9a58e] process > make_vcf_for_gramtools:vcf_merge [100%] 1 of 1 ✔
[41/e27f38] process > make_vcf_for_gramtools:vcf_cluster [100%] 1 of 1 ✔
[0d/109185] process > gramtools_build [100%] 1 of 1 ✔
[1d/1ce400] process > minos (1) [100%] 2 of 2, failed: 2 ✔
[- ] process > make_per_sample_vcfs_dir -
[- ] process > ivcf_merge_chunks -
[- ] process > ivcf_final_merge -
[f1/634237] NOTE: Process minos (2)
terminated with an error exit status (1) -- Error is ignored
[1d/1ce400] NOTE: Process minos (1)
terminated with an error exit status (1) -- Error is ignored
[minos 12-08-2022 17:47:26 INFO] Depencency check: vcfbreakmulti Unknown vcfbreakmulti [minos 12-08-2022 17:47:26 INFO] Depencency check: vcfallelicprimitives Unknown vcfallelicprimitives [minos 12-08-2022 17:47:26 INFO] Depencency check: vcfuniq Unknown vcfuniq [minos 12-08-2022 17:47:26 INFO] Depencency check: vt NOT_FOUND /share/data1/*/soft/miniconda3/envs/minos/bin/vt
I'm sure that vt is ok. what should I do?
another question:
what should I do when GT calls are reference, not alt.
########################################################
my vcf:
NZ_CP042314.1 1 . C . . . END=392 GT:DP:GQ:MIN_DP:PL 0:0:0:0:0,0
NZ_CP042314.1 393 . T . . . END=399 GT:DP:GQ:MIN_DP:PL 0:4:99:4:0,135
your test:
ref_name 100 . C A . . . GT 0/0
ref_name 101 . C
The vt
line in the log looks strange: I wouldn't expect a *
in that path. What does which vt
return for you?
For ref calls - you'll need to change the GT to be allele that you want considered for genotyping. But also in your VCF the ALT column has .
- this will need to be changed to a nucleotide sequence.
comma or tab separated? how about pairwise sequencing? can I use compressed files, like gz or tar.gz?
here is an example of my manifest.tsv, is this ok?
######################################################################### name vcf reads reads 201550 NGS_Mapping/SNP/gvcf/201550.gvcf.gz 01.Cleandata/201550/201550.1.fq.clean.gz 01.Cleandata/201550/201550.2.fq.clean.gz 183207 NGS_Mapping/SNP/gvcf/183207.gvcf.gz 01.Cleandata/183207/183207.1.fq.clean.gz 01.Cleandata/183207/183207.2.fq.clean.gz #########################################################################
thanks.