kishwarshafin / pepper

PEPPER-Margin-DeepVariant
MIT License
245 stars 42 forks source link

error No valid VCF entries found after PEPPER SNP calling #116

Closed myonaung closed 2 years ago

myonaung commented 2 years ago

@kishwarshafin Thanks for pepper_deepvariant_test-r0.7.sif. It is working with example dataset on our HPC. However, when I replaced with my reads and reference, I have seen a few different issues. Note that I have subset the run into a specific region with --region argument.

time margin phase /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//3D7_BC01.fastq_sorted.minimap2.bam /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//PlasmoDB-46_Pfalciparum3D7_Genome.fasta /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/PEPPER_VARIANT_FULL.vcf.gz /opt/margin_dir/params/phase/allParams.haplotag.ont-r94g507.snp.json -t 1 -r Pf3D7_03_v3:221323-222516 -V -o /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/PHASED.PEPPER_MARGIN 2>&1 | tee /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/logs/2_margin_haplotag.log;
samtools index -@1 /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/PHASED.PEPPER_MARGIN.haplotagged.bam
-------
Running OpenMP with 1 threads.
> Parsing model parameters from file: /opt/margin_dir/params/phase/allParams.haplotag.ont-r94g507.snp.json
> Parsed 8 total VCF entries from /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/PEPPER_VARIANT_FULL.vcf.gz; kept 0 HETs, skipped 0 for region, 7 for not being PASS, 1 for being homozygous, 0 for being INDEL
No valid VCF entries found!

real    0m0.052s
user    0m0.004s
sys     0m0.048s
[E::hts_open_format] Failed to open file "/data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/PHASED.PEPPER_MARGIN.haplotagged.bam" : No such file or directory
samtools index: failed to open "/data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/PHASED.PEPPER_MARGIN.haplotagged.bam": No such file or directory
[12-16-2021 10:47:36] ERROR: None]
[12-16-2021 10:47:36] THE FOLLOWING COMMAND FAILED: time margin phase /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//3D7_BC01.fastq_sorted.minimap2.bam /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//PlasmoDB-46_Pfalciparum3D7_Genome.fasta /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/PEPPER_VARIANT_FULL.vcf.gz /opt/margin_dir/params/phase/allParams.haplotag.ont-r94g507.snp.json -t 1 -r Pf3D7_03_v3:221323-222516 -V -o /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/PHASED.PEPPER_MARGIN 2>&1 | tee /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/logs/2_margin_haplotag.log;
samtools index -@1 /data/gpfs/projects/punim1129/probe-cap_seq_data/tools/pepper-deepvariant/mock_data//output/PHASED.PEPPER_MARGIN.haplotagged.bam]

Is it something have you seen before ? Is it simply because of no variant has passed through the filter for the specified region ?

kishwarshafin commented 2 years ago

@myonaung ,

This would mean you have zoomed into a homozygous region Pf3D7_03_v3:221323-222516 and the initial SNP finding was not able to find anything in this region. PEPPER depends on phasing, maybe you can run on -r Pf3D7_03_v3 and see if it goes away?

myonaung commented 2 years ago

Ok, make sense. I can't see anything on the region and try with a different sample, and works well. One more question, how do we need to deal with amplicon data which mapped to whole genome ref. Do I need to keep specify with --region argument ? Maybe I didn't see it but cannot find the option to use BED-coordinate. Or does it simply skip the region with no reads ?

kishwarshafin commented 2 years ago

@myonaung ,

We have not trained our models on amplicon data so we can't tell you exactly how good the performance will be. But in terms of running it, yes, the pipeline will ignore regions where no reads map.

myonaung commented 2 years ago

@kishwarshafin Thanks, I will close the issue for now.