Closed by sahuno 1 week ago
Hi @sahuno,
If you want to obtain a phased modcall VCF, could you please confirm whether you ran longphase phase after completing longphase modcall?
longphase phase \
-s SNP.vcf \
--mod-file modcall.vcf \
-b alignment.bam \
-r reference.fasta \
-t 8 \
-o phased_prefix \
--ont
Thanks
Thank you @twolinin for the heads-up! The output from the command below is the phased D-0-1_4000.vcf:
longphase phase -s results/call_snps_indels/D-0-1_4000/snv.vcf.gz \
--mod-file results/longphase_modcall/D-0-1_4000/modcall_D-0-1_4000.vcf \
-b /data1/greenbab/projects/triplicates_epigenetics_diyva/DNA/preprocessed/results/mark_duplicates/D-0-1_4000/D-0-1_4000_modBaseCalls_sorted_dup.bam \
-r /data1/greenbab/database/mm10/mm10.fa \
-t 12 \
-o results/longphase_phase/D-0-1_4000/D-0-1_4000 \
--ont
My goal is to do 5mC + SNP co-phasing. However, my final haplotagged BAM file doesn't have phasing information. I've checked manually and with this command:
# samtools view /data1/greenbab/projects/triplicates_epigenetics_diyva/DNA/preprocessed/D-0-1_4000_haplotagged_new.bam | awk '{for(i=12;i<=NF;i++) if($i ~ /^HP:|^PS:/) print $i}'
This is the command I used:
longphase haplotag \
-s s4000/results/call_snps_indels/D-0-1_4000/snv.vcf.gz \
--mod-file s4000/results/longphase_phase/D-0-1_4000/D-0-1_4000.vcf \
-r mm10.fa -b /data1/greenbab/projects/triplicates_epigenetics_diyva/DNA/preprocessed/results/mark_duplicates/D-0-1_4000/D-0-1_4000_modBaseCalls_sorted_dup.bam \
-t 12 -o D-0-1_4000_haplotagged_new
Please see a snapshot of the phased modcall file:
cat /data1/greenbab/projects/triplicates_epigenetics_diyva/DNA/preprocessed/snps_longphase_modcalls/s4000/results/longphase_phase/D-0-1_4000/D-0-1_4000.vcf | grep -v "^##" | head
chr19 3079763 . A C 24.5814 PASS H;FAU=6;FCU=4;FGU=0;FTU=0;RAU=5;RCU=1;RGU=0;RTU=0;SB=0.60268 GT:GQ:DP:AF:AD:AU:CU:GU:TU:PS 0|1:24:16:0.3125:11,5:11:5:0:0:3079248
chr19 3086424 . G C 19.2765 PASS FAU=0;FCU=3;FGU=15;FTU=0;RAU=0;RCU=2;RGU=7;RTU=0;SB=0.39155 GT:GQ:DP:AF:AD:AU:CU:GU:TU:PS 0/1:19:27:0.1852:22,5:0:5:22:0:.
chr19 3093019 . C G 18.3241 PASS H;FAU=0;FCU=7;FGU=6;FTU=0;RAU=0;RCU=6;RGU=4;RTU=0;SB=1.0 GT:GQ:DP:AF:AD:AU:CU:GU:TU:PS 1|0:18:23:0.4348:13,10:0:13:10:0:3079248
chr19 3093020 . T C 24.2024 PASS H;FAU=0;FCU=7;FGU=0;FTU=11;RAU=0;RCU=4;RGU=0;RTU=7;SB=1.0 GT:GQ:DP:AF:AD:AU:CU:GU:TU:PS 1|0:24:29:0.3793:18,11:0:11:0:18:3079248
chr19 3103636 . T A 32.2231 PASS H;FAU=5;FCU=0;FGU=0;FTU=5;RAU=3;RCU=0;RGU=0;RTU=8;SB=0.65944 GT:GQ:DP:AF:AD:AU:CU:GU:TU:PS 1|0:32:21:0.3810:13,8:8:0:0:13:3079248
chr19 3109477 . A T 9.7446 LowQual H;FAU=4;FCU=0;FGU=0;FTU=2;RAU=2;RCU=0;RGU=0;RTU=2;SB=0.6044 GT:GQ:DP:AF:AD:AU:CU:GU:TU:PS 0|1:9:14:0.2857:6,4:6:0:0:4:3079248
chr19 3109520 . G A 15.9325 PASS FAU=1;FCU=0;FGU=3;FTU=0;RAU=3;RCU=0;RGU=2;RTU=0;SB=0.03571 GT:GQ:DP:AF:AD:AU:CU:GU:TU:PS 0|1:15:15:0.2667:5,4:4:0:5:0:3079248
chr19 3129778 . A T 31.7292 PASS H;FAU=3;FCU=0;FGU=0;FTU=3;RAU=5;RCU=0;RGU=0;RTU=5;SB=1.0 GT:GQ:DP:AF:AD:AU:CU:GU:TU:PS 0|1:31:16:0.5000:8,8:8:0:0:8:3079248
I called somatic variants with:
##cmdline=/opt/bin/run_clairs_to --tumor_bam_fn /data1/greenbab/projects/triplicates_epigenetics_diyva/DNA/preprocessed/results/mark_duplicates/D-0-1_4000/D-0-1_4000_modBaseCalls_sorted_dup.bam --ref_fn /data1/greenbab/database/mm10/mm10.fa --threads 12 --platform ont_r10_dorado_sup_4khz --output_dir /data1/greenbab/projects/triplicates_epigenetics_diyva/DNA/preprocessed/snps_longphase_modcalls/s4000/results/call_snps_indels/D-0-1_4000 --ctg_name chr19 --conda_prefix /opt/micromamba/envs/clairs-to
Am I supposed to phase D-0-1_4000/snv.vcf.gz separately before using it for haplotagging?
Did I miss something?
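The manual HP/PS check above prints every matching tag, which can be hard to read on a large BAM. A minimal sketch of a tally version follows; `count_hap_tags` is a hypothetical helper name (not part of longphase or samtools), and it assumes SAM text arrives on stdin, for example from `samtools view`.

```shell
# count_hap_tags: hypothetical helper, not part of longphase or samtools.
# Reads SAM records from stdin and tallies HP values and reads carrying a PS tag.
count_hap_tags() {
  awk -F'\t' '
    !/^@/ {                              # skip SAM header lines, if any
      total++
      for (i = 12; i <= NF; i++) {       # optional tags start at column 12
        if ($i ~ /^HP:i:/) hp[$i]++
        if ($i ~ /^PS:i:/) tagged_ps++
      }
    }
    END {
      printf "reads\t%d\n", total
      for (k in hp) printf "%s\t%d\n", k, hp[k]
      printf "reads_with_PS\t%d\n", tagged_ps
    }'
}

# Usage (assumes samtools is on PATH):
# samtools view D-0-1_4000_haplotagged_new.bam | count_hap_tags
```

If no HP lines appear in the output, no read was haplotagged, which matches the symptom described here.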
Hi @sahuno, if you want to do 5mC + SNP co-phasing, could you please try:
longphase phase -s results/call_snps_indels/D-0-1_4000/snv.vcf.gz \
--mod-file results/longphase_modcall/D-0-1_4000/modcall_D-0-1_4000.vcf \
-b /data1/greenbab/projects/triplicates_epigenetics_diyva/DNA/preprocessed/results/mark_duplicates/D-0-1_4000/D-0-1_4000_modBaseCalls_sorted_dup.bam \
-r /data1/greenbab/database/mm10/mm10.fa \
-t 12 \
-o results/longphase_phase/D-0-1_4000/D-0-1_4000 \
--ont
After phasing, you will get two output files, D-0-1_4000.vcf and D-0-1_4000_mod.vcf. Haplotag using the two phased output files:
longphase haplotag \
-s s4000/results/longphase_phase/D-0-1_4000/D-0-1_4000.vcf \
--mod-file s4000/results/longphase_phase/D-0-1_4000/D-0-1_4000_mod.vcf \
-r mm10.fa -b /data1/greenbab/projects/triplicates_epigenetics_diyva/DNA/preprocessed/results/mark_duplicates/D-0-1_4000/D-0-1_4000_modBaseCalls_sorted_dup.bam \
-t 12 -o D-0-1_4000_haplotagged_new
Thanks
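Before haplotagging, it can help to confirm that the VCFs passed to -s and --mod-file actually contain phased genotypes (a | in the GT field). The following is a rough sketch, not a longphase feature: `gt_phase_summary` is a hypothetical helper name, and it assumes a tab-separated, single-sample VCF whose FORMAT column starts with GT, as in the snapshot earlier in this thread.

```shell
# gt_phase_summary: hypothetical helper, not part of longphase.
# Assumes a single-sample VCF with GT as the first FORMAT field (column 10 holds the sample).
gt_phase_summary() {
  awk -F'\t' '
    !/^#/ {                               # skip header lines
      split($10, s, ":")                  # s[1] is the GT value, e.g. 0|1 or 0/1
      if (s[1] ~ /\|/) phased++; else unphased++
    }
    END { printf "phased\t%d\nunphased\t%d\n", phased, unphased }' "$@"
}

# Usage:
# gt_phase_summary s4000/results/longphase_phase/D-0-1_4000/D-0-1_4000.vcf
# zcat s4000/results/call_snps_indels/D-0-1_4000/snv.vcf.gz | gt_phase_summary
```

A file reporting zero phased records, such as an unphased snv.vcf.gz straight from the variant caller, gives haplotag no phase information to assign reads with.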
Your approach worked! Thanks!
Hi, I ran modcall on modified BAMs that have MN & ML tags; however, the resulting VCF doesn't have | at the genotype position, meaning the results are unphased. Here's the command that I used for modcall and to check phasing. Question: is the output not supposed to be phased? Please let me know if I'm missing any step. Please see the snapshot of the VCF.
Showing a single read from the modified BAM file.