Steps to reproduce variants not called by RUFUS that are in the gold-standard data set:
(Notes from S. Gardiner)
Files to reproduce:
I've been looking at the RUFUS run on the merged BAMs of EA/NC/LL, at this file path:
/scratch/ucgd/lustre-work/marth/u0880188/smaht/hcc1395_seqc2/merged_runs/EA_NC_LL
How to identify variants that do have contigs made but don't have calls in the VCF:
The way I did it previously was to run bedtools coverage on this file, which contains the contigs:
/scratch/ucgd/lustre-work/marth/u0880188/smaht/hcc1395_seqc2/merged_runs/EA_NC_LL/EA_NC_LL_3_merged_tumor.bam.generator.V2.overlap.hashcount.fastq.bam
at the specific sites from the validated VCF that RUFUS failed to call.
This gave me this file: /scratch/ucgd/lustre-work/marth/u0880188/smaht/hcc1395_seqc2/merged_runs/EA_NC_LL/contig_depth.txt
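For reference, the bedtools step could be scripted roughly like this. This is a sketch, not the exact command that was run: the file name missed_validated_sites.vcf is a hypothetical stand-in for the file of validated sites that RUFUS failed to call, and the notes don't record which bedtools coverage options were used, so none are passed here.

```python
import shlex
import subprocess
from pathlib import Path

RUN_DIR = Path("/scratch/ucgd/lustre-work/marth/u0880188/smaht/hcc1395_seqc2/merged_runs/EA_NC_LL")
CONTIG_BAM = RUN_DIR / "EA_NC_LL_3_merged_tumor.bam.generator.V2.overlap.hashcount.fastq.bam"
# Hypothetical name: the validated sites that RUFUS did not call.
MISSED_SITES = RUN_DIR / "missed_validated_sites.vcf"
DEPTH_OUT = RUN_DIR / "contig_depth.txt"


def build_coverage_cmd(sites, contig_bam):
    """Return the argv list for bedtools coverage over the missed sites."""
    # -a: the intervals to report on (the missed validated sites)
    # -b: the features counted against each interval (the contig alignments)
    return ["bedtools", "coverage", "-a", str(sites), "-b", str(contig_bam)]


def run_coverage(sites, contig_bam, out_path):
    """Run bedtools coverage and write its stdout to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(build_coverage_cmd(sites, contig_bam), stdout=out, check=True)


# Print the command for inspection; nothing is executed here.
print(shlex.join(build_coverage_cmd(MISSED_SITES, CONTIG_BAM)))
```

Calling run_coverage(MISSED_SITES, CONTIG_BAM, DEPTH_OUT) would then write the coverage table, assuming bedtools is on the PATH.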
I then used a Python script to pull out the locations that had a coverage of at least 1.
Here is a TSV file of those locations:
/scratch/ucgd/lustre-work/marth/u0880188/smaht/hcc1395_seqc2/merged_runs/EA_NC_LL/contig_variants_validated.tsv
Looks like there were 1209 variants.
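The coverage filter described above can be sketched as follows. The original script isn't included in these notes, so this is only an approximation: it assumes contig_depth.txt is default bedtools coverage output, where the last four columns are the overlap count, bases covered, interval length, and fraction covered, and it treats the overlap count as "coverage". If the file was produced with other options, coverage_col would need to change. The two demo rows are made up for illustration.

```python
import csv


def filter_covered(depth_lines, coverage_col=-4, min_coverage=1):
    """Keep tab-separated rows whose coverage value is >= min_coverage.

    coverage_col=-4 assumes default bedtools coverage output, where the
    overlap count is the fourth column from the end.
    """
    kept = []
    for line in depth_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and any header lines
        fields = line.split("\t")
        if int(fields[coverage_col]) >= min_coverage:
            kept.append(fields)
    return kept


def write_tsv(rows, out_path):
    """Write the kept rows to a tab-separated file."""
    with open(out_path, "w", newline="") as out:
        csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)


# Made-up example rows (not real data): one site with a contig, one without.
demo = [
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\t2\t1\t1\t1.0000000\n",
    "chr1\t200\t.\tG\tC\t.\tPASS\t.\t0\t0\t1\t0.0000000\n",
]
covered = filter_covered(demo)
print(len(covered), "site(s) with contig coverage")
```

On the real data this would be run as filter_covered(open("contig_depth.txt")) followed by write_tsv(rows, "contig_variants_validated.tsv").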