nanopore-wgs-consortium / NA12878

Data and analysis for NA12878 genome on nanopore

Variants in High Confidence VCF and FASTQ Output have 0 intersection #108

Closed hkarakurt closed 2 years ago

hkarakurt commented 2 years ago

Hello everyone. To test a SNP calling pipeline I used FASTQ data, because storage limits prevented me from downloading the FAST5 files. I downloaded FAB39088-288418386 with the command:

aws s3 cp s3://nanopore-human-wgs/rel6/FASTQTars/FAB39088-288418386_Multi.tar .

I merged all FASTQ files with the "cat" command to obtain a single FASTQ file containing all reads.

I aligned the reads with minimap2 via command:

minimap2 -ax map-ont -t 15 /home/references/hg19/hg19.fa /home/na12878/Notts/FAB39088-288418386_Multi/fastq/na12878_all.fastq --MD > mapped_12878.sam

I used the hg19 genome from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/hg19.fa.gz

I converted the SAM file to a sorted BAM file and indexed it with samtools.
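
The exact samtools commands are not given in this post. A minimal sketch of the sort and index step, assuming samtools 1.x is on the PATH, could look like the following. The function name sort_and_index and the default thread count are hypothetical and used only for illustration.

```shell
# Sort a SAM file by coordinate, write it as BAM, then index the BAM.
# Arguments: $1 = input SAM, $2 = output BAM, $3 = threads (optional).
# Assumption: samtools 1.x, which infers BAM output from the ".bam" extension.
sort_and_index() {
    sai_sam="$1"
    sai_bam="$2"
    sai_threads="${3:-4}"
    # Coordinate sort; -o names the output file, -@ sets extra threads.
    samtools sort -@ "$sai_threads" -o "$sai_bam" "$sai_sam" || return 1
    # Create the .bai index next to the sorted BAM.
    samtools index "$sai_bam"
}
```

With the file names used in this post, the call would be: sort_and_index mapped_12878.sam indexed_sorted_mapped.bam 15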

Because I do not have the FAST5 files, I could not index the reads with nanopolish index, so I used longshot for variant calling with the command:

longshot --bam indexed_sorted_mapped.bam --ref /home/bioinformatic/references/hg19/hg19.fa --out longshot_result.vcf

The final VCF has about 43000 variants (including calls on the Y chromosome, although I believe the NA12878 sample comes from a female donor). I compared it with the high confidence VCF from: https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv3.3.2/GRCh37/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz
but the two VCF files share 0 variants.

I am not sure what the problem is. If anyone can give me advice or point out my mistake, I would be very grateful.
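
The post does not say how the two VCF files were compared. As a rough sketch, an exact-match comparison on chromosome, position, REF and ALT can be done with standard tools. One thing worth checking in such a comparison is contig naming: the UCSC hg19 FASTA names chromosomes with a "chr" prefix (chr1, chr2, ...), while GRCh37 files such as the GIAB release use bare names (1, 2, ...), so keys built from the raw CHROM column would never match. The function names vcf_site_keys and count_shared_sites are hypothetical, the inputs are assumed to be uncompressed VCF files (for example after gzip -dc), and this is plain string matching, not a normalized variant comparison.

```shell
# Print one sorted, unique "CHROM:POS:REF:ALT" key per VCF record.
# $1 = uncompressed VCF; $2 = "strip" to drop a leading "chr" from CHROM
# (any other value, or none, keeps CHROM as written in the file).
vcf_site_keys() {
    awk -v mode="${2:-keep}" 'BEGIN { FS = "\t" }
        /^#/ { next }                       # skip header lines
        {
            chrom = $1
            if (mode == "strip") sub(/^chr/, "", chrom)
            print chrom ":" $2 ":" $4 ":" $5
        }' "$1" | LC_ALL=C sort -u
}

# Print the number of keys present in both VCF files.
# $1, $2 = uncompressed VCFs; $3 = optional mode passed to vcf_site_keys.
count_shared_sites() {
    css_a="$(mktemp)"
    css_b="$(mktemp)"
    vcf_site_keys "$1" "$3" > "$css_a"
    vcf_site_keys "$2" "$3" > "$css_b"
    # comm -12 keeps only lines common to both sorted inputs.
    LC_ALL=C comm -12 "$css_a" "$css_b" | wc -l | tr -d ' '
    rm -f "$css_a" "$css_b"
}
```

Running count_shared_sites longshot_result.vcf giab.vcf keep and then again with strip shows whether the "chr" prefix alone explains an empty intersection. Here giab.vcf stands for the decompressed high confidence file.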