tk2 / RetroSeq

RetroSeq is a bioinformatics tool that searches for mobile element insertions from aligned reads in a BAM file and a library of reference transposable elements. Please read the wiki page (link below) for usage instructions. The wiki also has a page describing how the analysis of the 1000 Genomes CEU trio was carried out, with the files and parameters used for each step.

Separating .vcf's by individual on pooled call? #12

Open lokeyCEU opened 7 years ago

lokeyCEU commented 7 years ago

I have merged .bam files from the 1kGP (with samtools merge -r) and performed RetroSeq discovery phase on the merged .bam.

But now when I call the merged .bam I get only one .vcf output. How do I create .vcf's for each individual in the merged .bam?

This is similar to what Wildschutte did in a 2015 study. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666360/

Thank you.

EDIT (May 2017): I was mistaken. The merged (pooled) .bam is used during the calling phase, NOT discovery.
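For context on the merge step: `samtools merge -r` attaches an RG tag to every alignment, with the value inferred from the source file name, so the pooled .bam still records which input file each read came from. A minimal sketch for checking that is below. It tallies RG:Z values from SAM text on stdin; the function name `count_read_groups` is made up for illustration, and the usage line assumes samtools is on PATH and reuses the merged file name from later in this thread.

```shell
#!/usr/bin/env bash
# Count alignments per RG:Z tag in SAM-formatted records read from stdin.
# Hypothetical helper, not part of samtools or RetroSeq.
# Usage (assumes samtools is installed):
#   samtools view TestCEU-r.bam | count_read_groups
count_read_groups() {
    awk -F'\t' '
        {
            rg = "(none)"
            # Optional tags start at column 12 of a SAM record.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^RG:Z:/) { rg = substr($i, 6); break }
            }
            count[rg]++
        }
        END { for (rg in count) printf "%s\t%d\n", rg, count[rg] }
    ' | sort
}
```

If every individual shows up as its own read group, the per-individual information has survived the merge and can in principle be used to split the data again.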

tk2 commented 7 years ago

Can you provide the command lines you have run?


lokeyCEU commented 7 years ago

Absolutely.

I ran samtools merge -r on all individuals from the CEU to produce a single pooled .bam.

Then I ran the discovery phase:

perl /path/to/software/Retroseq/RetroSeq-master/bin/retroseq.pl -discover -bam TestCEU-r.bam -output CEU-r.HERVK.tab -eref HERVKfa.tab -refTEs HERVKbed.tab -align

Then the call phase:

perl /path/to/software/Retroseq/RetroSeq-master/bin/retroseq.pl -call -bam TestCEU-r.bam -input CEU-r.HERVK.tab -ref hg19.refFIX.fa -output HERVK.TEST-r.vcf -reads 2 -depth 10000

But the resulting .vcf covers all the pooled individuals together, and I want the calls separated by individual.

Thanks!
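One possible workaround, not confirmed anywhere in this thread, is to pull each individual's reads back out of the pooled .bam by read group and run the call phase once per individual against the same candidate file. The sketch below only prints the commands (a dry run). The function name and the sample IDs in the usage line are placeholders; the real read-group IDs are whatever `samtools merge -r` derived from the input file names and should be checked first. Whether per-individual calls are sensitive enough at low coverage is a separate question this sketch does not address.

```shell
#!/usr/bin/env bash
# Dry run: for each read-group ID given as an argument, print the commands that would
#   (1) extract that individual's reads from the pooled BAM,
#   (2) index the extracted BAM, and
#   (3) run the RetroSeq call phase on it with the options used in this thread.
# Nothing is executed here. Hypothetical helper, not part of samtools or RetroSeq.
print_per_individual_calls() {
    local pooled_bam=$1 candidates=$2 ref=$3
    shift 3
    local rg
    for rg in "$@"; do
        printf 'samtools view -b -r %s -o %s.bam %s\n' "$rg" "$rg" "$pooled_bam"
        printf 'samtools index %s.bam\n' "$rg"
        printf 'perl retroseq.pl -call -bam %s.bam -input %s -ref %s -output %s.vcf -reads 2 -depth 10000\n' \
            "$rg" "$candidates" "$ref" "$rg"
    done
}

# Example (IDs are illustrative):
#   print_per_individual_calls TestCEU-r.bam CEU-r.HERVK.tab hg19.refFIX.fa NA12878 NA12891
```

Reviewing the printed commands before piping them to a shell keeps the step easy to audit, and each run yields one .vcf per individual.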

lokeyCEU commented 7 years ago

UPDATE:

The Wildschutte 2016 paper took these steps (simplified):

  1. -discover phase on individual .bam's from 1kGP, to produce candidates
  2. merge .bam's by population, with samtools merge
  3. -call phase on merged .bam to produce .vcf

The problem is that the output .vcf gives the insertion presence of all individuals in ONE column. If each individual's insertion presence were in a separate column, one could simply use bcftools to separate them. Is there something I am missing that will produce .vcf's for each individual, or at least columns by individual, from the merged .bam?

Here is the command I used:

nohup perl retroseq.pl -call -bam TestCEU-r.bam -input HERVK*.tab -ref hg19.refFIX.fa -output TestPooledCall.CEU-r.vcf -reads 2 -depth 10000 &

NOTE: the -input is a prefix of a series of files all named HERVK(insert individual's name here).tab. Is this where things have gone awry?

Thanks!
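On the `-input HERVK*.tab` question, one thing worth ruling out is shell behaviour: an unquoted glob is expanded by the shell before the program starts, so retroseq.pl would receive a list of file names rather than a prefix or pattern. The sketch below demonstrates this with a stand-in function, `show_args`, and made-up file names in a temporary directory. It says nothing about how retroseq.pl itself treats the extra arguments; the accepted forms of -input should be checked against the tool's own usage text.

```shell
#!/usr/bin/env bash
# Show what a program actually receives when a glob appears on its command line.
# show_args is a stand-in for any program; it just prints its arguments.
show_args() {
    printf '%d argument(s):' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
# Hypothetical per-individual discovery outputs, named as described in the post.
touch HERVK.NA12878.tab HERVK.NA12891.tab HERVK.NA12892.tab

unquoted=$(show_args -input HERVK*.tab)   # the shell expands the glob first
quoted=$(show_args -input 'HERVK*.tab')   # quoted: the literal pattern is passed through
prefix=$(show_args -input HERVK)          # a bare prefix is left untouched by the shell

echo "$unquoted"   # 4 argument(s): [-input] [HERVK.NA12878.tab] [HERVK.NA12891.tab] [HERVK.NA12892.tab]
echo "$quoted"     # 2 argument(s): [-input] [HERVK*.tab]
echo "$prefix"     # 2 argument(s): [-input] [HERVK]

cd / && rm -rf "$demo_dir"
```

If the option is meant to take a prefix, passing the bare prefix (the third form) avoids any ambiguity about what the shell hands over.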