nygenome / lancet

Microassembly based somatic variant caller for NGS data
Other
153 stars 33 forks source link

use lancet for specific indels #11

Closed crazyhottommy closed 7 years ago

crazyhottommy commented 7 years ago

Hi, Thanks for this tool! I have two tumor samples with normal controls: pre-treatment and post-treatment

lancet --min-alt-count-tumor 3 --max-alt-count-normal 1 --min-vaf-tumor 0.04 --max-vaf-normal 0.01 --ref my.fa --bed my.bed -t pre.bam -n normal.bam

lancet --min-alt-count-tumor 3 --max-alt-count-normal 1 --min-vaf-tumor 0.04 --max-vaf-normal 0.01 --ref my.fa --bed my.bed -t post.bam -n normal.bam

I found a deletion in pre-treatment but not in post-treatment sample.

chr      pos           ID    ref                                                       alt    Qual      Filter
7       55242465  .   GGAATTAAGAGAAGCA                          G   68.1397  PASS 

I want to check maybe in the post-treatment sample, the tumor vaf is just too low (< 0.04) and was rejected.

so I lower the cut-off of tumor vaf to 0.0005 and want to see if there are any evidences:

lancet --min-alt-count-tumor 1 --max-alt-count-normal 1 --min-vaf-tumor 0.0005 --max-vaf-normal 0.01 --ref my.fa --reg 7:55242464-55242465 -t pre.bam -n normal.bam

lancet --min-alt-count-tumor 1 --max-alt-count-normal 1 --min-vaf-tumor 0.0005 --max-vaf-normal 0.01 --ref my.fa --reg 7:55242464-55242465 -t pre.bam -n normal.bam

it output no mutations even for the pre-treatment sample.

How should I correctly use the --reg? Is it 0 based like bed file?

Thank you! Tommy

====UPDATE====

if I increase the window 100bp on both sides, the deletion was detected.

lancet --min-alt-count-tumor 2 --max-alt-count-normal 1 --min-vaf-tumor 0.0005 --max-vaf-normal 0.0001 --reg 7:55242364-55242565 --ref my.fa -t pre.bam -n normal.bam

what size of the window should I use then? what if I only want the supporting evidence on that specific deletion position rather than a 200bp window centered on the deletion position.

gnarzisi commented 7 years ago

Hi Tommy,

glad to hear that you figured out the window size issue!

Due to the local-assembly procedure employed by Lancet, a window size centered around the mutation is always required in order to extract all the reads that have start and end position within the region. I would use a larger value, e.g., 300bp on both sides, so that you will include most of the read pairs (I am assuming here an average fragment size of 500bp, which is quite common today).

crazyhottommy commented 7 years ago

thanks very much for the confirmation!

For my usage case, I want to have the read supporting evidence (how many reads) for a list of confident calls I have across all the samples from a single patient, even a certain mutation is called by lancet in only one of the samples. Is it possible to have it from lancet?

e.g.

for the one has mutation

chr      pos           ID    ref                                                       alt  ......
7       55242465  .   GGAATTAAGAGAAGCA                         G

for the one does not have even a single supporting reads

chr      pos           ID    ref                                                       alt    .....
7       55242465  .   G                                                             G

I used to use samtools mpileup to get the information, but for indels, I am not sure how to do it.

Thanks for your help!

Tommy

gnarzisi commented 7 years ago

Lancet currently produces genotype info only for positions where the sample and the reference are different. Non-variant positions, typically reported in a gVCF file, are not currently supported. Sorry for the bad news.

However, if you run Lancet in verbose mode (-v option), all the assembled sequences together with the supporting coverage across them will be also produced to stderr. Given the high level of verbosity, only use this option for small regions centered around the variants. Parsing this output is not friendly, but if your sample has no variant within the region, you should see only one assembled sequence matching the reference genome.

crazyhottommy commented 7 years ago

I will try to avoid parsing the verbose output:) I found bam-readcount , I just need to parse that output a bit.

but if lancet can give that information, it will be very convenient. For tumor clonality analysis, one has to get the allele frequencies for the same position for all samples from the same patient no matter there is a mutation or not called.

if not, just place 0, but I need to know from the bam files:)

Thanks! Tommy

crazyhottommy commented 6 years ago

Hi @gnarzisi sorry to open this again,

For my usage case, I want to have the read supporting evidence (how many reads) for a list of confident calls I have across all the samples from a single patient, even a certain mutation is called by lancet in only one of the samples. Is it possible to have it from lancet?

I was using bamreadcount, and it works for SNVs and small indels, however, when the indels are large > 100bp, bamreadcount can not find any reads to support that. bamreadcount is not doing local assembly, so it is understandable to miss the reads.

It is still ideally to have lancet output that information. If you have time, could you please take a look?

Thanks! Tommy

gnarzisi commented 6 years ago

Hi Tommy,

yes, bamreadcount works well only on small indels indeed. Thank you for the reminder about this feature request. I have added it to the to-do list so that it will be included in a future release.

crazyhottommy commented 6 years ago

Thank you very much and look forward the new feature :)