tgen / LumosVar

MIT License

Input data preparation #2

Open namsyvo opened 6 years ago

namsyvo commented 6 years ago

Hi, I'm testing LumosVar on some cancer types to see if we can use it in our internal GDC pipeline. I have some questions:

rhalperin commented 6 years ago

I haven't yet done a systematic evaluation of how many normal bams are needed, but I typically try to use at least 10. These should come from sequencing of non-tumor tissue. It doesn't matter whether they are from the same cancer type, or even whether they come from cancer patients at all. It is important that the normal bams were sequenced with the same exome capture bait set, because they are also used to compare read depths for copy number. I have not shared a standard normal metrics file because there are so many different exome capture sets. If you are working with data from a commercially available exome capture and let me know which one, I can check whether I have already run analyzeControls on bams from that capture.
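To illustrate why the normals must share the tumor's bait set, here is a minimal sketch of a generic depth-ratio comparison. It is not LumosVar's actual copy-number method; it assumes per-target read depths have already been computed for every sample over the same list of capture targets, and the function names `normalize` and `log2_ratios` are hypothetical. Each sample is scaled by its total depth, and the tumor is compared with the median of the normals target by target, which only makes sense if every sample covers the same targets.

```python
import math
from statistics import median


def normalize(depths):
    """Scale per-target depths so each sample sums to 1 (library-size normalization)."""
    total = sum(depths)
    if total == 0:
        raise ValueError("sample has zero total depth")
    return [d / total for d in depths]


def log2_ratios(tumor_depths, normal_depths_list, pseudocount=1e-9):
    """Per-target log2(tumor / median normal) after normalizing each sample.

    All depth lists must describe the same capture targets in the same order,
    which is only true when the normals used the tumor's exome bait set.
    """
    n_targets = len(tumor_depths)
    if any(len(n) != n_targets for n in normal_depths_list):
        raise ValueError("all samples must share the same capture targets")
    tumor = normalize(tumor_depths)
    normals = [normalize(n) for n in normal_depths_list]
    ratios = []
    for i in range(n_targets):
        # Median across the panel of normals for this target.
        ref = median(n[i] for n in normals)
        ratios.append(math.log2((tumor[i] + pseudocount) / (ref + pseudocount)))
    return ratios
```

For example, `log2_ratios([100, 200, 100], [[100, 100, 100], [50, 50, 50]])` gives roughly -0.415, 0.585 and -0.415: the middle target looks gained relative to the normals even though the two normals had very different total depths. Using the median of several normals is one reason a panel of 10 or more is more robust than one or two.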

I realize that the analyzeControls step is painfully slow. We are currently working on a C implementation that will be much faster. We have already rewritten the slow part of the main caller in C; it is currently in a private GitHub repo. If you are interested in beta testing, I would be happy to add you as a collaborator.

namsyvo commented 6 years ago

Thank you for your very detailed answer. Regarding exome capture, most of our data are from Nimblegen EZ Exome v3.0 (or v2.0). If you have run analyzeControls for either of these kits, please let me know.

Yes, I'd be very happy to be added as a collaborator to test your new version. I got some errors with the current version on some data sets, so I would also like to see whether the new version overcomes them. Thanks for the suggestion.

namsyvo commented 6 years ago

Hi, I'm trying to run analyzeControls with a list of control bam files, but I get errors. If I run analyzeControls on each bam file separately, it runs fine, but if I put two files together, analyzeControls crashes. Could you look at the error message in the attached file to see what is happening? Thanks. (Attachment: log_err.txt)

rhalperin commented 6 years ago

Hi,

No, I don't have analyzeControls run on any Nimblegen exomes.

I apologize for the delayed response. I just finished a round of improvements/debugging on the new version. I added you as a collaborator on those repositories.

I haven't seen that error before. It looks like it isn't correctly grabbing the reference base from the mpileup output. What version of samtools are you using?

Rebecca
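For context on "grabbing the reference base from the mpileup output": in the default text output of `samtools mpileup`, each line starts with the chromosome, the 1-based position and the reference base, followed by one (depth, read bases, base qualities) triple per input BAM. With two BAMs, each line therefore has 9 columns instead of 6. The sketch below is a hypothetical parser (`parse_mpileup_line` is not a LumosVar function) that takes the reference base from column 3 and splits out any number of per-sample triples. A parser that assumes a fixed column count is one plausible way multi-BAM input can break, but the actual cause here was not identified beyond the samtools version.

```python
from typing import List, NamedTuple


class SampleColumns(NamedTuple):
    depth: int
    bases: str
    quals: str


class PileupRecord(NamedTuple):
    chrom: str
    pos: int                    # 1-based position as printed by samtools
    ref_base: str               # column 3 ('N' if no reference FASTA was given)
    samples: List[SampleColumns]


def parse_mpileup_line(line: str) -> PileupRecord:
    """Parse one line of default multi-sample `samtools mpileup` text output."""
    fields = line.rstrip("\n").split("\t")
    # 3 site columns plus a (depth, bases, quals) triple for every BAM.
    if len(fields) < 6 or (len(fields) - 3) % 3 != 0:
        raise ValueError(f"unexpected mpileup column count: {len(fields)}")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2].upper()
    samples = [
        SampleColumns(int(fields[i]), fields[i + 1], fields[i + 2])
        for i in range(3, len(fields), 3)
    ]
    return PileupRecord(chrom, pos, ref, samples)
```

A line covering two BAMs such as `chr1\t100\ta\t3\t..,\tIII\t2\t.T\tII` yields reference base `A` and two sample triples, whereas code that read, say, the last few columns by fixed index would pick up the second sample's fields instead.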

namsyvo commented 6 years ago

Hi Rebecca,

I used samtools 0.9 in those experiments. Based on your answer, I suspected the samtools version was the problem, so I tried samtools 1.7, the latest version, and it works now! I finished running analyzeControls with 10 normal bam files, but it took a very long time to finish. I hope your newer version can deal with that. Thanks.

Nam
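Since the crash went away after upgrading samtools, a pipeline could check the samtools version before launching analyzeControls. The sketch below is hypothetical: LumosVar does not document a minimum version, so `MIN_SAMTOOLS = (1, 7)` is only an assumption based on the version reported to work in this thread. It runs `samtools --version` (available in 1.x releases; pre-1.0 releases are assumed not to support it, so a failing command is treated as "too old") and compares the parsed version as a tuple.

```python
import re
import subprocess
from typing import Optional, Tuple

# Assumption: 1.7 is simply the version reported to work in this thread.
MIN_SAMTOOLS = (1, 7)


def parse_samtools_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract a version tuple from output like 'samtools 1.7\\nUsing htslib 1.7'."""
    m = re.search(r"samtools\s+(\d+(?:\.\d+)+)", text)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def samtools_is_recent_enough(executable: str = "samtools") -> bool:
    """Return True if `executable --version` reports at least MIN_SAMTOOLS."""
    try:
        out = subprocess.run(
            [executable, "--version"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # Missing binary, or an old release that rejects --version.
        return False
    version = parse_samtools_version(out)
    # Tuple comparison handles multi-digit minors: (1, 10) >= (1, 7).
    return version is not None and version >= MIN_SAMTOOLS
```

Comparing tuples of integers rather than raw strings avoids the classic bug where `"1.10" < "1.7"` as text.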