lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Crash with indels from Mutect2 VCFs #19

Closed lima1 closed 6 years ago

lima1 commented 6 years ago

Error in data.frame(seqnames = as.factor(seqnames(x)), start = start(x), : duplicate row.names: rs147304884, rs774418471, rs72106118
Calls: runAbsoluteCN ... eval -> as.data.frame -> as.data.frame -> data.frame

Happens when indels overlap with segment breakpoints
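One plausible way such duplicates arise (an assumption, since the thread does not show the code path) is an overlap join between variants and segments: an indel that spans a breakpoint matches two segments, so its ID appears twice and can no longer serve as a unique row name. The Python sketch below illustrates this with made-up coordinates and hypothetical helper names; it is not PureCN's implementation.

```python
def assign_segments(variants, segments):
    """Return one (variant_id, segment_index) pair per overlap.

    variants: list of (id, start, end), inclusive coordinates.
    segments: list of (start, end), inclusive coordinates.
    """
    hits = []
    for vid, start, end in variants:
        for idx, (seg_start, seg_end) in enumerate(segments):
            # Two closed intervals overlap if each starts before the other ends.
            if start <= seg_end and end >= seg_start:
                hits.append((vid, idx))
    return hits


def first_hit_per_variant(hits):
    """Keep only the first segment seen for each variant ID."""
    chosen = {}
    for vid, idx in hits:
        chosen.setdefault(vid, idx)
    return chosen


# Made-up data: the breakpoint lies between positions 1000 and 1001.
segments = [(1, 1000), (1001, 2000)]
variants = [("snv_a", 500, 500), ("indel_b", 998, 1003)]

hits = assign_segments(variants, segments)
ids = [vid for vid, _ in hits]
unique_assignment = first_hit_per_variant(hits)
```

Here `indel_b` shows up twice in `ids` because it crosses the breakpoint. Keeping the first hit is only one simple way to restore uniqueness; choosing the segment with the larger overlap would be another.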

billnjcn111 commented 6 years ago

Hi, can you quickly fix the bug? I got exactly the same error message for some of my samples. Thanks

lima1 commented 6 years ago

Yes, will be fixed in the developer version in the next 2-3 days.

billnjcn111 commented 6 years ago

Thanks! Also, can you keep all the variants from the original VCF file and just annotate each variant with something like "predicted somatic"?

lima1 commented 6 years ago

Yes, that is planned https://github.com/lima1/PureCN/issues/17

PureCN will annotate all variants it does not filter out due to quality or homozygosity, but yes, a future version will keep QC-failed and homozygous variants. That will hopefully happen in the next 2-3 months or so, but definitely after the April Bioconductor release.

billnjcn111 commented 6 years ago

What are the criteria behind "Removing 1474 MuTect2 calls due to blacklisted failure reasons"? Thanks

lima1 commented 6 years ago

The flags are listed in https://github.com/lima1/PureCN/blob/master/R/filterVcfMuTect2.R

ignore=c("clustered_events", "t_lod", "str_contraction", "read_position", "fragment_length", "multiallelic", "clipping", "strand_artifact")

We don't use Mutect2 yet, so I'm open to changing the defaults or making this configurable in PureCN.R if necessary. Feel free to open a new issue with your requests for M2 improvements. Thanks a lot!
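As a rough illustration of what filtering on this list amounts to, the Python sketch below drops any call whose VCF FILTER column contains at least one of the listed flags. The flag names are copied from the comment above; the function names, the example calls, and the placeholder flag `other_flag` are hypothetical, and the actual logic lives in filterVcfMuTect2.R.

```python
# Flags copied from the `ignore` vector quoted in the thread.
IGNORE = {
    "clustered_events", "t_lod", "str_contraction", "read_position",
    "fragment_length", "multiallelic", "clipping", "strand_artifact",
}


def split_filter(filter_field):
    """Split a VCF FILTER value into a set of flags (empty for PASS or '.')."""
    if filter_field in ("", ".", "PASS"):
        return set()
    return set(filter_field.split(";"))


def keep_call(filter_field, ignore=IGNORE):
    """Keep a call only if none of its FILTER flags is in the ignore set."""
    return not (split_filter(filter_field) & ignore)


# Made-up calls: (chromosome, position, FILTER). `other_flag` is a placeholder.
calls = [
    ("1", 100, "PASS"),
    ("1", 200, "clustered_events"),
    ("2", 300, "other_flag"),
    ("2", 400, "t_lod;other_flag"),
]
kept = [call for call in calls if keep_call(call[2])]
removed = len(calls) - len(kept)
```

Passing a smaller `ignore` set to `keep_call` is the equivalent of relaxing the defaults, which is the kind of configurability discussed above.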

billnjcn111 commented 6 years ago

Thanks for the quick check. For now I will just change the code on my own.

lima1 commented 6 years ago

Keep in mind that every artifact that is not filtered out potentially confuses the likelihood model. That’s why we do a thorough filtering. If your pipeline does a good job in filtering, you can make the filtering in PureCN less aggressive, for example by lowering the sequencing error to say half the default. Again, I’m curious to hear more about M2 related issues, so please let me know.

lima1 commented 6 years ago

Should be fixed in the developer version.

billnjcn111 commented 6 years ago

I am testing the developer version (installed using biocLite("lima1/PureCN")). There is a new issue:
Error: logical subscript contains NAs
It comes right after this log message:
Setting somatic prior probabilities for dbSNP hits to 0.000500 or to 0.500000 otherwise.

I am not sure whether it is a gene symbol issue, since some of the targets have no symbol in the interval file. Thanks

billnjcn111 commented 6 years ago

Hi M. Riester, I see you have updated PureCN to 1.9.43, which may have caused some issues with my new samples. The version I used about a month ago was 1.9.28. I had to reinstall R, so I got the new 1.9.43. The issue is as follows. One sample showed: caught segfault address (nil), cause 'memory not mapped'

Three other samples showed: Error in .correctCoverageBiasLoess(raw) : object 'medDiploid' not found

The command line is: Rscript Coverage.R --outdir Mapping --bam 104_recal.bam --intervals V6_intervals.txt --force

Can you look into these issues? Alternatively, I can try the previous version 1.9.28. How can I download it?

Thanks

lima1 commented 6 years ago

That should be unrelated to any changes between .28 and .43.

What version of R are you using? Is it working for some samples? Are the BAM files ok? If yes, can you share the V6_intervals.txt?

billnjcn111 commented 6 years ago

Understood. I am using R 3.4.1. Most other samples worked fine. What do you mean by "BAM files ok"? The BAMs are base-quality-recalibrated BAMs. How many lines of the intervals file do you need? Thx

billnjcn111 commented 6 years ago

Also, can you post what changes you made from .28 to .43? Thanks

lima1 commented 6 years ago

You'll have to browse the commit history, I only document the major changes in NEWS.

I suspect the BAM files are empty or corrupted. Looking at the code, this crash most likely happens when the coverage is 0 for all intervals. Is this possible? What does the coverage (not_loess) file look like?

billnjcn111 commented 6 years ago

You are right, everything is 0:

Target total_coverage counts on_target duplication_rate
1:36280-65009 0 0 FALSE NA
1:65510-65726 0 0 TRUE NA
1:65777-65972 0 0 TRUE NA
1:69433-69630 0 0 TRUE NA

lima1 commented 6 years ago

Thanks, I’ll add a check for that.
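A guard of this kind can also be run on the coverage file before normalization. The Python sketch below assumes a whitespace-delimited table with the header shown in the excerpt above (the real delimiter may differ) and uses a hypothetical function name; it is not the check that was added to PureCN.

```python
def all_coverage_zero(lines):
    """Return True if every interval has total_coverage equal to 0.

    lines: iterable of text lines, header first, whitespace-delimited
    (an assumption about the file layout).
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) < 2:
        raise ValueError("coverage table has no intervals")
    header, body = rows[0], rows[1:]
    col = header.index("total_coverage")
    return all(float(row[col]) == 0 for row in body)


# Rows taken from the excerpt posted in the thread.
excerpt = [
    "Target total_coverage counts on_target duplication_rate",
    "1:36280-65009 0 0 FALSE NA",
    "1:65510-65726 0 0 TRUE NA",
    "1:65777-65972 0 0 TRUE NA",
    "1:69433-69630 0 0 TRUE NA",
]
empty_bam_suspected = all_coverage_zero(excerpt)
```

If `empty_bam_suspected` is true, the BAM file is worth inspecting before any further processing.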

billnjcn111 commented 6 years ago

Another issue: when I count how many variants are annotated with "PureCN.ML.SOMATIC", the current version gives far more than the previous version. Did you change the cutoff for annotating PureCN.ML.SOMATIC?

Thanks

lima1 commented 6 years ago

I don't think so, there were pretty much only minor bugfixes like this since February. The only feature that might affect the ML.SOMATIC cutoff is related to the mapping bias imputation (if you use the mapping_bias.rds file via --normal_panel). But this should only affect a handful of variants in difficult regions.

There were some Mutect2 VCF related changes, but I don't see how they would result in more calls getting through. Possible though.

If you run one of your old samples with the new version, you should be able to quickly tell the difference. Let me know if it looks like a bug.

billnjcn111 commented 6 years ago

Can you send me a link to download 1.9.28? Thanks

lima1 commented 6 years ago

PureCN_1.9.28.tar.gz https://github.com/lima1/PureCN/files/1921080/PureCN_1.9.28.tar.gz

billnjcn111 commented 6 years ago

Thanks!

billnjcn111 commented 6 years ago

For the same input, the new version gave 100 more PureCN.ML.SOMATIC annotations. In addition, each version has variants with the PureCN.ML.SOMATIC annotation that the other does not. The same variant can have PureCN.SM1=0.0011 in one version and PureCN.SM1=0.6482 in the other, which leads to a different prediction. I checked that their coverage_loess.txt files are identical as well. For the time being, since the new version gave a huge number of variants, I will install the older (.28) version to get results for comparison. Thanks.
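A comparison like this is easier to discuss once it is reduced to three lists: variants only in the old output, variants only in the new output, and shared variants whose value moved. The Python sketch below assumes the per-variant values (for example PureCN.SM1) have already been extracted from both outputs into dictionaries keyed by (chromosome, position, ref, alt); the function name, the tolerance, and the coordinates are hypothetical.

```python
def compare_calls(old, new, tol=0.05):
    """Compare two {variant_key: value} mappings from different versions.

    Returns (only_old, only_new, changed), each a sorted list of keys.
    `changed` holds shared variants whose value differs by more than tol.
    """
    old_keys, new_keys = set(old), set(new)
    only_old = sorted(old_keys - new_keys)
    only_new = sorted(new_keys - old_keys)
    changed = sorted(
        key for key in old_keys & new_keys
        if abs(old[key] - new[key]) > tol
    )
    return only_old, only_new, changed


# Made-up variants; 0.0011 and 0.6482 are the two values quoted in the thread.
old_version = {("1", 100, "A", "T"): 0.0011, ("2", 200, "G", "C"): 0.9}
new_version = {("1", 100, "A", "T"): 0.6482, ("3", 300, "C", "CT"): 0.8}

only_old, only_new, changed = compare_calls(old_version, new_version)
```

Posting the `changed` list together with the matching log excerpts would narrow down which step produces the difference.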

lima1 commented 6 years ago

Can you help me understand what's different? Can you share the Sampleid.log file for both, up to the optimization of local minima?

I assume this is a Mutect2 from GATK4 VCF?

billnjcn111 commented 6 years ago

Yes, both were from GATK4 Mutect2. I no longer have both log files; the previous one was deleted.

lima1 commented 6 years ago

100 more in whole exome data? Are these mostly sites in dbSNP? Did you run Mutect2 with gnomAD as described in the GATK4 somatic workflow? The only change that I see might have affected this is the support for POP_AF field. If that is present, it will ignore the DB flag. If the population allele frequencies are wrong though because there is no gnomAD, it might not work well.

lima1 commented 6 years ago

Actually, looking again at the code, if there is a DB flag, it should ignore POP_AF. So I'm puzzled where the differences come from. Any help would be appreciated (btw, the log file gets appended to, so unless you manually deleted it rather than overwriting it, the old logs should still be there).

lima1 commented 6 years ago

Never mind, since you probably used bcftools, you have no DB flag.

It now (most recent commit) should ignore POP_AF when it's not set with a gnomAD value. It should work similarly to before, and probably slightly better than 1.9.28 when you provide a mapping_bias.rds file.
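The precedence described in the last few comments can be summarized in a small Python sketch: a DB flag wins, POP_AF is consulted only when it carries a real gnomAD value, and everything else gets the default. The two prior values come from the log message quoted earlier in this thread; the function name, the use of `None` for "no gnomAD value", and the `af_cutoff` threshold are assumptions made for illustration and do not describe how PureCN maps POP_AF to a prior.

```python
def somatic_prior(has_db_flag, gnomad_af=None,
                  prior_known=0.0005, prior_other=0.5, af_cutoff=0.001):
    """Pick a somatic prior probability for one variant.

    has_db_flag: True if the VCF record carries the DB flag.
    gnomad_af:   population allele frequency, or None when POP_AF was not
                 set from gnomAD (assumed encoding).
    af_cutoff:   hypothetical threshold above which a variant is treated
                 like a known germline site.
    """
    if has_db_flag:
        # The DB flag takes precedence; POP_AF is ignored in this case.
        return prior_known
    if gnomad_af is not None and gnomad_af > af_cutoff:
        return prior_known
    return prior_other


prior_db = somatic_prior(True, gnomad_af=0.0)
prior_common = somatic_prior(False, gnomad_af=0.2)
prior_no_gnomad = somatic_prior(False, gnomad_af=None)
```

With no DB flag and no gnomAD value, every variant falls through to the default prior.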

billnjcn111 commented 6 years ago

No, the log was written to the stderr or stdout files on the cluster. I usually don't keep them since there are too many.

billnjcn111 commented 6 years ago

I'd like to try MuTect 1.1.7. Can you provide the command line you tested with MuTect 1.1.7? Thx

lima1 commented 6 years ago

It should all be in the main vignette, FAQ section.

billnjcn111 commented 6 years ago

I used Mutect1, but there is a bug:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'seqnames' for signature '"NULL"'
Calls: write.csv ... .getArmLocations -> match -> seqnames ->

This is because I used b37, which does not match PureCN's internal notation. If I specify hg19, there is another error saying x can't match y since x is hg19 and y is b37... Can you fix this? Thx

lima1 commented 6 years ago

This bug was fixed sometime after 1.9.28.

Where can't you use hg19? Which step fails? Do you use hg19 everywhere?

billnjcn111 commented 6 years ago

I use b37 most of the time. For Mutect2 results it was already fixed, but not for MuTect 1.1.7 VCFs. For Mutect2, I just use --hg19 on the command line, and it works because of your earlier fix. For Mutect1, no matter what I use on the command line, it does not work.

lima1 commented 6 years ago

If you provide "hg19", where does it fail? I need exact error messages and command lines.

Internally it is using "hg19", because all the Bioconductor gene annotation databases are "hg19", there simply is no "b37".

Again, even if you provide "b37" (which is useless because PureCN will ignore it because it's unknown), it should not crash with the recent version 1.9.44.

lima1 commented 6 years ago

And again, please update to 1.9.44, there were a couple of bugfixes related to that.

billnjcn111 commented 6 years ago

Yes, the version change did not affect too much. I will use 1.9.44

billnjcn111 commented 6 years ago

BTW, did you validate the contamination values in PureCN? Did you compare them to CalculateContamination in GATK4? I see a big difference for the same sample. One issue now is that I have a huge number of mutations called by Mutect, which could be a contamination issue. Do you have any experience with that? Thx

billnjcn111 commented 6 years ago

Do you mean 1.9.43, which is the most recent version on Bioconductor?

lima1 commented 6 years ago

1.9.44 has the fix for Mutect2 VCFs with an empty POP_AF field (when no gnomAD was used), which I guess might fix your issue from yesterday.

Haven't compared to GATK4 yet, but works pretty well, yes. It reliably detects contamination, unless the contamination is extreme (it's not made to detect those cases). I will compare in the coming weeks.

The way we define contamination is allelic fraction based, i.e. a contamination of 2% in our likelihood model means SNPs with an allelic fraction of ~2% are likely contamination.

What's a big difference?

billnjcn111 commented 6 years ago

I meant how did you get your contamination rate?

lima1 commented 6 years ago

It's part of the likelihood model (described in the paper). A top level explanation you can find in the developer vignette. There are a few initial steps to make it robust to sequencing artifacts, which is important for ultra-deep sequencing. Similar to ContEst, it will look at homozygous sites with a few ref reads, and in addition it checks low allelic sites. It's currently not using population allele frequencies, but it uses purity and copy number.

billnjcn111 commented 6 years ago

For some samples, the somatic mutation load predicted by PureCN is anti-correlated with sequencing depth. Do you know why? Thx

lima1 commented 6 years ago

You are saying you have a few outliers with low sequencing depth that are also outliers with high TMB?

But no, anti-correlation would make little sense.

Do you follow the tutorial in the Quick Start vignette step-by-step? Can you post the command lines you used for IntervalFile.R, PureCN.R and Dx.R?

billnjcn111 commented 6 years ago

Is it possible that this was due to some sort of contamination? I just need some thoughts on this. Yes, I followed the steps.

lima1 commented 6 years ago

Contamination is obvious when you look at the main PDF. You should see 2 additional bands of SNPs, close to 0 (alt reads only from contamination) and close to 1 (homozygous SNPs with ref reads from contamination). Are these samples flagged by PureCN for contamination? If not, then unlikely.

Hard to tell without seeing the complete data and code that was used to generate it. Low coverage is in general a QC red flag. Could be extensive DNA damage in old samples with low library complexity.
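For a quick look before opening the PDF, the two extra SNP bands can be counted directly from the allelic fractions. The Python sketch below is a crude heuristic with a hypothetical `width` cutoff and made-up fractions; it is not PureCN's contamination model, which works through the likelihood model and also uses purity and copy number.

```python
def contamination_bands(allelic_fractions, width=0.05):
    """Count SNPs in the two bands expected under contamination.

    low band:  0 < AF <= width        (alt reads only from the contaminant)
    high band: 1 - width <= AF < 1    (homozygous SNPs with a few ref reads)
    `width` is a hypothetical cutoff chosen for illustration.
    """
    low = sum(1 for af in allelic_fractions if 0 < af <= width)
    high = sum(1 for af in allelic_fractions if 1 - width <= af < 1)
    return low, high


# Made-up allelic fractions for a handful of SNPs.
fractions = [0.02, 0.03, 0.48, 0.52, 0.97, 1.0]
low_band, high_band = contamination_bands(fractions)
```

A clean sample would be expected to have few SNPs in either band, but sequencing artifacts can also produce low-fraction calls, so these counts are only a prompt for inspecting the B-allele frequency plot.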

billnjcn111 commented 6 years ago

Can you tell me which pdf and which plot?

lima1 commented 6 years ago

The B-allele frequency plot in the main Sampleid.pdf - like Figure 4 in https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/PureCN.pdf

billnjcn111 commented 6 years ago

Using 1.9.43, I still have the same error with Mutect1.1.7:
Error in mergeNamedAtomicVectors(genome(x), genome(y), what = c("sequence", : sequences 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT have incompatible genomes:
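An error of this shape is what one would expect when two objects carry different genome labels (for example "hg19" and "b37") for the same sequence names. The Python sketch below mimics that kind of compatibility check under the assumption that a missing label is compatible with anything; the function name is hypothetical and this is not the Bioconductor implementation.

```python
def merge_genome_labels(x, y):
    """Merge two {sequence_name: genome_label} mappings.

    A label of None means "unknown" and is compatible with any label.
    Raises ValueError if both sides name different genomes for a sequence.
    """
    merged = dict(x)
    conflicts = []
    for seq, label in y.items():
        current = merged.get(seq)
        if current is None:
            merged[seq] = label
        elif label is not None and label != current:
            conflicts.append(seq)
    if conflicts:
        raise ValueError(
            "sequences " + ", ".join(conflicts) + " have incompatible genomes"
        )
    return merged


# Unlabeled sequences merge cleanly with labeled ones.
merged_ok = merge_genome_labels(
    {"1": "hg19", "2": None},
    {"1": None, "2": "hg19", "X": "hg19"},
)
```

Under this model, using one genome label consistently across all inputs, or leaving it unset on one side, avoids the conflict.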

lima1 commented 6 years ago

Where is this happening? I would need the complete log output until the crash. You never specified b37 anywhere in PureCN?