Open dariober opened 6 years ago
Feel free to bombard away since it could help to improve the package.
You may not want to filter away all the lcn.em=NA segments. There can be clear focal amplifications i.e. large tcn.em but due to the sparsity of het SNPs we can't determine lcn unambiguously.
cnlr.median is the segment specific estimate. For the clust suffix segments that are similar are combined and given a cluster specific estimate. So they ought to be very similar. cnlr.median by itself is not informative since it depends on library size, ploidy etc. As you said the right quantity to use is cnlr.median - dipLogR.
Venkat
...Following this up... What about segments where tcn.em=2 and lcn.em=0?
This could be neutral segments if both tumour and normal are homozygote (hence lcn.em=0), so they could be filtered out. However, some of these could be loss-of-heterozygosity segments where the normal does have lcn.em=1 (instead cases where tcn.em != 2 and lcn.em=0 are definitely interesting).
I think to distinguish which case we have one needs to look at mafR.clust
and see whether it is close to zero (neutral segment) or >0 (LOH). Now, I guess deciding when mafR.clust is convincingly > 0 may be tricky as it depends also on the number of markers in the segment (few markers are more likely to generate spurious deviations from 0). Any thoughts...?
Dario
A segment that has tcn=2 and lcn=0 is always copy neutral LOH because in order to assign lcn the segment should have had enough het snps in the normal. If it didn't lcn assignment would have been NA. You can question whether min.nhet is large enough.
Venkat
Indeed, that makes sense. Let's try to recapitulate. Would you agree with this classification?
tcn.em | lcn.em | type |
---|---|---|
0 | 0 | DEL |
0 | NA | DEL |
1 | 0 | HEMIZYGOTE |
1 | NA | HEMIZYGOTE |
2 | 0 | LOH |
2 | 1 | NEUTR |
2 | NA | NEUTR/Unknown |
2+ | 0 | DUP-LOH |
2+ | 1+ | DUP |
2+ | NA | DUP-[LOH?] |
Yes except that tcn=1 automatically implies lcn=0. So (1,NA) shouldn't occur (if it does it's a bug).
Hi - Thanks! I have a few cases where tcn=0 and lcn=NA, 9 out of 8486 (cumulated on several samples). These are the records, if useful:
chrom start end seg num.mark nhet segclust cnlr.median.clust mafR.clust cf.em tcn.em lcn.em sample_id
chrX 29133 2780079 171 3999 1035 20 -0.188145471 0.105767213 1 1 NA 276
chrX 2780409 31879100 172 26156 31 15 -0.320095316 NA 1 1 NA 276
chrX 31961182 58438972 174 22373 20 15 -0.320095316 NA 1 1 NA 276
chrX 58439460 62366300 175 139 0 19 -0.244809231 NA 1 1 NA 276
chrX 67532300 68324978 177 754 0 15 -0.320095316 NA 1 1 NA 276
chrX 73961253 75429969 179 1357 0 15 -0.320095316 NA 1 1 NA 276
chrX 155704979 156030162 181 603 131 8 -0.385180248 -0.001746385 1 1 NA 276
chrX 21457 2781330 34 4363 1489 9 0.009380662 -0.001925520 1 1 NA 303
chrX 155703951 156030162 36 622 79 10 0.009380662 0.003004941 1 1 NA 303
Thanks. Will fix the bug.
I have a quick question regarding the ploidy/purity output.
What does it mean when it says "75%" such as in "
Thank you!
Sorry, I'm bombarding you with questions... I'd like to make sure my understanding is correct.
Typically one wants to filter out segments that are copy-number neutral. For this purpose, is it sufficient to filter out records where tcn.em = 2 or NA and lcn.em = 1 or NA? Is any other filtering recommended? E.g.:
Related question, In order to assess the "strength" or enrichment of a segment I guess a useful metric is
cnlr.median
orcnlr.median.clust
(these two appear to be almost perfectly correlated, so I guess it doesn't matter which one is chosen, any comment on this?) However, the rawcnlr.median
should be adjusted for the logR of the diploid state (found infit$dipLogR
). Otherwise, with high or low ploidy the rawcnlr.median
may be misleading.For example, in the results plotted below segments with logR ~ 0 are still duplicated, so a useful metric to assess enrichment becomes:
fit$cncf$cnlr.median - fit$dipLogR
. Do you agree with this?Thanks a lot again! Dario