copy number profile column in the read count file

markutz556 commented 4 years ago

Hi,

Thanks for developing this interesting tool. I want to try it with our data (scDNA-seq from 10x), but I can't figure out what the copy number profile column in the read count file refers to. As I understand it, each cell could have different copy numbers in different genomic regions. If this number applies to the mutations in the same row, does that mean one cell could have multiple rows? But then the cell_id in the read count file should be a unique identifier for each cell.

Sorry, I'm kind of new to phylogenetic analysis. I wonder if you have any suggestions or tutorial links on processing 10x scDNA-seq data to generate input files for SCARLET.

Thanks, Mark

gsatas commented 4 years ago

Hi Mark,

Thank you for your interest. In the read count file, the copy-number profile can be interpreted as an identifier for a "copy-number clone" or "copy-number cluster". If you cluster cells according to their copy-number calls, the copy-number profile column indicates which cluster each cell belongs to. So this column doesn't actually say anything about the copy numbers of a particular region. That information comes from the copy-number tree file, which provides the set of supported losses. For example, if a mutation is in a region with copy number 2 in clone A and copy number 1 in clone B, then in the copy-number tree file that mutation would be included in the list of supported losses.
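To make the two ideas above concrete, here is a minimal Python sketch. It is not taken from SCARLET and does not read or write SCARLET's file formats. The function names, region names, and toy data are all hypothetical. It assumes cells are grouped by exactly identical copy-number calls (real clustering would need to tolerate noise) and that a loss is "supported" when a mutation's region has a lower copy number in the child clone than in the parent clone.

```python
def assign_profiles(cell_copy_numbers):
    """Give every cell a copy-number profile label (0, 1, ...).

    Cells whose copy-number calls are identical across all regions share
    a label. Exact matching is a simplification for illustration only.
    """
    label_of_profile = {}
    profile_of_cell = {}
    for cell_id, calls in cell_copy_numbers.items():
        key = tuple(sorted(calls.items()))
        if key not in label_of_profile:
            label_of_profile[key] = len(label_of_profile)
        profile_of_cell[cell_id] = label_of_profile[key]
    return profile_of_cell


def supported_losses(parent_cn, child_cn, mutation_region):
    """List mutations whose region drops in copy number from parent to child."""
    return sorted(
        mutation
        for mutation, region in mutation_region.items()
        if child_cn[region] < parent_cn[region]
    )


# Hypothetical toy data: two regions, four cells.
cell_copy_numbers = {
    "cell1": {"regionX": 2, "regionY": 2},
    "cell2": {"regionX": 2, "regionY": 2},
    "cell3": {"regionX": 1, "regionY": 2},
    "cell4": {"regionX": 1, "regionY": 2},
}
profile_of_cell = assign_profiles(cell_copy_numbers)

# Clone A (profile 0) has copy number 2 in regionX; clone B (profile 1) has 1.
clone_a = {"regionX": 2, "regionY": 2}
clone_b = {"regionX": 1, "regionY": 2}
mutation_region = {"mut1": "regionX", "mut2": "regionY"}
losses = supported_losses(clone_a, clone_b, mutation_region)

print(profile_of_cell)
print(losses)
```

In this sketch each cell still appears once with a single profile label, and the per-region copy-number information only enters when the loss list between two clones is computed.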

SCARLET was designed for and tested on higher-coverage data (targeted or whole exome), so it may not perform particularly well on 10x scDNA-seq data because of the very high rates of false negatives.

I hope this clarifies things. Gryte

markutz556 commented 4 years ago

Thanks for your answers. Now it makes sense. I'd also like to ask what a good way to generate the copy number tree file would be. Right now we have a set of clones defined by copy number, along with the SNV profiles associated with each clone. But we don't know the transitional relationships among the clones, and I think we need these to build the copy number tree file. Any ideas would be appreciated! Thanks.

Best, Mark

raphael-group / scarlet

copy number profile column in the read count file #1