navinlabcode / copykat


copykat output files #47

Closed: sadiexiaoyu closed this issue 2 years ago

sadiexiaoyu commented 2 years ago

Hi all, I would like to ask how to interpret the two output files of copykat, i.e. CNA_results.txt and raw_results_gene_by_cell.txt.

In raw_results_gene_by_cell.txt, I can see the abspos, start pos and end pos for each gene in each cell.

abspos | start pos | end pos | Ensembl ID | gene symbol
1038120 | 1020123 | 1056118 | ENSG00000188157 | AGRN
1099089 | 1081818 | 1116361 | ENSG00000131591 | C1orf159
1263897 | 1253909 | 1273885 | ENSG00000160087 | UBE2J2
1300992 | 1292376 | 1309609 | ENSG00000131584 | ACAP3
1318138 | 1311585 | 1324691 | ENSG00000127054 | INTS11
1342313 | 1335276 | 1349350 | ENSG00000107404 | DVL1
1392519 | 1385711 | 1399328 | ENSG00000221978 | CCNL2
1404610 | 1401908 | 1407313 | ENSG00000242485 | MRPL20
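For the rows shown, abspos appears to be the integer midpoint of start pos and end pos, rounded down. The short Python check below uses only the values copied from this excerpt (all chromosome 1 genes, so no chromosome offset is involved); it does not show how abspos is offset for genes on other chromosomes, and `gene_midpoint` is a hypothetical helper, not a copykat function.

```python
# Rows copied from the raw_results_gene_by_cell.txt excerpt:
# (abspos, start pos, end pos, gene symbol)
rows = [
    (1038120, 1020123, 1056118, "AGRN"),
    (1099089, 1081818, 1116361, "C1orf159"),
    (1263897, 1253909, 1273885, "UBE2J2"),
    (1300992, 1292376, 1309609, "ACAP3"),
    (1318138, 1311585, 1324691, "INTS11"),
    (1342313, 1335276, 1349350, "DVL1"),
    (1392519, 1385711, 1399328, "CCNL2"),
    (1404610, 1401908, 1407313, "MRPL20"),
]


def gene_midpoint(start: int, end: int) -> int:
    """Hypothetical helper: integer midpoint of a gene's start and end, rounded down."""
    return (start + end) // 2


# Compare the reported abspos with the computed midpoint for every row.
for abspos, start, end, gene in rows:
    print(gene, abspos, gene_midpoint(start, end), abspos == gene_midpoint(start, end))
```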

In CNA_results.txt, I can see abspos values spaced roughly 200 kb apart:

1042457 | 1042457
1265484 | 1265484
1519859 | 1519859
1826619 | 1826619
2058465 | 2058465
2280372 | 2280372
2491263 | 2491263
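The spacing of these bin positions can be checked directly. The snippet below uses only the seven abspos values from the excerpt. The gaps are uneven, from about 211 kb to 307 kb, which is consistent with (though not proof of) variable-width genomic bins rather than a fixed 200 kb grid.

```python
# abspos values copied from the CNA_results.txt excerpt
bin_abspos = [1042457, 1265484, 1519859, 1826619, 2058465, 2280372, 2491263]

# Distance between each pair of consecutive bin positions
gaps = [b - a for a, b in zip(bin_abspos, bin_abspos[1:])]

print(gaps)  # → [223027, 254375, 306760, 231846, 221907, 210891]
print(min(gaps), max(gaps))  # → 210891 306760
```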

My question is: how does the abspos in CNA_results.txt relate to the abspos in raw_results_gene_by_cell.txt?

Best, Langyu

gaobio commented 2 years ago

> how to interpret the two output files of copykat

Sorry for the confusion. This is related to the history of our lab's methods for calculating genomic copy numbers. The raw output (raw_results_gene_by_cell.txt) contains the copy numbers calculated directly from the gene expression matrix, so you can look at the copy number of each individual gene used in the calculation. The new version includes a heatmap of this matrix with gene labels. The second output (CNA_results.txt) reports the results with genomic coordinates instead of gene names as coordinates. It involves additional steps: the baselines are re-adjusted and the values are re-averaged within genomic bins. These genomic bins were developed earlier and used for the genomic DNA copy number calculations in many of our publications.
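As a rough illustration of re-averaging gene-level values into genomic bins, the Python sketch below assigns each gene to the bin whose abspos is nearest and averages one cell's values per bin, using the positions from the two excerpts in the question. The nearest-bin rule, the helper names `nearest_bin` and `bin_average`, and the per-gene values are all hypothetical. copykat's actual bin boundaries, its baseline re-adjustment, and its handling of bins without genes are not reproduced here.

```python
from bisect import bisect_left
from statistics import mean

# Bin positions (abspos) from the CNA_results.txt excerpt.
BIN_ABSPOS = [1042457, 1265484, 1519859, 1826619, 2058465, 2280372, 2491263]

# Gene positions (abspos) from the raw_results_gene_by_cell.txt excerpt.
GENE_ABSPOS = {
    "AGRN": 1038120, "C1orf159": 1099089, "UBE2J2": 1263897, "ACAP3": 1300992,
    "INTS11": 1318138, "DVL1": 1342313, "CCNL2": 1392519, "MRPL20": 1404610,
}


def nearest_bin(pos, bins):
    """Hypothetical rule: index of the bin whose abspos is closest to pos.

    bins must be sorted ascending; ties go to the left bin.
    """
    i = bisect_left(bins, pos)
    if i == 0:
        return 0
    if i == len(bins):
        return len(bins) - 1
    return i if bins[i] - pos < pos - bins[i - 1] else i - 1


def bin_average(gene_values, gene_abspos, bins):
    """Average one cell's gene-level values within each bin.

    Returns one entry per bin; bins that receive no genes get None.
    """
    members = [[] for _ in bins]
    for gene, value in gene_values.items():
        members[nearest_bin(gene_abspos[gene], bins)].append(value)
    return [mean(v) if v else None for v in members]


# Invented per-gene values for a single cell, for illustration only.
cell_values = {
    "AGRN": 0.2, "C1orf159": 0.4, "UBE2J2": -0.1, "ACAP3": 0.1,
    "INTS11": 0.0, "DVL1": 0.3, "CCNL2": -0.3, "MRPL20": 0.5,
}

print(bin_average(cell_values, GENE_ABSPOS, BIN_ABSPOS))
```

With these positions, AGRN and C1orf159 fall into the bin at 1042457, UBE2J2, ACAP3, INTS11, DVL1 and CCNL2 into the bin at 1265484, and MRPL20 into the bin at 1519859; the remaining four bins receive no genes from this excerpt.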