Hi,

Thanks for developing this tool! It is very cool, and I am using it to call CNVs in single-cell gastric cancer samples. However, when I ran CopyKat multiple times with the same input and code (using T-cells from the same sample as the reference normal cells), it produced different aneuploid/diploid predictions.
-Results:
[1] "running copykat v1.0.8 updated 02/25/2022 introduced mm10 module, fixed typos"
[1] "step1: read and filter data ..."
[1] "30535 genes, 7074 cells in raw data"
[1] "10084 genes past LOW.DR filtering"
[1] "step 2: annotations gene coordinates ..."
[1] "start annotation ..."
[1] "step 3: smoothing data with dlm ..."
[1] "step 4: measuring baselines ..."
[1] "671 known normal cells found in dataset"
[1] "run with known normal..."
[1] "baseline is from known input"
[1] "step 5: segmentation..."
[1] "step 6: convert to genomic bins..."
[1] "step 7: adjust baseline ..."
[1] "step 8: final prediction ..."
[1] "step 9: saving results..."
[1] "step 10: ploting heatmap ..."
Time difference of 29.53742 mins
# first-time result
prop.table(table(two_patient$copykat.pred,two_patient$celltype),margin =2)
| prediction | Int Enterocyte | Gastric Pit | Gastric Isthmus | Int TAC | Gastric LYZ | Int stem cells | Gastric stem cell | Int Goblet | NK T-cells |
|---|---|---|---|---|---|---|---|---|---|
| aneuploid | 0.79970015 | 0.90347490 | 0.82057416 | 0.73280943 | 0.71276596 | 0.40990099 | 0.78947368 | 0.75229358 | 0.03278689 |
| diploid | 0.20029985 | 0.09652510 | 0.17942584 | 0.26719057 | 0.28723404 | 0.59009901 | 0.21052632 | 0.24770642 | 0.96721311 |
#re-run copykat and get this second-time result
prop.table(table(two_patient$copykat.pred,two_patient$celltype),margin =2)
| prediction | Int Enterocyte | Gastric Pit | Gastric Isthmus | Int TAC | Gastric LYZ | Int stem cells | Gastric stem cell | Int Goblet | NK T-cells |
|---|---|---|---|---|---|---|---|---|---|
| aneuploid | 0.75262369 | 0.86615187 | 0.73205742 | 0.67485265 | 0.73404255 | 0.43960396 | 0.81578947 | 0.78899083 | 0.03278689 |
| diploid | 0.24737631 | 0.13384813 | 0.26794258 | 0.32514735 | 0.26595745 | 0.56039604 | 0.18421053 | 0.21100917 | 0.96721311 |
As you can see, the two runs give inconsistent diploid/aneuploid predictions. The proportions for T-cells did not change because they were set as the reference normal cells.
Is this expected, or is something wrong with my settings? Does CopyKat use random seeds internally? Could you please help me with this? Thanks so much!
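To check whether the two runs disagree on individual cells, and not only in the per-cell-type proportions, the calls can be compared cell by cell. The sketch below is a minimal base-R illustration on made-up data. `run1`, `run2`, and the `cell.names` column are assumptions for the example (only `copykat.pred` comes from this post), and none of it is CopyKat's own API.

```r
# Toy stand-ins for the prediction tables of two CopyKat runs (hypothetical data).
run1 <- data.frame(
  cell.names   = c("c1", "c2", "c3", "c4", "c5"),
  copykat.pred = c("aneuploid", "aneuploid", "diploid", "diploid", "aneuploid"),
  stringsAsFactors = FALSE
)
run2 <- data.frame(
  cell.names   = c("c3", "c1", "c2", "c5", "c4"),  # deliberately in a different order
  copykat.pred = c("diploid", "aneuploid", "diploid", "aneuploid", "diploid"),
  stringsAsFactors = FALSE
)

# Match the two runs on cell name so that row order does not matter.
both <- merge(run1, run2, by = "cell.names", suffixes = c(".run1", ".run2"))

# Cross-tabulate the calls: off-diagonal counts are cells whose label flipped.
confusion <- table(run1 = both$copykat.pred.run1, run2 = both$copykat.pred.run2)
print(confusion)

# Fraction of cells that received the same call in both runs.
agreement <- mean(both$copykat.pred.run1 == both$copykat.pred.run2)
print(agreement)
```

With real output, `run1` and `run2` would be the prediction tables saved from each run. A per-cell comparison like this shows whether the changes come from a few borderline cells or from whole clusters switching label.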
-The version of CopyKat: V1.0.8
-Code:
copykat.test <- copykat(rawmat=exp.rawdata, id.type="S", ngene.chr=5, win.size=25, KS.cut=0.1,
                        sam.name="sample_use_t_as_ref", distance="euclidean",
                        norm.cell.names=norm.cells, plot.genes="FALSE", n.cores=32)
pred.test <- data.frame(copykat.test$prediction)
pred.test <- pred.test[-which(pred.test$copykat.pred=="not.defined"),]
two_patient@meta.data$copykat.pred <- pred.test$copykat.pred
prop.table(table(two_patient$copykat.pred, two_patient$celltype), margin=2)
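One thing that may be worth double-checking in the snippet above: `pred.test$copykat.pred` is assigned to the metadata by position. That is only safe if the prediction table has exactly the same cells, in exactly the same order, as the metadata. A name-based assignment avoids relying on that. The following is a minimal sketch on toy data; `meta` stands in for `two_patient@meta.data`, and the `cell.names` column is assumed to hold the cell barcodes.

```r
# Toy metadata with cell barcodes as row names (stand-in for two_patient@meta.data).
meta <- data.frame(celltype = c("NK T-cells", "Gastric Pit", "Int TAC", "Int Goblet"),
                   row.names = c("c1", "c2", "c3", "c4"))

# Toy prediction table: different order, one cell missing (c3), one "not.defined".
pred <- data.frame(cell.names   = c("c4", "c1", "c2"),
                   copykat.pred = c("aneuploid", "diploid", "not.defined"),
                   stringsAsFactors = FALSE)

# Logical subsetting keeps all rows when nothing matches, unlike -which(),
# which returns zero rows if there are no "not.defined" entries.
pred <- pred[pred$copykat.pred != "not.defined", ]

# Look up each metadata cell in the prediction table by name;
# cells without a call become NA instead of shifting the other labels.
meta$copykat.pred <- pred$copykat.pred[match(rownames(meta), pred$cell.names)]
print(meta)
```

Cells that were filtered out or left undefined end up as `NA`, so the later `table()` call simply ignores them.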