Running CopyKat multiple times on the same data generates different results

Hi,

Thanks for developing this tool! It is very cool, and I am using it to call CNVs from single-cell gastric cancer samples. However, when I ran CopyKat multiple times with the same input and code (using T-cells from the same sample as the reference normal cells), it produced different aneuploid/diploid predictions.

-The version of CopyKat: V1.0.8

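-Code:

copykat.test <- copykat(rawmat=exp.rawdata, id.type="S", ngene.chr=5, win.size=25, KS.cut=0.1, sam.name="sample_use_t_as_ref", distance="euclidean", norm.cell.names=norm.cells, plot.genes="FALSE", n.cores=32)

pred.test <- data.frame(copykat.test$prediction)

# Drop cells without a call; logical indexing is also safe when no row matches
pred.test <- pred.test[pred.test$copykat.pred != "not.defined", ]

# Assigned by position: assumes the remaining rows are in the same order as the cells in two_patient
two_patient@meta.data$copykat.pred <- pred.test$copykat.pred

# Per-cell-type proportions of each prediction (columns sum to 1)
prop.table(table(two_patient$copykat.pred, two_patient$celltype), margin=2)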

-Results:

[1] "running copykat v1.0.8 updated 02/25/2022 introduced mm10 module, fixed typos"
[1] "step1: read and filter data ..."
[1] "30535 genes, 7074 cells in raw data"
[1] "10084 genes past LOW.DR filtering"
[1] "step 2: annotations gene coordinates ..."
[1] "start annotation ..."
[1] "step 3: smoothing data with dlm ..."
[1] "step 4: measuring baselines ..."
[1] "671 known normal cells found in dataset"
[1] "run with known normal..."
[1] "baseline is from known input"
[1] "step 5: segmentation..."
[1] "step 6: convert to genomic bins..."
[1] "step 7: adjust baseline ..."
[1] "step 8: final prediction ..."
[1] "step 9: saving results..."
[1] "step 10: ploting heatmap ..."
Time difference of 29.53742 mins

# Result of the first run
prop.table(table(two_patient$copykat.pred,two_patient$celltype),margin =2)

prediction	Int Enterocyte	Gastric Pit	Gastric Isthmus	Int TAC	Gastric LYZ	Int stem cells	Gastric stem cell	Int Goblet	NK T-cells
aneuploid	0.79970015	0.90347490	0.82057416	0.73280943	0.71276596	0.40990099	0.78947368	0.75229358	0.03278689
diploid	0.20029985	0.09652510	0.17942584	0.26719057	0.28723404	0.59009901	0.21052632	0.24770642	0.96721311

# Result of the second run (CopyKat re-run with the same input and code)
prop.table(table(two_patient$copykat.pred,two_patient$celltype),margin =2)

prediction	Int Enterocyte	Gastric Pit	Gastric Isthmus	Int TAC	Gastric LYZ	Int stem cells	Gastric stem cell	Int Goblet	NK T-cells
aneuploid	0.75262369	0.86615187	0.73205742	0.67485265	0.73404255	0.43960396	0.81578947	0.78899083	0.03278689
diploid	0.24737631	0.13384813	0.26794258	0.32514735	0.26595745	0.56039604	0.18421053	0.21100917	0.96721311

As you can see, the two runs give inconsistent diploid/aneuploid proportions for most cell types. The proportions for T-cells did not change because they were set as the reference normal cells.
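To see how many individual cells actually change label between runs (rather than only the per-cell-type proportions), the two prediction tables could be cross-tabulated cell by cell. Below is a minimal sketch, assuming each run's prediction data frame has `cell.names` and `copykat.pred` columns. `pred.run1`, `pred.run2`, and their toy values are hypothetical stand-ins for the two real outputs.

```r
# Toy stand-ins for copykat.test$prediction from two runs (hypothetical values).
pred.run1 <- data.frame(
  cell.names   = c("c1", "c2", "c3", "c4", "c5"),
  copykat.pred = c("aneuploid", "aneuploid", "diploid", "diploid", "not.defined"),
  stringsAsFactors = FALSE
)
pred.run2 <- data.frame(
  cell.names   = c("c3", "c1", "c2", "c5", "c4"),
  copykat.pred = c("diploid", "aneuploid", "diploid", "not.defined", "diploid"),
  stringsAsFactors = FALSE
)

# Align run 2 to run 1 by cell name instead of relying on row order.
idx <- match(pred.run1$cell.names, pred.run2$cell.names)
both <- data.frame(
  cell = pred.run1$cell.names,
  run1 = pred.run1$copykat.pred,
  run2 = pred.run2$copykat.pred[idx],
  stringsAsFactors = FALSE
)

# Keep only cells that received a call in both runs.
both <- both[both$run1 != "not.defined" & both$run2 != "not.defined", ]

# Cross-tabulate the labels and compute the fraction of cells whose label changed.
confusion <- table(run1 = both$run1, run2 = both$run2)
frac.changed <- mean(both$run1 != both$run2)

print(confusion)
print(frac.changed)
```

Off-diagonal entries of `confusion` count the cells that flipped between aneuploid and diploid, which would show whether the differences are concentrated in a few borderline cells.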

Is this expected, or are some of my settings wrong? Does CopyKat use random seeds anywhere? Could you please help me with this? Thanks so much!
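One way to check whether random number generation explains the difference would be to fix the seed immediately before each run and compare the outputs. The sketch below rests on an unconfirmed assumption, namely that any randomness inside CopyKat comes from R's standard RNG. `run_seeded` and `noisy_step` are hypothetical names, and `noisy_step` only stands in for the real `copykat()` call.

```r
# Hypothetical helper: fix the RNG state, then run the supplied function.
run_seeded <- function(seed, f) {
  set.seed(seed)
  f()
}

# Stand-in for a stochastic analysis step; with real data this would be
# a function wrapping the copykat() call with the same arguments as above.
noisy_step <- function() rnorm(5)

first    <- run_seeded(42, noisy_step)
second   <- run_seeded(42, noisy_step)
unseeded <- noisy_step()  # continues the RNG stream, so it draws new values

print(identical(first, second))    # same seed, same draws
print(identical(first, unseeded))  # no reseeding, different draws
```

With `n.cores` greater than 1, worker processes may not pick up the seed in the same way as the main session, so repeating the comparison with `n.cores=1` might be a cleaner first test.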

navinlabcode / copykat

Running CopyKat multiple times on the same data generates different results #76