navinlabcode / copykat


Running CopyKat multiple times on the same data gives different results #76

Open · MaHaoran627 opened this issue 1 year ago

MaHaoran627 commented 1 year ago

Hi,

Thanks for developing this tool! It is very cool, and I am using it to call CNVs in several single-cell gastric cancer samples. However, when I ran CopyKat multiple times with the same input and code (using T-cells from the same sample as the reference normal cells), it gave different aneuploid predictions across runs.


- CopyKat version: v1.0.8

- Code:

```r
copykat.test <- copykat(rawmat = exp.rawdata, id.type = "S", ngene.chr = 5, win.size = 25,
                        KS.cut = 0.1, sam.name = "sample_use_t_as_ref", distance = "euclidean",
                        norm.cell.names = norm.cells, plot.genes = "FALSE", n.cores = 32)

pred.test <- data.frame(copykat.test$prediction)
pred.test <- pred.test[-which(pred.test$copykat.pred == "not.defined"), ]
two_patient@meta.data$copykat.pred <- pred.test$copykat.pred

prop.table(table(two_patient$copykat.pred, two_patient$celltype), margin = 2)
```
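The snippet above copies predictions into `two_patient@meta.data` by position after dropping the `not.defined` cells, so each call lands on the right cell only if both objects list the same cells in the same order. It also relies on `df[-which(cond), ]`, which returns zero rows when no cell matches, because `which()` then gives `integer(0)`. A minimal sketch of a name-based alternative, assuming the prediction table has a `cell.names` column alongside `copykat.pred` (the toy data frames below are hypothetical stand-ins for `copykat.test$prediction` and the Seurat metadata):

```r
# Hypothetical stand-ins for copykat.test$prediction and the Seurat metadata;
# in a real analysis these come from copykat() and two_patient@meta.data.
prediction <- data.frame(
  cell.names   = c("cellC", "cellA", "cellB", "cellD"),
  copykat.pred = c("aneuploid", "diploid", "not.defined", "aneuploid"),
  stringsAsFactors = FALSE
)
meta <- data.frame(
  celltype  = c("Gastric Pit", "Int TAC", "Int TAC", "Gastric Pit"),
  row.names = c("cellA", "cellB", "cellC", "cellD"),
  stringsAsFactors = FALSE
)

# Drop undefined calls with a logical filter, which stays correct even
# when no cell is "not.defined" (unlike df[-which(cond), ]).
pred <- prediction[prediction$copykat.pred != "not.defined", ]

# Look up each metadata row's call by cell name rather than by position;
# cells without a defined call get NA.
meta$copykat.pred <- pred$copykat.pred[match(rownames(meta), pred$cell.names)]

# Per-cell-type proportions; table() skips the NA calls by default.
print(prop.table(table(meta$copykat.pred, meta$celltype), margin = 2))
```

With the Seurat object from the issue, the same lookup would be `two_patient$copykat.pred <- pred$copykat.pred[match(colnames(two_patient), pred$cell.names)]`, assuming the barcodes in `cell.names` are spelled exactly like the Seurat cell names.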

- Results:

```
[1] "running copykat v1.0.8 updated 02/25/2022 introduced mm10 module, fixed typos"
[1] "step1: read and filter data ..."
[1] "30535 genes, 7074 cells in raw data"
[1] "10084 genes past LOW.DR filtering"
[1] "step 2: annotations gene coordinates ..."
[1] "start annotation ..."
[1] "step 3: smoothing data with dlm ..."
[1] "step 4: measuring baselines ..."
[1] "671 known normal cells found in dataset"
[1] "run with known normal..."
[1] "baseline is from known input"
[1] "step 5: segmentation..."
[1] "step 6: convert to genomic bins..."
[1] "step 7: adjust baseline ..."
[1] "step 8: final prediction ..."
[1] "step 9: saving results..."
[1] "step 10: ploting heatmap ..."
Time difference of 29.53742 mins
```

```
# first-time result
prop.table(table(two_patient$copykat.pred, two_patient$celltype), margin = 2)

prediction Int Enterocyte Gastric Pit Gastric Isthmus Int TAC Gastric LYZ Int stem cells Gastric stem cell Int Goblet NK T-cells
aneuploid 0.79970015 0.90347490 0.82057416 0.73280943 0.71276596 0.40990099 0.78947368 0.75229358 0.03278689
diploid 0.20029985 0.09652510 0.17942584 0.26719057 0.28723404 0.59009901 0.21052632 0.24770642 0.96721311
```

```
# re-run CopyKat with the same input; second-time result
prop.table(table(two_patient$copykat.pred, two_patient$celltype), margin = 2)

prediction Int Enterocyte Gastric Pit Gastric Isthmus Int TAC Gastric LYZ Int stem cells Gastric stem cell Int Goblet NK T-cells
aneuploid 0.75262369 0.86615187 0.73205742 0.67485265 0.73404255 0.43960396 0.81578947 0.78899083 0.03278689
diploid 0.24737631 0.13384813 0.26794258 0.32514735 0.26595745 0.56039604 0.18421053 0.21100917 0.96721311
```

As you can see, the two runs are not consistent in calling diploid versus aneuploid cells: the aneuploid fractions shift by up to about 9 percentage points in some cell types. The T-cell proportions did not change because T-cells were set as the reference normal cells.
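The proportion tables only summarize each run, so they do not show how many individual cells actually switch calls. A minimal sketch for comparing two runs cell by cell, assuming each run's `copykat.test$prediction` was saved (for example with `saveRDS()`) and has `cell.names` and `copykat.pred` columns; the toy data frames are hypothetical:

```r
# Hypothetical per-cell calls from two runs on the same input, keyed by cell name.
run1 <- data.frame(
  cell.names   = c("c1", "c2", "c3", "c4", "c5"),
  copykat.pred = c("aneuploid", "aneuploid", "diploid", "diploid", "not.defined"),
  stringsAsFactors = FALSE
)
run2 <- data.frame(
  cell.names   = c("c5", "c4", "c3", "c2", "c1"),
  copykat.pred = c("diploid", "diploid", "aneuploid", "aneuploid", "aneuploid"),
  stringsAsFactors = FALSE
)

# Pair the two calls for each cell by name, not by row position.
both <- merge(run1, run2, by = "cell.names", suffixes = c(".run1", ".run2"))

# Cross-tabulate: off-diagonal counts are cells whose call changed between runs.
flips <- table(run1 = both$copykat.pred.run1, run2 = both$copykat.pred.run2)
print(flips)

# Fraction of cells that received the same call in both runs.
agreement <- mean(both$copykat.pred.run1 == both$copykat.pred.run2)
print(agreement)
```

A per-cell cross-tabulation like this separates a few borderline cells flipping back and forth from a systematic shift in one cell type, which the column proportions alone cannot distinguish.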

Is this expected, or are some of my settings wrong? Does CopyKat use random seeds? Could you please help me with this? Thanks so much!
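Whether and where CopyKat draws random numbers internally is not established in this thread. Assuming some step does, possibly inside forked workers from `parallel::mclapply`, the usual base-R way to make repeated runs comparable is to select the `L'Ecuyer-CMRG` generator and set a seed immediately before the call. A minimal sketch, using a hypothetical `noisy_step()` in place of `copykat()`, showing that two seeded runs with the same core count match exactly:

```r
library(parallel)

# Hypothetical stand-in for a randomized pipeline step that may run in
# forked workers; this is NOT CopyKat code.
noisy_step <- function(n.cores) {
  unlist(mclapply(1:4, function(i) mean(rnorm(100)), mc.cores = n.cores))
}

seeded_run <- function(seed, n.cores) {
  RNGkind("L'Ecuyer-CMRG")  # generator with reproducible per-worker streams
  set.seed(seed)
  noisy_step(n.cores)
}

# Forking (mc.cores > 1) is unavailable on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

first  <- seeded_run(1234, n.cores = cores)
second <- seeded_run(1234, n.cores = cores)
print(identical(first, second))
```

Applied to the call in the issue, this would mean running `RNGkind("L'Ecuyer-CMRG"); set.seed(1234)` directly before `copykat.test <- copykat(...)`. If CopyKat reseeds internally, or its variability comes from something other than R's RNG, this would not help.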

YingboHuang commented 1 year ago

Same here. Is this caused by parallel computing? Running with n.cores = 1 and n.cores = 24 gives me completely different results.
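One plausible mechanism, not confirmed for CopyKat: with `parallel::mclapply`, a seeded run on one core draws every number from the main RNG stream, while a multi-core run gives each forked worker its own `L'Ecuyer-CMRG` stream. The random numbers, and anything computed from them, then change with the core count even under a fixed seed. The toy `noisy_step()` below is hypothetical and only illustrates that effect:

```r
library(parallel)

# Hypothetical stand-in for a randomized, parallelized step (not CopyKat code).
noisy_step <- function(n.cores) {
  unlist(mclapply(1:4, function(i) mean(rnorm(100)), mc.cores = n.cores))
}

seeded_run <- function(n.cores, seed = 1234) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  noisy_step(n.cores)
}

# Forking (mc.cores > 1) is unavailable on Windows, so the comparison is skipped there.
if (.Platform$OS.type != "windows") {
  serial <- seeded_run(n.cores = 1)  # lapply fallback: main RNG stream
  multi  <- seeded_run(n.cores = 2)  # each worker gets its own stream
  print(identical(serial, multi))    # same seed, different streams
}
```

Under this mechanism, a fixed seed makes runs repeatable only for a fixed `n.cores`; runs with different core counts would still be expected to differ.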