powellgenomicslab / scPred

scPred package for cell type prediction from scRNA-seq data
MIT License

Non-unique values when setting 'row.names' #17

Open a-solovyev12 opened 3 years ago

a-solovyev12 commented 3 years ago

Hello @joseah,

When training a model, I have encountered the following error:

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
Calls: print ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-

This comes after the training step has finished. However, I've double-checked and the row names (cell ids) are definitely not duplicated. Could you please have a look at that problem? Thanks a lot!

Best regards, Andrey

joseah commented 3 years ago

Hi Andrey,

This issue is probably related to the cross-validation step. Are there cell types with only a few cells? If so, some of the folds may be empty when the data is split for model assessment. A workaround could be to reduce the number of resamples via the number parameter.
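As a quick way to check for this, here is a minimal base R sketch that counts cells per type and flags any type with fewer cells than the number of folds. The labels in `cell_types` and the value of `n_folds` are made up for illustration; in practice they would come from your own metadata and training settings.

```r
# Hypothetical cell-type labels; replace with the label column used for training.
cell_types <- c(rep("B cell", 120), rep("T cell", 300), rep("pDC", 3))

# Assumed number of cross-validation folds (illustrative value).
n_folds <- 5

# Count cells per type.
counts <- table(cell_types)

# Types with fewer cells than folds cannot contribute a cell to every fold.
rare <- names(counts)[as.vector(counts) < n_folds]

print(counts)
print(rare)
```

Any type listed in `rare` is a candidate for the empty-fold situation described above.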

Cheers.

ncalistri commented 2 years ago

I'm seeing a similar error. I went through the 'get probabilities' function step by step and found that the probabilities table for one of the classes (1) is full of NAs and (2) has duplicate barcodes.

Right before probs <- Reduce(function(x, y) merge(x, y, by = "barcode"), probs) is called, these are the dimensions of the tables in my list of probability tables:

[image: dimensions of each table in the list of probability tables]

And looking at the one table that has extra rows:

[image: contents of the probability table with extra rows]

What's interesting is that the overall summary table with ROC/Specificity/Sensitivity reports that the model performs well for that class:

[image: summary table with ROC/Specificity/Sensitivity per class]
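For anyone who wants to reproduce the mechanism, here is a minimal base R sketch with toy data (not scPred's actual code or output). It assumes the per-class tables are data frames keyed by a barcode column, as in the Reduce/merge call quoted above. One table with a repeated barcode is enough to make the merged result carry duplicate barcodes, which then cannot be used as row names.

```r
# Toy per-class probability tables (hypothetical data).
probs <- list(
  data.frame(barcode = c("c1", "c2", "c3"), A = c(0.9, 0.2, 0.1)),
  data.frame(barcode = c("c1", "c2", "c3"), B = c(0.1, 0.7, 0.2)),
  # One class table full of NAs and with a repeated barcode ("c2").
  data.frame(barcode = c("c1", "c2", "c2", "c3"), C = rep(NA_real_, 4))
)

# Flag tables whose barcodes are not unique before merging.
has_dups <- vapply(probs, function(p) anyDuplicated(p$barcode) > 0, logical(1))
print(has_dups)

# Same merge pattern as quoted above; the repeated barcode survives the join.
merged <- Reduce(function(x, y) merge(x, y, by = "barcode"), probs)
print(merged)

# Using the barcodes as row names now fails; capture the message instead of stopping.
result <- tryCatch({
  rownames(merged) <- merged$barcode
  "ok"
}, error = function(e) conditionMessage(e))
print(result)
```

Running the `has_dups` check on the real list before the merge should point at the class whose table is malformed.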

ncalistri commented 2 years ago

I solved this issue by using a different model (mda) instead of svmRadial. With mda, the probability table no longer had multiple entries for the same barcode, and I was able to retrieve it as expected.