Issues with classification

RolantusdataExp commented 2 years ago

Hi developers, thanks for the package! I really hope this might become my new annotation tool.

I have been testing your pre-trained models, and they seem to fail to predict the cell populations. I have already annotated the dataset, so I know it contains certain cell types that are covered by the classifiers.

I have checked whether the applied marker_genes are actual markers in my dataset.

Can you tell me whether the Seurat object to be classified needs any particular slot or information for the algorithm to work?

One small extra thing: your default classifiers seem to have the marker_genes written in all caps (for instance CD81 and not Cd81), which I assume causes a problem. I have tried changing the nomenclature, but it did not fix the problem.

I am using the newest R version

Best regards, Peter

nttvy commented 2 years ago

Thank you @RolantusdataExp for trying scAnnotatR.

The capitalization should not cause the problem, because all input strings are automatically converted into the same form before further processing.

Which cell types are you testing? Is your dataset from humans? If it is not, that might be the cause, because the classification of human cells can differ from that of other species.
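
To illustrate the idea of converting gene symbols into one form, here is a minimal, language-neutral sketch in Python. It is not scAnnotatR's actual code: the function name `normalize_symbols` and the example gene lists are hypothetical, and upper-casing is only an assumed normalization rule.

```python
def normalize_symbols(genes):
    """Strip whitespace and upper-case gene symbols (assumed normalization rule)."""
    return [g.strip().upper() for g in genes]


# Hypothetical example lists: mouse-style and human-style capitalization.
mouse_style = ["Cd81", "Cd19", "Ms4a1"]
human_style = ["CD81", "CD19", "MS4A1"]

# After normalization, the two spellings compare equal.
same = normalize_symbols(mouse_style) == normalize_symbols(human_style)
print(same)
```

Note that this only handles capitalization. Orthologous genes whose symbols differ between species by more than case would still not match.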

Sequencing data from different platforms and protocols can also cause varied classification results.

Anyway, we are just guessing... If possible, could you share your dataset with us? Then we could debug this in more detail.

Bests, Vy

RolantusdataExp commented 2 years ago

Hi Vy, Thanks for the very rapid reply! 😄

The dataset is murine data collected using the 10X platform - however, the markers used for B cells correlate highly with my data. Could it be misclassifying because some of the markers are not present?

I would love to share my data, but it is still unpublished.

Best regards, Peter

nttvy commented 2 years ago

Hi Peter, @RolantusdataExp

The absence of markers can also affect the classification. How many of the classifier's markers were found in your dataset? Did you try only the B cell classifier, or all available classifiers? What was the AUC for the B cell classifier?

Just so you know: in most of our benchmarks, differences in sequencing platform were the main cause of classification variance. For murine data, the species difference should of course be considered as well. But so far we have not tested our classifiers on murine cells, so we cannot draw any further conclusions.
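
One way to count how many of a classifier's markers are present in a dataset is a simple set comparison. The sketch below is language-neutral Python, not part of scAnnotatR. The function `marker_overlap` and the gene lists are hypothetical, and it assumes case-insensitive matching of gene symbols.

```python
def marker_overlap(classifier_markers, dataset_genes):
    """Return (found, missing, fraction_found) for the classifier's markers.

    Symbols are compared case-insensitively (an assumption for this sketch).
    """
    wanted = {g.strip().upper() for g in classifier_markers}
    present = {g.strip().upper() for g in dataset_genes}
    found = sorted(wanted & present)
    missing = sorted(wanted - present)
    fraction = len(found) / len(wanted) if wanted else 0.0
    return found, missing, fraction


# Hypothetical inputs: markers of a classifier and gene names of a dataset.
markers = ["CD19", "MS4A1", "CD79A", "CD79B"]
dataset = ["Cd19", "Ms4a1", "Cd79a", "Actb"]

found, missing, fraction = marker_overlap(markers, dataset)
print(f"found {len(found)} of {len(markers)} markers; missing: {missing}")
```

With real data, the first argument would be the marker list of the classifier and the second the feature names of the object being classified.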

Bests, Vy

grisslab / scAnnotatR

Issues with classification #2