grisslab / scAnnotatR


line search fails when training child model #3

Closed reberya closed 2 years ago

reberya commented 2 years ago

Hello,

Just recently discovered your method and really excited to get it up and running on my dataset. Unfortunately I've run into an issue when training a child classifier.

set.seed(1)

# get cluster markers
classifier_markers <- markers_filt %>%
  dplyr::filter(cluster == c) %>%
  dplyr::select(gene) %>%
  unlist()

# train classifier
classifier <- train_classifier(train_obj = train_set,
                               marker_genes = classifier_markers, cell_type = c,
                               assay = 'RNA', tag_slot = 'SPECIFICCELLTYPE',
                               parent_classifier = classifier_parent)

# test classifier
classifier_test <- test_classifier(classifier = classifier, test_obj = test_set,
                                   parent_classifier = classifier_parent,
                                   assay = 'RNA', tag_slot = 'SPECIFICCELLTYPE')
classifier_test

# save
save_new_model(new_model = classifier, path_to_models = path, include.default = FALSE)

The output/error is as follows:

Apply pretrained model for parent cell type.

Warning: Some annotated bridge_2 are negative to all_bridge classifier. They are removed from training/testing for bridge_2 classifier.

line search fails -1.199231 0.006236564 1.423887e-05 7.971709e-07 -1.633422e-08 -3.200175e-09 -2.35132e-13
Warning in method$predict(modelFit = modelFit, newdata = newdata, submodels = param) :
  kernlab class prediction calculations failed; returning NAs
Warning in method$prob(modelFit = modelFit, newdata = newdata, submodels = param) :
  kernlab class probability calculations failed; returning NAs
Warning in data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
line search fails -1.222922 -0.0123604 1.783248e-05 1.249336e-06 -2.140749e-08 -5.311315e-09 -3.883842e-13
Warning in method$predict(modelFit = modelFit, newdata = newdata, submodels = param) :
  kernlab class prediction calculations failed; returning NAs
Warning in method$prob(modelFit = modelFit, newdata = newdata, submodels = param) :
  kernlab class probability calculations failed; returning NAs
Warning in data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
line search fails -1.272578 -0.04216695 1.921311e-05 1.251463e-06 -2.237445e-08 -5.473584e-09 -4.367328e-13
Apply pretrained model for parent cell type.

Warning: Some annotated bridge_2 are negative to all_bridge classifier. They are removed from training/testing for bridge_2 classifier.

Warning in method$prob(modelFit = modelFit, newdata = newdata, submodels = param) :
  kernlab class probability calculations failed; returning NAs
Error in `dplyr::mutate()`:
! Problem while computing `class = apply(...)`.
Caused by error in `if (x[1] >= thres) ...`:
! missing value where TRUE/FALSE needed
Backtrace:
  1. scAnnotatR::test_classifier(...)
 12. base::apply(...)
 13. scAnnotatR FUN(newX[, i], ...)

This error occurs during the test_classifier step, and it only appears for this one child class (I have several other child classes that work just fine). I tried simply skipping the test for this model, but when I then classify with my full set of models, the following error occurs:

seurat_obj <- classify_cells(classify_obj = seurat_obj, 
                             assay = 'RNA', slot = 'counts',
                             cell_types = 'all', 
                             path_to_models = path)

Error in marker_genes(x) :
trying to get slot "marker_genes" from an object of a basic class ("NULL") with no slots

8. | marker_genes(x)
7. | FUN(X[[i]], ...)
6. | lapply(classifiers, function(x) marker_genes(x))
5.  | unlist(lapply(classifiers, function(x) marker_genes(x)))
4. | unname(unlist(lapply(classifiers, function(x) marker_genes(x))))
3.  | unique(unname(unlist(lapply(classifiers, function(x) marker_genes(x)))))
2. | classify_cells_seurat(classify_obj, classifiers, cell_types,
chunk_size, path_to_models, ignore_ambiguous_result, cluster_slot,
assay, slot)
1. | classify_cells(classify_obj = nbl, assay = "RNA", slot = "counts",
cell_types = "all", path_to_models = path)

However I am able to get the marker genes for all classifiers using:

classifiers_all <- load_models(path_to_models = path)
marker_genes(classifiers_all[['bridge_2']])

> [1] "CASZ1"         "KAZN"          "TMEM51"        "GRIK3"         "AGBL4"         "LPHN2"         "KCND3"         "MAGI3"         "RP11-267N12.1"

Any thoughts would be appreciated! Thanks, Ryan

nttvy commented 2 years ago

Hi Ryan @reberya

Thank you for trying our tool. This issue seems to come from kernlab (one of our dependencies), and it started appearing a while ago. It may be caused by a weakly trained model (you may have noticed many warnings while training this one). This sometimes happens to us as well, so we suggest changing your set of features, for example by adding more features.
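For context on why the NA warnings end in a hard error: the traceback shows a threshold check of the form `if (x[1] >= thres)` applied to each row of predicted probabilities, and in R an `if` on NA stops with "missing value where TRUE/FALSE needed". The base R sketch below reproduces that on made-up numbers and shows one way to spot and guard the affected rows. The matrix `probs`, the threshold value, and the helper `classify_row` are hypothetical illustrations, not scAnnotatR internals.

```r
# Hypothetical class probabilities, one row per cell.
# The NA row mimics a cell for which kernlab returned no probability.
probs <- matrix(c(0.9, 0.1,
                  NA,  NA,
                  0.3, 0.7),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("yes", "no")))
thres <- 0.5  # illustrative threshold, not a package default

# Unguarded check: `if` cannot evaluate NA, so apply() stops with an error.
unguarded <- tryCatch(
  apply(probs, 1, function(x) if (x[1] >= thres) "yes" else "no"),
  error = function(e) "error"
)

# Rows that contain at least one NA probability.
na_rows <- which(rowSums(is.na(probs)) > 0)

# Guarded check: return NA for rows without a usable probability.
classify_row <- function(x, thres) {
  if (is.na(x[1])) return(NA_character_)
  if (x[1] >= thres) "yes" else "no"
}
pred <- apply(probs, 1, classify_row, thres = thres)
print(pred)
```

A guard like this only hides the symptom; the NA probabilities themselves still point at a model that failed to fit.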

Hope this helps. Bests, Vy

reberya commented 2 years ago

Hi Vy,

Thanks for the quick reply. I've been playing around with this since your suggestion and the number of features I've been using has gone from 20 --> 300 and I'm still getting this error. How many features do you typically use in your child models? Does this differ from your parent models?

Best, Ryan

nttvy commented 2 years ago

Hi Ryan,

The number of features does not seem to affect this issue much. For our models, we normally use about 30-40 features. They can overlap with the parent model's features.

How many cells belong to the population that you are trying to train the classifier on?

If it's possible, you can also send us a part of your dataset so that we can help you figure out the problem.

Best, Vy

reberya commented 2 years ago

Happy to send the dataset.

The number of cells per population ranges from 292-9599 over 14 groups with ~51k cells in total. The child classifier training successfully runs on populations with 1500 and 3500 cells but is failing on the population of 9599 cells.

nttvy commented 2 years ago

Hi Ryan,

Do you mean that you successfully trained (and tested) the classifier for smaller datasets? Was there any warning raised during the training process?

About sharing the dataset, can you upload the dataset somewhere? You can also subset it to a smaller dataset containing only the parent and child population. You can contact me (for some other details on sharing the dataset) via my email.

Bests, Vy

reberya commented 2 years ago

Yes, for example I am able to train a child classifier on a population of 1446 cells but am running into the aforementioned error training a separate child classifier of 3503 cells.

I will play around with implementing this a little more and if I cannot get it by the end of the day I will email it to you. Thank you very much for your help!

nttvy commented 2 years ago

Hi Ryan @reberya

After receiving your data, we looked into the problems.

First, we fixed this error:

Error in marker_genes(x) : trying to get slot "marker_genes" from an object of a basic class ("NULL") with no slots

  1. | marker_genes(x)

This was a bug in our package. The bug has been fixed. You can update the package by pulling the code from our master branch.
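One possible way to pull the fixed code from R is sketched below. It assumes the `remotes` package is acceptable for installation, that the repository path is `grisslab/scAnnotatR` (as shown in the page header), and that the fix lives on the `master` branch as stated above. Restart the R session afterwards so the updated package is loaded.

```r
# Install the 'remotes' helper if it is not already available (assumption:
# any GitHub installer, e.g. devtools, would work equally well).
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install scAnnotatR from the master branch of the GitHub repository.
remotes::install_github("grisslab/scAnnotatR", ref = "master")

# After restarting R, confirm which version is installed.
packageVersion("scAnnotatR")
```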

After having fixed the bug, the issue of:

Warning in method$prob(modelFit = modelFit, newdata = newdata, submodels = param) :
  kernlab class probability calculations failed; returning NAs
Error in `dplyr::mutate()`:
! Problem while computing `class = apply(...)`.
Caused by error in `if (x[1] >= thres) ...`:
! missing value where TRUE/FALSE needed
Backtrace:
  1. scAnnotatR::test_classifier(...)
 12. base::apply(...)
 13. scAnnotatR FUN(newX[, i], ...)

also disappeared. You may want to try it again on your own. If the problem still occurs, please contact us for further investigation.

Again, thanks a lot for your contribution! Bests, Vy

reberya commented 2 years ago

This fixed the error. Thank you!