grisslab / scAnnotatR

slot "marker_genes" #7

Closed antoine4ucsd closed 1 year ago

antoine4ucsd commented 1 year ago

Hello, thank you for developing this package. I am trying to identify cell types in human brain samples using scAnnotatR. (Some samples may be contaminated with blood, so it might be worth including immune cells.) If I run this:

classify_cells(classify_obj = mydata,
               assay = 'RNA', slot = 'counts',
               cell_types = 'T cells',
               path_to_models = 'default')

it works, but if I try to retrieve all possible cell types I get this error:

classify_cells(classify_obj = mydata,
               assay = 'RNA', slot = 'counts',
               cell_types = 'T cells',
               path_to_models = 'default')
Error in marker_genes(x) : 
  trying to get slot "marker_genes" from an object of a basic class ("NULL") with no slots

Any suggestions? Where can I find the list of available cell types?

thank you!

jgriss commented 1 year ago

Hi @antoine4ucsd ,

Thanks a lot for your interest in our package!

Just to be sure: The code you pasted is identical in the two boxes. Is that correct?

antoine4ucsd commented 1 year ago

Sorry. For the second one, I used cell_types = 'all' and it failed.

antoine4ucsd commented 1 year ago

see below

classify_cells(classify_obj = mydata,
               assay = 'RNA', slot = 'counts',
               cell_types = 'all',
               path_to_models = 'default')
snapshotDate(): 2022-10-31
loading from cache
Error in marker_genes(x) : 
  trying to get slot "marker_genes" from an object of a basic class ("NULL") with no slots

nttvy commented 1 year ago

@jgriss, sorry for interrupting, but I think I know the problem.

@antoine4ucsd, I think this issue may be related to your question. In brief, your data may not contain the features/genes used by one or more of the pre-trained classifiers. To verify this, you can first access all of our built-in models:

default_models <- load_models("default")
names(default_models)
default_models

and you can also access the marker genes of the models, for example:

marker_genes(default_models[['B cells']])

You can then check whether any model(s) cannot be used on your data.
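
One way to run that check is sketched below in base R. This is only a sketch, not part of scAnnotatR: `find_missing_markers` is a hypothetical helper, and the gene lists are made-up toy values. It assumes each model's marker genes are available as a character vector of gene names.

```r
# Hypothetical helper (not part of scAnnotatR): for each model, list the
# marker genes that are absent from the dataset.
find_missing_markers <- function(markers_by_model, dataset_genes) {
  # setdiff(x, y) keeps the elements of x that are not in y
  missing <- lapply(markers_by_model, setdiff, y = dataset_genes)
  # keep only the models that lack at least one marker gene
  missing[lengths(missing) > 0]
}

# Toy input with made-up gene lists; these are NOT the real model markers.
toy_markers <- list(
  "B cells" = c("CD19", "MS4A1"),
  "T cells" = c("CD3D", "CD3E")
)
toy_genes <- c("CD19", "MS4A1", "CD3D")

# Only "T cells" is reported, with "CD3E" as the missing gene.
find_missing_markers(toy_markers, toy_genes)
```

With the real models, the call would presumably look like `find_missing_markers(lapply(default_models, marker_genes), rownames(mydata))`, assuming `marker_genes()` returns a character vector and `rownames(mydata)` returns the gene names of the assay you classify on.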

Please return to this issue if this doesn't work. Hope this helps. Vy

antoine4ucsd commented 1 year ago

thank you. I will try and get back to you

jgriss commented 1 year ago

Hi @nttvy

Thanks a lot for the fast response!

Maybe we can think about whether we can improve the respective error message?

nttvy commented 1 year ago

Hi @jgriss

I have 3 solutions:

1/ first check the availability of all features used by the models called by the user and raise an error, or

2/ also check the availability of the features, but only raise a warning and continue applying the models that have enough features, or

3/ check the availability of the features, raise a warning, and add zeros for the unavailable features.

I can implement one of these solutions in a short period of time.

Vy

jgriss commented 1 year ago

Hi,

Personally, I believe that solution 2 is the best.

1) Possible but not so user friendly

2) User does get the warning but the function still works

3) We don't know what the outcome would be

Does this sound reasonable?
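
To make option 2 concrete, here is a minimal sketch in base R. It is an illustration only, not the package's actual implementation: `select_usable_models` is a hypothetical name, the warning wording is illustrative, and the input is assumed to be a named list of character vectors of marker genes.

```r
# Hypothetical sketch of option 2 (not the package's actual code): keep the
# models whose marker genes are all present, and warn about each skipped one.
select_usable_models <- function(markers_by_model, dataset_genes) {
  usable <- character(0)
  for (cell_type in names(markers_by_model)) {
    missing <- setdiff(markers_by_model[[cell_type]], dataset_genes)
    if (length(missing) > 0) {
      # warn and move on instead of stopping the whole call
      warning("Some genes to classify ", cell_type,
              " are not available in the dataset. Classification of ",
              cell_type, " skipped.", call. = FALSE)
    } else {
      usable <- c(usable, cell_type)
    }
  }
  usable
}

# Toy input with made-up gene lists; these are NOT the real model markers.
toy_markers <- list(
  "B cells" = c("CD19", "MS4A1"),
  "T cells" = c("CD3D", "CD3E")
)

# "T cells" triggers a warning and is skipped; "B cells" is kept.
usable <- select_usable_models(toy_markers, c("CD19", "MS4A1", "CD3D"))
```

The point of the design is that a single unusable model no longer aborts the call: the user sees which cell types were skipped, and the remaining models still run.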

nttvy commented 1 year ago

Hi @jgriss

I agree. I also made changes to the 'develop' branch. Please check this out.

Vy

jgriss commented 1 year ago

Hi @nttvy

Changes look great! Thanks a lot!

I only adapted the version number and error message text. Please check if you're happy with them.

nttvy commented 1 year ago

Hi @jgriss

Everything looks good. Thank you!

Vy

seb951 commented 1 year ago

Thanks for the fix. Cases such as this one will still fail with a cryptic error message (I understand why, but perhaps we could have a more explicit error message?):

seurat.obj <- classify_cells(classify_obj = tirosh_mel80_example, 
                              assay = 'RNA', slot = 'counts',
                              cell_types = c('Plasma cells'), 
                              path_to_models = 'default')

Also, in other cases it feels like a more explicit warning would help, such as:

Warning: All genes from Plasma cells classifier model must be present in the dataset to perform classification. Classification of Plasma cells skipped.

instead of:

Warning: Some genes to classify Plasma cells are not available in the dataset. Classification of Plasma cells skipped.

Thank you.

jgriss commented 1 year ago

Hi @seb951

Thanks a lot for the very helpful feedback! Error message is already updated.

We'll also try to catch the other cases.