Hi, it seems that you are using scANVI with a query dataset that contains cell types the reference dataset doesn't have. In that case you have to preprocess the query dataset in the following way before you call load_query_data():
query_adata.obs['orig_cell_types'] = query_adata.obs[cell_type_key].copy()  # keep the original annotations
query_adata.obs[cell_type_key] = old_scanvi.unlabeled_category_             # mark all query cells as unlabeled
model = sca.models.SCANVI.load_query_data(
    query_adata,
    ref_path,
    freeze_dropout=True,
)
print("Labelled Indices: ", len(model._labeled_indices))
print("Unlabelled Indices: ", len(model._unlabeled_indices))
as mentioned in this notebook: https://scarches.readthedocs.io/en/latest/scanvi_surgery_pipeline.html
Thanks Marco. I did it before, but the results were not satisfying (the error was gone, anyway)... I will try again.
I'm back again; this time I am using TRVAE:
trvae = sca.models.TRVAE(
    adata=reference_adata,
    condition_key=condition_key,
    conditions=reference_batch_labels,
    hidden_layer_sizes=[128, 128],
)
INITIALIZING NEW NETWORK..............
Encoder Architecture:
    Input Layer in, out and cond: 4102 128 18
    Hidden Layer 1 in/out: 128 128
    Mean/Var Layer in/out: 128 10
Decoder Architecture:
    First Layer in, out and cond: 10 128 18
    Hidden Layer 1 in/out: 128 128
    Output Layer in/out: 128 4102
and this is the error that arises during training:
Trying to set attribute .obs of view, copying.
Trying to set attribute .obs of view, copying.
ValueError Traceback (most recent call last)
Hi, that one was a bit trickier. It was a bug in the torch.split() call when the number of batches is larger than the latent dimension. Good that you detected the bug! I have hopefully fixed it and updated the package to the new version 0.3.3, so please also update your installation and tell me if it works now.
Best,
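For illustration, here is a minimal sketch of the failure mode described above. This is not the actual scArches code, only an assumed reproduction of how torch.split() fails when asked for more sections than the latent dimension can provide:

import torch

# Hypothetical reproduction of the bug: splitting a latent tensor of width 10 into
# 18 per-condition sections fails, because the section sizes must sum exactly to
# the size of the split dimension.
z = torch.randn(4, 10)               # batch of 4 cells, latent dim = 10
try:
    torch.split(z, [1] * 18, dim=1)  # 18 conditions > 10 latent dimensions
except RuntimeError as err:
    print(err)                       # split sizes must sum to 10, not 18

Updating to the fixed release is then just a matter of pip install -U scarches.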
Hi Marco,
Great! Now it is working. Another issue that I have had since the previous version (older than 0.3.0): it works very well when the number of labels (classes) is below 20, but when I use more classes (almost 200 cell types) it fails... maybe the dimension of the latent space?
Training worked, and now:
TypeError Traceback (most recent call last)
Yeah, possible options would be to increase the latent dimension, enlarge the general architecture, or, if possible, also increase the number of highly variable genes.
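As an illustration of these options, a rough sketch (the values below are examples only, and assume the TRVAE constructor accepts latent_dim as in the scArches API of that time):

import scanpy as sc
import scarches as sca

# Illustrative only: select more highly variable genes and enlarge the model.
sc.pp.highly_variable_genes(reference_adata, n_top_genes=4000,
                            batch_key=condition_key, subset=True)

trvae = sca.models.TRVAE(
    adata=reference_adata,
    condition_key=condition_key,
    conditions=reference_batch_labels,
    hidden_layer_sizes=[256, 256],  # larger hidden layers than the [128, 128] used above
    latent_dim=20,                  # wider latent space for many (~200) cell types
)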
Concerning your second error: did you call the remove_sparsity() function on adata before using it for trvae?
from scarches.dataset.trvae.data_handling import remove_sparsity
adata = remove_sparsity(adata)  # converts a sparse adata.X into a dense array
Yes, that solved the problem. It seems I forgot to run that line... many thanks.
Well then, give it a star :)
:) I just got back from the dentist; even making it this far was an honourable achievement :)
Hi again,
Maybe there is something I am missing! There is no error, but the results are strange. Back to the scANVI model: I am trying to predict 77 labels, but only 7 labels are predicted.
[Screenshot attached: 2021-01-19, 15:41]
I checked the whole process, and when I restricted my labels to 8 it worked very well, as I expected. So I thought there must be some fixed parameter that leads to this result.
I will be so grateful for your help!
Update: Increasing the latent space dimension also did not work!
Hi, if you still have problems here, could you provide a print of your model architecture by calling print(scanvi.model)? Additionally, how many genes are you using for this experiment?
Yes, I still have this problem and will update you.
Hi,
Anndata setup with scvi-tools version 0.8.1.
Data Summary
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Data ┃ Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Cells │ 95279 │
│ Vars │ 28713 │
│ Labels │ 105 │
│ Batches │ 18 │
│ Proteins │ 0 │
│ Extra Categorical Covariates │ 0 │
│ Extra Continuous Covariates │ 0 │
└──────────────────────────────┴───────┘
SCVI Data Registry
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Data ┃ scvi-tools Location ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ X │ adata.X │
│ batch_indices │ adata.obs['_scvi_batch'] │
│ local_l_mean │ adata.obs['_scvi_local_l_mean'] │
│ local_l_var │ adata.obs['_scvi_local_l_var'] │
│ labels │ adata.obs['_scvi_labels'] │
└───────────────┴─────────────────────────────────┘
and the scANVI model (output of print(scanvi.model)):
SCANVAE( (z_encoder): Encoder( (encoder): FCLayers( (fc_layers): Sequential( (Layer 0): Sequential( (0): Linear(in_features=28731, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) (Layer 1): Sequential( (0): Linear(in_features=128, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) (Layer 2): Sequential( (0): Linear(in_features=128, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) (Layer 3): Sequential( (0): Linear(in_features=128, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) ) ) (mean_encoder): Linear(in_features=128, out_features=15, bias=True) (var_encoder): Linear(in_features=128, out_features=15, bias=True) ) (l_encoder): Encoder( (encoder): FCLayers( (fc_layers): Sequential( (Layer 0): Sequential( (0): Linear(in_features=28731, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) ) ) (mean_encoder): Linear(in_features=128, out_features=1, bias=True) (var_encoder): Linear(in_features=128, out_features=1, bias=True) ) (decoder): DecoderSCVI( (px_decoder): FCLayers( (fc_layers): Sequential( (Layer 0): Sequential( (0): Linear(in_features=33, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): None ) (Layer 1): Sequential( (0): Linear(in_features=128, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): None ) (Layer 2): Sequential( (0): Linear(in_features=128, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): None ) (Layer 3): Sequential( (0): Linear(in_features=128, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): None ) ) ) (px_scale_decoder): Sequential( (0): Linear(in_features=128, out_features=28713, bias=True) (1): Softmax(dim=-1) ) (px_r_decoder): Linear(in_features=128, out_features=28713, bias=True) (px_dropout_decoder): Linear(in_features=128, out_features=28713, bias=True) ) (classifier): Classifier( (classifier): Sequential( (0): FCLayers( (fc_layers): Sequential( (Layer 0): Sequential( (0): Linear(in_features=15, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) (Layer 1): Sequential( (0): Linear(in_features=128, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) (Layer 2): Sequential( (0): Linear(in_features=128, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) (Layer 3): Sequential( (0): Linear(in_features=128, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) ) ) (1): Linear(in_features=128, out_features=105, bias=True) (2): Softmax(dim=-1) ) ) (encoder_z2_z1): Encoder( (encoder): FCLayers( (fc_layers): Sequential( (Layer 0): Sequential( (0): 
Linear(in_features=120, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) (Layer 1): Sequential( (0): Linear(in_features=233, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) (Layer 2): Sequential( (0): Linear(in_features=233, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) (Layer 3): Sequential( (0): Linear(in_features=233, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): Dropout(p=0.1, inplace=False) ) ) ) (mean_encoder): Linear(in_features=128, out_features=15, bias=True) (var_encoder): Linear(in_features=128, out_features=15, bias=True) ) (decoder_z1_z2): Decoder( (decoder): FCLayers( (fc_layers): Sequential( (Layer 0): Sequential( (0): Linear(in_features=120, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): None ) (Layer 1): Sequential( (0): Linear(in_features=233, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): None ) (Layer 2): Sequential( (0): Linear(in_features=233, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): None ) (Layer 3): Sequential( (0): Linear(in_features=233, out_features=128, bias=True) (1): None (2): LayerNorm((128,), eps=1e-05, elementwise_affine=False) (3): ReLU() (4): None ) ) ) (mean_decoder): Linear(in_features=128, out_features=15, bias=True) (var_decoder): Linear(in_features=128, out_features=15, bias=True) ) )
It could predict only 2 labels out of 105:
reference_latent.obs.predictions.unique()
array(['PC1', 'vCM1.0'], dtype=object)
PS: what is your strategy for imbalanced datasets?
Many thanks.
Okay, first of all, I would strongly suggest that you preprocess your data by filtering for highly variable genes, as described in this notebook: https://scarches.readthedocs.io/en/latest/reference_building_from_scratch.html
You can test with 2000 and with 4000 genes.
It also seems that you added another layer to the network. I would suggest first using 2 hidden layers, as we propose as the default, and just making the latent representation dimension higher. So for now, first try the preprocessed dataset with the standard architecture. If that doesn't work, expand the latent dimension to 20 or 30. If that still doesn't work, additionally add a 3rd hidden layer.
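A minimal sketch of these suggestions, assuming the API from the scArches reference-building notebook of that time (setup_anndata, unlabeled_category, n_layers, n_latent; names may differ in other versions):

import scanpy as sc
import scarches as sca

# Illustrative only: filter to highly variable genes, then build scANVI with the
# proposed default of 2 hidden layers and a somewhat larger latent dimension.
sc.pp.highly_variable_genes(adata, n_top_genes=4000, batch_key=condition_key, subset=True)

sca.dataset.setup_anndata(adata, batch_key=condition_key, labels_key=cell_type_key)
scanvi = sca.models.SCANVI(
    adata,
    unlabeled_category="Unknown",  # hypothetical placeholder category name
    n_layers=2,                    # standard architecture depth
    n_latent=20,                   # try 20 first, then 30 if needed
)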
Thanks for your suggestions. I had already used highly variable genes (4107 genes, ~45,000 cells) and tried all of your suggestions, but instead of 2 labels I could only produce 10 or 12 labels. It works well when I do subclustering and keep the number of labels up to 12, but that is not what we want. Anyway, I will try again, and in case I have problems I will come back for discussion. But one question remains about imbalanced datasets: do you have an option like a focal loss? With an imbalanced dataset, accuracy is not a good metric to evaluate performance, I guess. And many thanks for your quick response.
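For context, the focal loss mentioned above down-weights easy, well-classified examples so that rare classes contribute more to the gradient; here is a generic PyTorch sketch following Lin et al. 2017 (it is not something scArches or scANVI provides):

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # Generic multi-class focal loss, shown only to illustrate the idea.
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, reduction="none")            # per-sample cross entropy
    pt = log_probs.exp().gather(1, targets.unsqueeze(1)).squeeze(1)  # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()                         # down-weight easy examples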
Since this question goes really into the details of the base scANVI functionality and behaviour, and does not necessarily have to do with architecture surgery, I would refer you to the creators of scANVI. Maybe you should post your question in their issues section: https://github.com/YosefLab/scvi-tools
Thanks Marco. I will write them.
Hi,
I am using scArches to project and integrate query datasets on top of a reference. It works well up to training on the reference dataset, but training on the query dataset gives me a RuntimeError...
model = sca.models.SCANVI.load_query_data(
    query_adata,
    ref_path,
    freeze_dropout=True,
)
model._unlabeled_indices = np.arange(query_adata.n_obs)  # treat all query cells as unlabeled
model._labeled_indices = []
print("Labelled Indices: ", len(model._labeled_indices))
print("Unlabelled Indices: ", len(model._unlabeled_indices))
INFO Using data from adata.X
INFO Computing library size prior per batch
INFO Registered keys:['X', 'batch_indices', 'local_l_mean', 'local_l_var', 'labels']
INFO Successfully registered anndata object containing 1099 cells, 4102 vars, 28 batches, 88 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates.
RuntimeError Traceback (most recent call last)