Open canergen opened 3 months ago
Is there some documentation on what is expected of the custom dataloader's collate function? I can imagine a dict with keys like `X`, `batch` and `labels` just by following up on the different types of exceptions I am getting. But for poor souls like us who are not familiar with the codebase, it'd be amazing to have some documentation of what type of keys a collate function should return in the dictionary to work.
Hi, we are still exchanging ideas with lamin and CZI to improve the implementation (and hopefully work towards support across all models; currently it works for scVI). Overall, the final requirement will be that a registry is created as a dictionary, similar to https://colab.research.google.com/drive/10sXec_TicMKtLA6hMcgfkado-FgoNKxw#scrollTo=e8vZgceklGdH. We use https://github.com/laminlabs/lamindb/issues/1826 as the discussion channel to work together on a better implementation. Happy to connect offline (ideally on the scverse Zulip) to see how we can support your work.
CustomDataloaders currently don't support advanced capabilities like scArches or celltype prediction in scANVI. We have to create a registry without setup_anndata that contains the same elements (see below). https://github.com/chanzuckerberg/cellxgene-census/blob/222efddf2ce82f93f76329aa353962c1dc2400ac/api/python/notebooks/experimental/pytorch_loader_scvi.ipynb is the first working example. Currently, they use the following code to save the model:
We want to create a new function that fills out the registry and passes it to the model at: `model = scvi.model.SCVI(n_layers=n_layers, n_latent=n_latent, gene_likelihood="nb", encode_covariates=False)`. You can see all necessary entries and the structure at: `scvi.adata_manager.get_state_registry(scvi.REGISTRY_KEYS.X_KEY).to_dict()`. After fixing this, all uses of `_module_init_on_train` throughout the codebase should be removed, as they are no longer necessary.