Closed adamgayoso closed 1 year ago
I'm actually not sure where it comes back in at the moment, but I can tell you that the counts
layer still exists in the data object.
In [1]: import openproblems as op
In [2]: dat = op.data.immune_cells.load_immune()
In [3]: dat
Out[3]:
AnnData object with n_obs × n_vars = 33506 × 12303
obs: 'batch', 'chemistry', 'data_type', 'dpt_pseudotime', 'final_annotation', 'mt_frac', 'n_counts', 'n_genes', 'sample_ID', 'size_factors', 'species', 'study', 'tissue'
var: 'n_cells'
uns: 'var_names_all', '_from_cache'
layers: 'log_normalized', 'counts'
This is in the latest github release... @danielStrobl @scottgigante-immunai... any ideas why this is still there? It definitely should be...
.X
is copied to .layers["counts"]
by default in all datasets in the dataset decorator.
Indeed I found this:
Though it should probably have a copy added to it explicitly incase any normalization methods operate in place
It's an interesting thought. I'd rather not double the memory/disk/network usage here for nothing -- what if instead we add a test to the normalization test suite ensuring that normalizations don't operate in place?
Thanks @scottgigante-immunai. I recall something was done in the decorator... but I couldn't quickly find what.
It's still quite strange to find seemingly contradictory lines of code like this though... deleting the counts layer explicitly, but still relying that it's there.
There's no real need to delete it here -- it's only repopulated if it's missing. I'll remove that line for the sake of less confusion.
Also the cpm normalization adds a counts layer as a copy. Ideally you wouldn't need to look in 5 places to find what's going on :)
Ah, that's redundant and can be removed too :) Thanks!
Take the immune dataset:
Here counts are moved to .X and the layer is deleted:
https://github.com/openproblems-bio/openproblems/blob/2d7198652875e48c63b6b6d6ab440eec27bafda0/openproblems/data/immune_cells.py#L39-L41 Here .X is made to be the log normalized expression:
https://github.com/openproblems-bio/openproblems/blob/2d7198652875e48c63b6b6d6ab440eec27bafda0/openproblems/tasks/_batch_integration/batch_integration_graph/datasets/immune.py#L25
But I'm confused because in the scib scvi wrapper it's using the counts layer, which shouldn't exist according to this code.
https://github.com/theislab/scib/blob/77ab015254754baa5ca3380bd592bcc9207241de/scib/integration.py#L212