Open nh3 opened 5 years ago
Sorry, I realised that global attributes cannot be datasets, only (at most) multidimensional data types, which makes them not so useful in this context. I am therefore closing this issue.
Hi, I recently implemented a converter between anndata and loom (https://github.com/ebi-gene-expression-group/scanpy-scripts/pull/37) with the intention of further interoperability. It allows a relatively complete transfer back and forth of .uns in addition to .obsm and .varm, by automatically generating and storing a manifest in loom. There are some hard-coded parts (e.g. special handling of .uns['neighbors']) that are admittedly not ideal, but if you think the general approach is acceptable, I am happy to make an improved version for a PR.
@sophietr @mbuttner This might be an intermediate solution until cellxgene grows in functionality...
Thanks! We're happy to have an improved loom export and import if it goes along with loom's canonical functionality. If we are at risk of "doing strange things to loom files", then we'd better not do it.
Is "generating a manifest in loom" something that loom foresees? If yes, then happy to go over a PR. :)
A proposed feature for loom v3 is the /global group, where datasets of unrestricted shape can go, according to https://github.com/linnarsson-lab/loompy/issues/51. There isn't specific mention of a manifest table under /global, but it is compatible.
As loom v3 is not yet implemented/announced, what I do is set LOOM_SPEC_VERSION to a special value ('3.0.0alpha' in this case) when writing. When reading, if the version doesn't match this value, I fall back to what sc.read_loom() does. In fact, all the extra bits go under /global when exporting, and the generated loom passes loompy v2's validation, so other loom readers should read it without problems (they just can't read the extra bits). Do you think this is acceptable?
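A minimal sketch of the version check described above, assuming the root attributes are available as a dict-like mapping (as an HDF5 binding such as h5py would expose them). The function names and the "extended"/"fallback" labels are hypothetical and only illustrate the branching, not the actual converter code:

```python
EXTENDED_SPEC_VERSION = "3.0.0alpha"  # special value quoted in the comment above


def has_extended_layout(root_attrs):
    """Return True if the root attributes carry the special spec version."""
    version = root_attrs.get("LOOM_SPEC_VERSION")
    # Some HDF5 bindings return string attributes as bytes, so normalise first.
    if isinstance(version, bytes):
        version = version.decode("utf-8")
    return version == EXTENDED_SPEC_VERSION


def pick_reader(root_attrs):
    """Name the code path to take: the manifest-aware reader or the plain fallback."""
    return "extended" if has_extended_layout(root_attrs) else "fallback"
```

Any file without the special value, including one with no LOOM_SPEC_VERSION attribute at all, takes the fallback path in this sketch.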
@nh3 That sounds reasonable!
@slinnarsson With this we're finally addressing one of the initial questions that I had about loom (unstructured, global annotation). Are you fine with @nh3 supporting this within anndata as laid out above? Thanks for briefly taking the time to approve!
Hi
Sure, sounds good. To be clear, this is essentially option 2 from https://github.com/linnarsson-lab/loompy/issues/51 ?
Hi @slinnarsson and @falexwolf,
Yes, this is essentially option 2, with a mandatory /global/manifest to store the path and data type for the stored datasets. The minimum structure would look like this:
/.attrs['LOOM_SPEC_VERSION'] = '3.0.0alpha'
/global
/global/manifest
/matrix
/layers
/col_attrs
/col_graphs
/row_attrs
/row_graphs
where /global/manifest is a table with at least two columns: loom_path and type. More columns can be added to indicate where the dataset should go in the object supported for conversion. Currently, it aims to support AnnData and SingleCellExperiment, so it would have additional columns called anndata_path and sce_path. Here are some examples:
loom_path                                   type    anndata_path                      sce_path
/global/reducedDim__pca                     array   /obsm/X_pca                       @reducedDims$PCA                       # A row for PCA embeddings
/global/pca__variance                       array   /uns/pca/variance                 @metadata$pca$variance                 # A row for PCA variances
/col_graphs/neighbors__connectivities       graph   /uns/neighbors/connectivities     @colGraphs$neighbors__connectivities   # A row for KNN graph
/.attrs[louvain__parameters__random_state]  scalar  /uns/louvain/params/random_state  @metadata$louvain$params$random_state  # A row for louvain random seed
This table is generated largely automatically (with some hard-coded special treatment for certain slots) when writing to loom, and the reader functions in Python or R put the data into the specified place fully automatically.
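To illustrate the reader side, here is a rough sketch of how manifest rows could drive placement. It assumes the manifest has already been loaded as a list of dicts and the stored datasets as a dict keyed by loom_path; restore_from_manifest and place_by_path are hypothetical names, and a plain nested dict stands in for the AnnData object:

```python
def place_by_path(target, path, value):
    """Store value in the nested dict `target` at a slash-separated path."""
    keys = [key for key in path.split("/") if key]
    node = target
    for key in keys[:-1]:
        # Create intermediate levels on demand.
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def restore_from_manifest(manifest, datasets):
    """Rebuild a nested structure from manifest rows.

    manifest: list of dicts with 'loom_path' and 'anndata_path' keys.
    datasets: dict mapping each loom_path to the value read from the file.
    """
    restored = {}
    for row in manifest:
        place_by_path(restored, row["anndata_path"], datasets[row["loom_path"]])
    return restored
```

With the PCA rows from the table above, the values would end up under restored["obsm"]["X_pca"] and restored["uns"]["pca"]["variance"].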
I flatten the path when writing to loom since I wasn't sure from my reading whether or not /global supports nested groups under it.
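The flattening could look roughly like the sketch below, which joins nested keys with a double underscore and emits one manifest row per leaf. The helper names and the scalar/array classification are assumptions for illustration. It does not reproduce the hard-coded special cases mentioned earlier (for example graphs, or the params to parameters renaming visible in the example table):

```python
def flatten_uns(uns, prefix=()):
    """Yield (key_tuple, value) for every leaf of a nested dict."""
    for key, value in uns.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten_uns(value, path)
        else:
            yield path, value


def manifest_rows(uns):
    """Build manifest rows with flattened loom paths for a nested uns dict."""
    rows = []
    for path, value in flatten_uns(uns):
        is_scalar = isinstance(value, (bool, int, float, str))
        flat_name = "__".join(path)  # e.g. ('pca', 'variance') -> 'pca__variance'
        rows.append({
            # Scalars go to root attributes, everything else under /global.
            "loom_path": f"/.attrs[{flat_name}]" if is_scalar else f"/global/{flat_name}",
            "type": "scalar" if is_scalar else "array",
            "anndata_path": "/uns/" + "/".join(path),
        })
    return rows
```

Because the original nested location is kept in anndata_path, the flat name in loom_path never has to be parsed back; a key that itself contains a double underscore would therefore still round-trip.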
Anyway, many details can be agreed on and adjusted later, but this is the general approach.
For the Python part, it calls read_loom() and write_loom() from scanpy and then uses h5py to do the extra stuff. It lacks data compression and timestamps at the moment, but these can be implemented with h5py or, better yet, loompy if there's an API for that. For the R part, it calls import() and export() from LoomExperiment and then uses rhdf5 for the rest.
Hopefully this isn't duplicating what's already implemented in loompy v3. Please let me know what you think.
Many thanks.
Yes, let's please make sure we don't duplicate loompy code within anndata. Within anndata, we should simply use some top-level functions that pass AnnData's fields into the appropriate loompy functions.
I wonder what the path forward here is:
If I interpret the other issues correctly, loompy 3.x nowadays has the necessary functionality to make options 2 and 3 feasible.
Hi Alex,
I am interested in using loom as an exchange format. Similar to #111, uns is not carried over, which limits its use. I am aware of https://github.com/linnarsson-lab/loompy/issues/1. As loom 2.0 supports having datasets as global attributes, would it be possible now for write_loom() to write uns to loom? One possible way of doing it might be putting names and types (scalar, dataset) of the items in a dataset, say /uns_manifest, and writing /{uns_item_name} for each of the items. For obsm and varm, one might do the same, or put only metadata such as names, types and dimensions into global attributes and insert the actual data into obs/var such as X_pca_1, ..., X_pca_n. Then, it should be easy to re-create uns, obsm and varm in a standard way when reading. I am happy to discuss further and/or make a PR for read_loom() and write_loom() if needed. Please let me know what you think. Thanks!
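As a rough illustration of the obsm idea above (one obs column per embedding dimension, named X_pca_1, ..., X_pca_n), the sketch below splits a matrix into columns and rebuilds it. Plain lists stand in for arrays, and split_embedding and join_embedding are hypothetical helpers, not existing anndata or loompy functions:

```python
import re


def split_embedding(name, matrix):
    """Turn an n_obs x n_dims matrix into columns named name_1 .. name_n."""
    n_dims = len(matrix[0]) if matrix else 0
    return {f"{name}_{i + 1}": [row[i] for row in matrix] for i in range(n_dims)}


def join_embedding(name, columns):
    """Rebuild the matrix from columns named name_1 .. name_n."""
    pattern = re.compile(rf"^{re.escape(name)}_(\d+)$")
    indexed = []
    for key in columns:
        match = pattern.match(key)
        if match:
            indexed.append((int(match.group(1)), key))
    # Sort by the numeric suffix so that name_10 comes after name_9.
    ordered = [columns[key] for _, key in sorted(indexed)]
    return [list(row) for row in zip(*ordered)]
```

Columns that do not match the name_<number> pattern are ignored on the way back, so unrelated obs columns can sit alongside the embedding columns.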