Open beyondpie opened 10 months ago
Good idea! Yes, we can add read_10x_mtx to the package. Do you want to give it a try?
Sure. I see scanpy is also one of our dependencies. I can basically call scanpy's function, which loads the data into memory, and then create SnapATAC2's AnnData to store it in a file. What do you think?
scanpy is not a mandatory dependency. It would be better not to rely on scanpy for this.
A function called "read_10x_mtx" was added in commit 1a7f69269514fd9ba4b31c293ca68d0999642782 for reading mtx files generated by 10x Genomics. I'm closing this issue now.
Thanks, Kai! Sorry for the late response, I am on the job market right now...
Hi Kai @kaizhang ,
I would suggest we copy the behavior of getting gene symbols by default in scanpy.read_10x_mtx (https://github.com/scverse/scanpy/blob/214e05bdc54df61c520dc563ab39b7780e6d3358/scanpy/readwrite.py#L570):
genes = pd.read_csv(
    path / f"{prefix}{'genes' if is_legacy else 'features'}.tsv{suffix}",
    header=None,
    sep="\t",
)
if var_names == "gene_symbols":
    var_names_idx = pd.Index(genes[1].values)
    if make_unique:
        var_names_idx = anndata.utils.make_index_unique(var_names_idx)
    adata.var_names = var_names_idx
    adata.var["gene_ids"] = genes[0].values
elif var_names == "gene_ids":
    adata.var_names = genes[0].values
    adata.var["gene_symbols"] = genes[1].values
else:
    raise ValueError("`var_names` needs to be 'gene_symbols' or 'gene_ids'")
I noticed that our package currently uses the ENSMUSG-like gene IDs as the index, but most of the time, I think we need gene symbols.
Thanks! Songpeng
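The default suggested above can be sketched without pandas or scanpy. This is only a standard-library illustration, not SnapATAC2's or scanpy's implementation: `choose_var_names` and `make_unique` are hypothetical helper names, `make_unique` is a simplified stand-in for `anndata.utils.make_index_unique`, and the gene IDs and symbols in the demo are made up.

```python
import csv
import io


def make_unique(names, join="-"):
    """Simplified stand-in (assumption) for anndata.utils.make_index_unique:
    the first occurrence keeps its name, repeats get join + a counter."""
    taken = set(names)   # every name already in use, original or generated
    seen = set()         # names encountered so far while scanning
    counts = {}          # last counter tried for each repeated name
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
            continue
        n = counts.get(name, 0)
        while True:
            n += 1
            candidate = f"{name}{join}{n}"
            if candidate not in taken:
                break
        counts[name] = n
        taken.add(candidate)
        result.append(candidate)
    return result


def choose_var_names(features_file, var_names="gene_symbols", unique=True):
    """Read a features.tsv / genes.tsv text handle.

    Returns (index_names, extra_columns), mirroring the branch quoted above:
    column 0 holds gene IDs, column 1 holds gene symbols.
    """
    rows = [r for r in csv.reader(features_file, delimiter="\t") if r]
    gene_ids = [r[0] for r in rows]
    gene_symbols = [r[1] for r in rows]
    if var_names == "gene_symbols":
        names = make_unique(gene_symbols) if unique else gene_symbols
        return names, {"gene_ids": gene_ids}
    if var_names == "gene_ids":
        return gene_ids, {"gene_symbols": gene_symbols}
    raise ValueError("`var_names` needs to be 'gene_symbols' or 'gene_ids'")


# Made-up rows: two different IDs share the symbol "GeneA".
demo = io.StringIO("ID1\tGeneA\nID2\tGeneB\nID3\tGeneA\n")
names, extra = choose_var_names(demo)
print(names)  # ['GeneA', 'GeneB', 'GeneA-1']
```

Returning the unused column alongside the index keeps the gene IDs available (for example as a `gene_ids` column in `var`) when symbols are used as names.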
Hi Kai,
I found that the description of read_mtx is not clear.
Maybe we can provide another function, like scanpy's, that loads the mtx files directly from a directory?
Sincerely, Songpeng
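One possible shape for the directory-handling part of such a function, sketched with the standard library only. `find_10x_files`, `open_text`, and `read_mtx_shape` are hypothetical helpers, not part of SnapATAC2. The sketch assumes the usual 10x layout: `matrix.mtx`, `barcodes.tsv`, and either `features.tsv` or the legacy `genes.tsv`, each optionally gzipped.

```python
import gzip
from pathlib import Path


def find_10x_files(directory, prefix=""):
    """Locate the matrix, barcodes, and features/genes files in a directory."""
    directory = Path(directory)

    def pick(*stems):
        # Try each candidate name, plain first and then gzip-compressed.
        for stem in stems:
            for suffix in ("", ".gz"):
                candidate = directory / f"{prefix}{stem}{suffix}"
                if candidate.is_file():
                    return candidate
        raise FileNotFoundError(f"none of {stems} found in {directory}")

    return {
        "matrix": pick("matrix.mtx"),
        "barcodes": pick("barcodes.tsv"),
        "features": pick("features.tsv", "genes.tsv"),
    }


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_mtx_shape(path):
    """Return (n_rows, n_cols, n_nonzero) from a Matrix Market coordinate file.

    Lines starting with '%' are header or comment lines; the first other
    non-blank line is the size line.
    """
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("%") or not line.strip():
                continue
            n_rows, n_cols, n_nonzero = (int(x) for x in line.split())
            return n_rows, n_cols, n_nonzero
    raise ValueError(f"no size line found in {path}")
```

With the three paths resolved, a reader can check that the number of lines in the features file matches `n_rows` and the number of barcodes matches `n_cols` before parsing the entries, since 10x matrices are stored as features by barcodes.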