Closed HelloWorldLTY closed 6 months ago
Hi, I don’t see any errors in your example. Can you please specify what you expected to happen and how what happened differs from that?
Also please provide code as text (in a Markdown code block), not as images.
Hi, I intend to have 13634 hvgs, but it gave me 13652 genes. I am curious about the difference. Is it a bug?
Code:
adata_atac.X = (adata_atac.X > 0)*1
sc.pp.highly_variable_genes(adata_atac, n_top_genes=13634)
adata_atac = adata_atac[:,adata_atac.var['highly_variable']]
Could have all kinds of reasons. Genes sharing the same score, duplicate gene names.
We won’t be able to tell without a reproducible example.
If it is caused by the same score, how to ensure the uniquenss? Thanks. I did not receive the var.unique.name error thus I think it was not caused by duplicate gene names
It is neurips 2021 single cell competition dataset.
I’ll take a look if you update your issue with a code block that I can copy, that will download the dataset and then reproduce the error without me having to go to any website and download anything manually.
Thanks. The dataset is here:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122
Sicne it is from GSE, I think it is easy to download and it is formed into h5ad format (the multiome part). GSE194122_openproblems_neurips2021_multiome_BMMC_processed.h5ad.gz
Code:
adata_atac.X = (adata_atac.X > 0)*1
sc.pp.highly_variable_genes(adata_atac, n_top_genes=13634)
adata_atac = adata_atac[:,adata_atac.var['highly_variable']]
So this would be a reproducible example:
import gzip
import shutil
from urllib.request import urlopen
from pathlib import Path
from tqdm.notebook import tqdm
import scanpy as sc
url = "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE194122&format=file&file=GSE194122%5Fopenproblems%5Fneurips2021%5Fmultiome%5FBMMC%5Fprocessed%2Eh5ad%2Egz"
path = Path("data/GSE194122_openproblems_neurips2021_multiome_BMMC_processed.h5ad")
if not path.is_file():
with (
urlopen(url) as raw,
tqdm.wrapattr(raw, "read", total=int(raw.headers["Content-Length"])) as wrapped,
gzip.open(wrapped, 'rb') as f_in,
path.open('wb') as f_out,
):
shutil.copyfileobj(f_in, f_out)
adata_atac = sc.read(path)
adata_atac.X = (adata_atac.X > 0)*1
sc.pp.highly_variable_genes(adata_atac, n_top_genes=13634)
adata_atac = adata_atac[:,adata_atac.var['highly_variable']]
Problem seems to be fixed in the newest scanpy:
Please make sure these conditions are met
What happened?
Hi, I intend to use n_top_gene to determine the number of hvgs I intend to have, but I met such errors.
I used to meet this error previously, but no effective solutions.
Could you please double check it? Thanks.
Minimal code sample
Error output
No response
Versions
anndata 0.9.2 scanpy 1.9.6