Closed pcm32 closed 1 year ago
I started seeing this when adding Scanpy and Pandas to the bioconda deps. Probably since the route is different when pandas is installed.
Are you using the raw values from the matrix or the processed values? Is it possible that ad.raw.var doesn't have the gene symbols?
On Fri, Apr 16, 2021 at 1:14 PM Pablo Moreno wrote:
> I started seeing this when adding Scanpy and Pandas to the bioconda deps. Probably since the route is different when pandas is installed.
(Comment link: https://github.com/maximilianh/cellBrowser/issues/216#issuecomment-821103406)
I have checked, and the AnnData does have var['gene_symbols']. I'm not sure about raw.var, though.
How do I tell it to use the raw values? I always assumed it was using the processed values (cbScanpy...)
This is the code and it seems to get gene_symbols:
# when reading 10X files, read_h5 puts the geneIds into a separate field
# and uses only the symbol. We prefer ENSGxxxx|<symbol> as the gene ID string
if "gene_ids" in var:
    genes = geneSeriesToStrings(var["gene_ids"], indexFirst=False)
elif "gene_symbols" in var:
    genes = geneSeriesToStrings(var["gene_symbols"], indexFirst=True)
elif "Accession" in var: # only seen this in the ABA Loom files
    genes = geneSeriesToStrings(var["Accession"], indexFirst=False)
else:
    genes = var.index.tolist()
Is it possible that your object contains a gene_ids slot? I have a feeling this is the same problem as the other issue that you opened. Should we close this?
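To check which column would win for a given object, the precedence in the snippet above can be reproduced in isolation. This is only a sketch: `gene_label_source` and `report` are hypothetical helper names, not part of cellBrowser, and plain dicts stand in for the `var` tables (the `in` test checks keys on a dict and column names on a pandas DataFrame, so the same function should work on a real AnnData object).

```python
from types import SimpleNamespace

# Column precedence copied from the snippet quoted above.
PRECEDENCE = ("gene_ids", "gene_symbols", "Accession")


def gene_label_source(var):
    """Return the first column from PRECEDENCE present in `var`, else 'index'."""
    for col in PRECEDENCE:
        if col in var:
            return col
    return "index"


def report(ad):
    """Report the winning column for ad.var and, if present, ad.raw.var."""
    sources = {"var": gene_label_source(ad.var)}
    raw = getattr(ad, "raw", None)
    if raw is not None:
        sources["raw.var"] = gene_label_source(raw.var)
    return sources


# Stand-in object: var has both columns, raw.var has neither.
ad = SimpleNamespace(
    var={"gene_ids": ["ENSG1"], "gene_symbols": ["SYM1"]},
    raw=SimpleNamespace(var={}),
)
print(report(ad))  # {'var': 'gene_ids', 'raw.var': 'index'}
```

If `gene_ids` is reported for `var`, that branch is taken before `gene_symbols` is ever looked at, which would match the suspicion above.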
As for the other question: you're right, it uses the processed data by default. I didn't mean to imply that it defaults to raw.
Right now, the function uses the raw values only if you force it to:
anndataMatrixToTsv(ad, matFname, usePandas=False, useRaw=False)
Because this option is not something most people want, it's not exposed on the Unix command line yet. I was asking in case you're calling it from Python yourself.
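For anyone calling this from Python, the raw-versus-processed choice described above boils down to something like the following. This is a sketch of the idea, not cellBrowser's implementation: `select_matrix_source` is a hypothetical name, and the fallback to the processed data when no `.raw` is set is an assumption made for illustration.

```python
from types import SimpleNamespace


def select_matrix_source(ad, use_raw=False):
    """Return (label, obj): the object whose matrix and var would be exported.

    Raw values are used only when explicitly requested AND ad.raw exists;
    otherwise the processed data on `ad` itself is used.
    """
    raw = getattr(ad, "raw", None)
    if use_raw and raw is not None:
        return "raw", raw
    return "processed", ad


# Stand-in objects instead of real AnnData, to keep the sketch dependency-free.
with_raw = SimpleNamespace(raw=SimpleNamespace(var={}), var={})
without_raw = SimpleNamespace(raw=None, var={})

print(select_matrix_source(with_raw)[0])                    # processed
print(select_matrix_source(with_raw, use_raw=True)[0])      # raw
print(select_matrix_source(without_raw, use_raw=True)[0])   # processed
```

Note that the gene labels then come from whichever object is selected, so `raw.var` and `var` may need to carry the same columns.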
I believe that this is solved now, is this correct? Can we close this ticket?
Hey @pcm32, in PR https://github.com/maximilianh/cellBrowser/pull/231 by @redst4r we're discussing switching to the .mtx.gz format by default everywhere. I'm inclined to stick with .tsv.gz for now, but add an option so that .mtx.gz is used if you specify "-f mtx". Any thoughts?
In your pipelines do you have assumptions about the name of the output file?
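One way a pipeline could avoid hard-coding the output name is to look for whichever matrix file is present. A minimal sketch follows; the two file names are assumptions for illustration only and should be checked against what your cellBrowser version actually writes, and `find_matrix_file` is a hypothetical helper.

```python
from pathlib import Path

# Assumed output names, for illustration only; verify against your installation.
CANDIDATES = ("exprMatrix.tsv.gz", "matrix.mtx.gz")


def find_matrix_file(out_dir, candidates=CANDIDATES):
    """Return the first candidate matrix file found in out_dir, or None."""
    out_dir = Path(out_dir)
    for name in candidates:
        path = out_dir / name
        if path.is_file():
            return path
    return None
```

A downstream step can then branch on the suffix of the returned path instead of assuming one format.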
@pcm32 and @maximilianh, is there anything else that needs to be done here? Or can we close this?
cbImportScanpy now has this option, which I think answers @pcm32's question:
--proc when exporting, do not use the raw input data, instead use the normalized and corrected matrix scanpy. This has no effect if the anndata.raw attribute is not used in the anndata object
On Fri, May 27, 2022 at 1:00 AM Matt Speir wrote:
> @pcm32 and @maximilianh, is there anything else that needs to be done here? Or can we close this?
Sounds good! We'll close this for now.
This stopped working between 0.5.x and 1.0.0. I added this feature in #118, but it seems to have been reverted. Can we please reinstate the functionality? It is relevant for our Single Cell Expression Atlas AnnData files.