maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0
105 stars, 42 forks

Case for gene_symbols in var from anndata #118

Closed: pcm32 closed this 5 years ago

pcm32 commented 5 years ago

This PR handles a further case produced by Scanpy, where gene symbols are stored in a dictionary inside ad.var['gene_symbols'], so that the gene symbols are preserved when converting to CellBrowser objects.
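A minimal sketch of the kind of fallback described above, not the code from this PR. The function name `gene_id_symbol_pairs` is hypothetical, and `var` is assumed to be anything that supports `name in var` and `var[name]`, such as a dict of columns. It pairs each gene ID with its symbol when a `gene_symbols` entry exists and falls back to the ID otherwise:

```python
def gene_id_symbol_pairs(gene_ids, var):
    """Pair each gene ID with a symbol, falling back to the ID itself.

    Hypothetical helper for illustration only. `var` only needs to support
    `"gene_symbols" in var` and `var["gene_symbols"]`.
    """
    gene_ids = list(gene_ids)
    if "gene_symbols" not in var:
        # No symbols available: use the IDs as their own labels.
        symbols = gene_ids
    else:
        col = var["gene_symbols"]
        if isinstance(col, dict):
            # Assumed layout: a mapping from gene ID to symbol.
            symbols = [col.get(g, g) for g in gene_ids]
        else:
            # Assumed layout: one symbol per gene, in the same order as the IDs.
            symbols = list(col)
    return list(zip(gene_ids, symbols))


pairs = gene_id_symbol_pairs(
    ["ENSG1", "ENSG2"],                 # made-up IDs
    {"gene_symbols": ["A", "B"]},       # made-up symbols
)
```

With an AnnData object the call would presumably look like `gene_id_symbol_pairs(adata.var.index, adata.var)`, but that has not been checked against the structure that read_10x actually produces.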

maximilianh commented 5 years ago

Thanks! Is “else if” valid in python 2? I don’t care too much but I still try to at least have the file parse in py2.
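For reference, `else if` is rejected by the parser in both Python 2 and Python 3; the chained form is spelled `elif`. A small check using only the built-in `compile()` (the helper name `parses` and the snippet strings are made up for illustration):

```python
def parses(source):
    """Return True if `source` compiles as Python code, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Same branch logic, written once with "else if" and once with "elif".
with_else_if = "x = 1\nif x == 0:\n    pass\nelse if x == 1:\n    pass\n"
with_elif = "x = 1\nif x == 0:\n    pass\nelif x == 1:\n    pass\n"

else_if_ok = parses(with_else_if)
elif_ok = parses(with_elif)
```

Only the `elif` variant compiles, so a file containing `else if` fails to import under either interpreter.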


pcm32 commented 5 years ago

ooops... sorry! fixed now

maximilianh commented 5 years ago

@ivirshup so it turns out that the reason we haven't come across this before is that it's not really a scanpy standard, but only the read_10X function produces it. https://github.com/theislab/scanpy/issues/385 But that's the main exchange format, so many thanks for this! I'll try to test it a little now.

maximilianh commented 5 years ago

This brings up a problem in my testing. I wasn't using a 10X file, but a simple .tsv that I prepared. I'll switch to Scanpy's pbmc 10X sample file for testing.

maximilianh commented 5 years ago

Hmm... scanpy's test matrix is also not a 10X file, rather an h5ad. Given how many breaking changes scanpy is going through, I wonder if this is a good idea. Don't we have a small-ish expression matrix somewhere for testing?

maximilianh commented 5 years ago

After a scanpy/anndata/pandas update, and the scanpy pbmc small test, my minimal test is now broken... sigh...

ivirshup commented 5 years ago

In the test suite for scanpy, there are a few small 10x datasets (scanpy/tests/_data/10x_data). I think these were cut down a bit, so I'm not sure they're 100% compliant with what cellranger puts out. The best option would probably be to test against something from the 10x example datasets.

I'm a little confused about what's happening here. Are you saying scanpy is reading a 10x file and generating a dataframe with dictionaries in the columns?

pcm32 commented 5 years ago

Maybe it's a good time to add some travis or circle-ci testing?

maximilianh commented 5 years ago

Thanks Isaac! I’ll try that. I tried the h5ad file in the scanpy test directory and got the new NaN,NaN,NaN error when finding most variable genes.

Then I tried a 1k 10x sample from the 10x homepage and got a “cannot find GrCh38” error, as there is only a group called “matrix” in cellranger 3 h5 files. Does scanpy support cellranger 3?

I guess I should upgrade scanpy and anndata to the current master before trying again.


ivirshup commented 5 years ago

With v3, you shouldn't have to specify genome (you shouldn't have to specify genome for v2 either if only one genome is there). I'm able to read in a v3 h5 file with currently released versions of scanpy and anndata using sc.read_10x_h5.

ivirshup commented 5 years ago

Giving an example of what worked for me:

import scanpy as sc
# Run in a Jupyter/IPython session: the leading "!" executes wget as a shell command.
!wget http://cf.10xgenomics.com/samples/cell-exp/3.0.2/1k_hgmm_v3/1k_hgmm_v3_filtered_feature_bc_matrix.h5
adata = sc.read_10x_h5("./1k_hgmm_v3_filtered_feature_bc_matrix.h5")

The returned adata is a view, which is weird, but otherwise this seems to work.

maximilianh commented 5 years ago

Many thanks, I must have an older version of anndata or must have done something else wrong. Thanks for your help Isaac!
