Closed pcm32 closed 3 years ago
I haven't tested this though, do you have any working tests @matthewspeir @maximilianh that I could add to some GitHub actions here? Thanks!
This should not fix your problem. usePandas is always false, isn't it?
I wonder if your problem has to do with the raw values. Have you already run with the -d option set?
You are right, it is not using the pandas section (I thought it was using it if pandas was installed).
What do you suggest to fix this? Ahh, -d, let me check that.
Could it be due to:
INFO:root:Auto-detecting number type of /private/tmp/outdir/exprMatrix.tsv.gz
DEBUG:root:spooling back 0 saved rows
DEBUG:root:Yielding gene ENSDARG00000000001, sym ENSDARG00000000001, 96 fields
DEBUG:root:Matrix type is: float
INFO:root:Auto-detect: Numbers in matrix are of type 'float'
DEBUG:root:spooling back 1 saved rows
DEBUG:root:Yielding gene ENSDARG00000000001, sym ENSDARG00000000001, 96 fields
INFO:root:Auto-detected gene IDs type: symbols
?
Also, see the attached full debug log (I removed some repetitive lines): small_UCSC_debug_atlas_gene_symbols.txt
Two comments back it was a different dataset I was trying...
Sorry, I have the impression that this means that this file simply does not contain gene symbols, is this correct?
On Fri, Apr 16, 2021 at 2:38 PM Pablo Moreno @.***> wrote:
Two comments back it was a different dataset I was trying...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/maximilianh/cellBrowser/pull/217#issuecomment-821144714, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACL4TMDE5DTPLBSNUMZ62TTJAVVBANCNFSM43BK7BGQ .
If this is the case, there is a way to make it work, but I first want to confirm that this is true.
The file should have gene symbols. The structure of the AnnData object is:
AnnData object with n_obs × n_vars = 96 × 17500
obs: 'age', 'developmental_stage', 'genotype', 'organism_part', 'organism', 'phenotype', 'post_analysis_well_quality', 'single_cell_quality', 'single_cell_well_quality', 'block', 'phenotype.1', 'single_cell_identifier', 'age_ontology', 'developmental_stage_ontology', 'genotype_ontology', 'organism_part_ontology', 'organism_ontology', 'phenotype_ontology', 'post_analysis_well_quality_ontology', 'single_cell_quality_ontology', 'single_cell_well_quality_ontology', 'block_ontology', 'phenotype_ontology.1', 'single_cell_identifier_ontology', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'n_counts', 'n_genes', 'louvain_resolution_0.7', 'louvain_resolution_1.0'
var: 'gene_symbols', 'chromosome', 'start', 'end', 'width', 'source', 'type', 'score', 'phase', 'gene_version', 'gene_name', 'gene_source', 'gene_biotype', 'mito', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_counts', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
uns: 'hvg', 'markers_louvain_resolution_0.7', 'markers_louvain_resolution_0.7_filtered', 'markers_louvain_resolution_1.0', 'markers_louvain_resolution_1.0_filtered', 'neighbors', 'pca'
obsm: 'X_pca', 'X_tsne_perplexity_1', 'X_tsne_perplexity_10', 'X_tsne_perplexity_15', 'X_tsne_perplexity_20', 'X_tsne_perplexity_25', 'X_tsne_perplexity_30', 'X_tsne_perplexity_35', 'X_tsne_perplexity_40', 'X_tsne_perplexity_45', 'X_tsne_perplexity_5', 'X_tsne_perplexity_50', 'X_umap_neighbors_n_neighbors_10', 'X_umap_neighbors_n_neighbors_100', 'X_umap_neighbors_n_neighbors_15', 'X_umap_neighbors_n_neighbors_20', 'X_umap_neighbors_n_neighbors_25', 'X_umap_neighbors_n_neighbors_3', 'X_umap_neighbors_n_neighbors_30', 'X_umap_neighbors_n_neighbors_5', 'X_umap_neighbors_n_neighbors_50'
varm: 'PCs'
obsp: 'connectivities', 'distances'
So you can see 'gene_symbols' under 'var'. But... aha, you are right, there is an issue with the gene symbols. The var contains:
index | gene_symbols | chromosome | start | end | width | source | type | score | phase | gene_version | gene_name | gene_source | gene_biotype | mito | n_cells_by_counts | mean_counts | log1p_mean_counts | pct_dropout_by_counts | total_counts | log1p_total_counts | n_counts | n_cells | highly_variable | means | dispersions | dispersions_norm |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSDARG00000000001 | ENSDARG00000000001 | 9 | 34112067 | 34121839 | 9773 | ensembl_havana | gene | | | 6 | slc35a5 | ensembl_havana | protein_coding | False | 15 | 10.391168 | 2.4328382 | 84.375 | 997.55206 | 6.9063063 | 997.55206 | 15 | False | 0.048443687103584876 | -0.08688698682354548 | 0.20369561 |
ENSDARG00000000002 | ENSDARG00000000002 | 9 | 34089156 | 34113209 | 24054 | ensembl_havana | gene | | | 8 | ccdc80 | ensembl_havana | protein_coding | False | 4 | 6.692166 | 2.0402024 | 95.83333333333334 | 642.44794 | 6.466841 | 642.44794 | 4 | True | 0.038212520539707966 | 1.1615255578255084 | 1.0531479 |
ENSDARG00000000018 | ENSDARG00000000018 | 4 | 15081385 | 15103696 | 22312 | ensembl_havana | gene | | | 9 | nrf1 | ensembl_havana | protein_coding | False | 93 | 483.9896 | 6.1841273 | 3.125 | 46463.0 | 10.746433 | 46463.0 | 93 | False | 1.2718640571022153 | 1.6039805832956242 | 0.038439106 |
ENSDARG00000000019 | ENSDARG00000000019 | 4 | 15011341 | 15059876 | 48536 | ensembl_havana | gene | | | 9 | ube2h | ensembl_havana | protein_coding | False | 27 | 55.15625 | 4.028138 | 71.875 | 5295.0 | 8.574707 | 5295.0 | 27 | True | 0.24869120688275356 | 1.5673604972588937 | 1.3292886 |
ENSDARG00000000068 | ENSDARG00000000068 | 12 | 33484458 | 33537126 | 52669 | ensembl_havana | gene | | | 9 | slc9a3r1a | ensembl_havana | protein_coding | False | 40 | 60.479168 | 4.1186986 | 58.33333333333333 | 5806.0 | 8.66682 | 5806.0 | 40 | True | 0.24480807109196906 | 1.2071062863219768 | 1.0841622 |
So this is an issue with our AnnData generation... sorry about this.
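For anyone who runs into the same symptom, here is a minimal sketch of a check for it. It uses a small pandas DataFrame as a stand-in for the AnnData `var` table, filled with three rows from the table above. The function name `symbols_duplicate_ids` is made up for illustration, and falling back to 'gene_name' is an assumption based on what that column appears to hold in this dataset.

```python
import pandas as pd

# Stand-in for adata.var, using three rows from the table above.
ids = ["ENSDARG00000000001", "ENSDARG00000000002", "ENSDARG00000000018"]
var = pd.DataFrame(
    {
        "gene_symbols": list(ids),  # the problem: symbols are just the IDs again
        "gene_name": ["slc35a5", "ccdc80", "nrf1"],
    },
    index=ids,
)


def symbols_duplicate_ids(var, column="gene_symbols"):
    """Return True when every value in `column` equals that row's gene ID."""
    values = var[column].astype(str).to_numpy()
    index = var.index.astype(str).to_numpy()
    return bool((values == index).all())


if symbols_duplicate_ids(var):
    # Assumption: 'gene_name' carries the real symbols in this dataset.
    var["gene_symbols"] = var["gene_name"]
```

Running a check like this before export would flag the file early, instead of the symbols silently ending up identical to the IDs.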
Great! :-)
How strange... what is "gene_name"? I haven't seen this field yet...
BTW: awesome that you have h5ad files now!! I wrote hundreds of lines of code to convert your text files to cell browser files, but then at some point gave up, forgot why. It would be cool to try again with the h5ad files...
Hi Pablo, can I close this pull request?
We're just releasing 1.0.1 which includes the fix for the "import scanpy" + "exit code 0" problem that you found recently.
Currently, when pandas is used, var['gene_symbol'] is ignored, although it is used when pandas is not. This change lets gene_symbol identification work in both routes.
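As a rough illustration of the idea (this is not the cellBrowser code), the lookup could live in one helper that both routes call. The sketch below is hypothetical: the function name `gene_id_to_symbol` and its parameters are invented here, and it assumes the `var` columns can be passed either as a plain dict of lists or as a pandas DataFrame, since `in` and `[...]` both work on column names in either case.

```python
def gene_id_to_symbol(gene_ids, var_columns, symbol_key="gene_symbol"):
    """Map each gene ID to its symbol, falling back to the ID itself.

    `var_columns` is any mapping from column name to a sequence of values
    in the same order as `gene_ids` (hypothetical interface).
    """
    if symbol_key not in var_columns:
        # No symbol column at all: use the IDs as display names.
        return {gene_id: gene_id for gene_id in gene_ids}

    mapping = {}
    for gene_id, symbol in zip(gene_ids, list(var_columns[symbol_key])):
        is_nan = isinstance(symbol, float) and symbol != symbol
        if symbol is None or is_nan or str(symbol) == "":
            mapping[gene_id] = gene_id  # missing value: keep the ID
        else:
            mapping[gene_id] = str(symbol)
    return mapping


# Example with a plain dict, i.e. without pandas.
ids = ["ENSDARG00000000001", "ENSDARG00000000002"]
with_symbols = gene_id_to_symbol(ids, {"gene_symbol": ["slc35a5", ""]})
without_symbols = gene_id_to_symbol(ids, {"chromosome": ["9", "9"]})
```

Keeping the fallback in one place means a missing or empty symbol column behaves the same whichever route read the file.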