Closed matthewspeir closed 5 years ago
Oh. I had no idea that cbScanpy could read loom files. I also didn't know that loom files store meta data, or how to get it out of them. cbScanpy doesn't officially support loom at the moment: the help message doesn't list it as a supported file format.
That being said maybe it should support it? Where is this loom file? How did you run it on the loom file?
A loom file should already store all the information we need, so maybe running on loom files with cbScanpy doesn't make a lot of sense? Does anyone use Loom files? I've never seen one in the wild. Could you work around it by using an alternative file format?
Yeah, I think it's just that the latest scanpy version supports loom files? I just tried it to see if it would work, and it did, haha.
The loom file is on dev here:
/hive/users/mspeir/cellbrowserTest/pancreas/new_metadata_test/matrix_files/Single_cell_transcriptome_analysis_of_human_pancreas.loom
Command:
cbScanpy -e Single_cell_transcriptome_analysis_of_human_pancreas.loom -o cbScanpyOut_pancreas_aging_loom -n HCA_Pancreas_Aging_Loom -s
But does it really contain all of the information needed? It contains some metadata, but it doesn't have info like Louvain Cluster, UMI Count, etc. that your program outputs. It also doesn't include any coordinates, so you would have to do some clustering, right?
The only place I've seen loom files is from the HCA DCP Data Browser. Last I checked it's the default selected option.
OK, so it sounds like loom files ONLY contain some meta data, no coordinates or other algorithm results. So we would need some separate tool to get the meta data out of them? Is there some other way to get this meta data in another format?
Would another tool be needed? It looks like scanpy is able to extract and store the metadata based on lines like:
... storing 'construction_approach_label' as categorical
Maybe this is stored in the resulting 'anndata.h5ad' file in the cbScanpy output directory? Is there a way I could check that?
Through the DCP there's not really an easy way to get the metadata yet, though I think that's going to change in the near future. For this specific loom file (Single_cell_transcriptome_analysis_of_human_pancreas.loom), I have the same information in csv format in the cells.csv, genes.csv, and expression.csv files in the directory /hive/users/mspeir/cellbrowserTest/pancreas/new_metadata_test/matrix_files/Single_cell_transcriptome_analysis_of_human_pancreas.csv.
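One possible way to check whether those fields ended up in 'anndata.h5ad' is to load the file and list the per-cell annotation columns. This is only a minimal sketch: it assumes the `anndata` package is installed and that the loom metadata, if kept, lands in `adata.obs`. The helper names (`obs_column_names`, `missing_columns`, `check_h5ad`) are made up for illustration and are not part of cbScanpy.

```python
def obs_column_names(adata):
    """Return the per-cell annotation column names of an AnnData-like object."""
    return list(adata.obs.columns)


def missing_columns(adata, expected):
    """Return the expected column names that are absent from adata.obs."""
    present = set(obs_column_names(adata))
    return [name for name in expected if name not in present]


def check_h5ad(path, expected):
    """Load an .h5ad file and report which expected columns are missing."""
    # Third-party dependency, imported lazily so the helpers above stay
    # usable without it. Assumption: anndata is installed.
    import anndata

    adata = anndata.read_h5ad(path)
    return missing_columns(adata, expected)
```

For example, `check_h5ad("anndata.h5ad", ["construction_approach_label", "organ_label"])` would return the names that did not make it into the file, so an empty list means both fields are present.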
I'll give it a quick go, but I think we should not spend more time on this. If the DCP exports meta data only in loom format, then we shouldn't worry about that. Hardly anyone will be able to read that. This sounds more like a DCP problem than a problem for the cell browser.
The csv files don't seem to contain these meta data strings.
The 'cells.csv' and 'genes.csv' files contain almost all of the metadata fields listed in my first note.
$ head -n1 cells.csv | tr "," "\n"
cellkey
genes_detected
donorkey
genus_species_ontology
genus_species_label
ethnicity_ontology
ethnicity_label
disease_ontology
disease_label
development_stage_ontology
development_stage_label
organ_ontology
organ_label
organ_part_ontology
organ_part_label
librarykey
input_nucleic_acid_ontology
input_nucleic_acid_label
construction_approach_ontology
construction_approach_label
end_bias
strand
short_name
protocol
bundle_uuid
And
$ head -n1 genes.csv | tr "," "\n"
featurekey
featurename
featuretype
chromosome
featurestart
featureend
isgene
Ohh! Sorry, I don't know what I was thinking. Nothing I guess.
Yes, in this case, what you'd do, and I see it's not obvious at all, is treat cells.csv as the meta data (no need for the gene metadata).
Your cbScanpy run gives you scanpy-related meta data.
You then combine both meta files using "cbTool metaCat".
This is a typical example of meta data combining, explained here: https://cellbrowser.readthedocs.io/combine.html
Should I document this better somehow? I don't know how or where.
Also, we need to document cbMarkerAnnotate somewhere... but that's unrelated.
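To make the meta data combining step above more concrete, here is a rough sketch of the idea: join the scanpy-derived table and cells.csv on the cell identifier. It is not how "cbTool metaCat" is implemented, just an illustration using the Python standard library. The `cellId` and `louvain` column names and all the values are assumptions for the example; `cellkey`, `organ_label` and `disease_label` are taken from the cells.csv header shown earlier.

```python
import csv
import io


def read_table(handle, delimiter):
    """Read a delimited table into {cell_id: row}, keyed on the first column."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    id_field = reader.fieldnames[0]
    fields = reader.fieldnames[1:]
    rows = {row[id_field]: {f: row[f] for f in fields} for row in reader}
    return fields, rows


def combine_meta(base_fields, base_rows, extra_fields, extra_rows):
    """Append extra columns to the base rows by cell ID; unmatched cells get ''."""
    new_fields = [f for f in extra_fields if f not in base_fields]
    header = ["cellId"] + base_fields + new_fields
    combined = []
    for cell_id, row in base_rows.items():
        extra = extra_rows.get(cell_id, {})
        combined.append(
            [cell_id]
            + [row[f] for f in base_fields]
            + [extra.get(f, "") for f in new_fields]
        )
    return header, combined


# Tiny in-memory stand-ins for the cbScanpy meta table and the DCP cells.csv.
scanpy_meta = io.StringIO("cellId\tlouvain\nc1\t0\nc2\t1\n")
cells_csv = io.StringIO("cellkey,organ_label,disease_label\nc1,pancreas,normal\n")

base_fields, base_rows = read_table(scanpy_meta, "\t")
extra_fields, extra_rows = read_table(cells_csv, ",")
header, combined = combine_meta(base_fields, base_rows, extra_fields, extra_rows)
```

The sketch keeps every cell from the scanpy table and leaves the extra columns empty for cells that are missing from cells.csv, which seemed like the safer default for a display table.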
After your latest update, running cbScanpy on the same loom file fails with the following error:
... storing 'featuretype' as categorical
Traceback (most recent call last):
  File "/cluster/home/mspeir/miniconda3/bin/cbScanpy", line 10, in <module>
    sys.exit(cbScanpyCli())
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3748, in cbScanpyCli
    adata = cbScanpy(matrixFname, confFname, figDir, logFname, matrixOutFname)
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3532, in cbScanpy
    sc.pl.violin(adata, ['n_genes', 'n_counts', 'percent_mito'], jitter=0.4, multi_panel=True)
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/scanpy/plotting/_anndata.py", line 622, in violin
    'Did not find {} in adata.obs_keys().'.format(key))
ValueError: Either use observation keys or variable names, but do not mix. Did not find n_counts in adata.obs_keys().
Command:
cbScanpy -o cbScanpyOut_pancreas_aging_loom_v2 -s -n HCA_Pancreas_Aging_Loom -e Single_cell_transcriptome_analysis_of_human_pancreas.loom
Input file:
/hive/users/mspeir/cellbrowserTest/pancreas/new_metadata_test/matrix_files/Single_cell_transcriptome_analysis_of_human_pancreas.loom
Ah, darn, I thought this change wouldn't affect the import... looking...
Thanks for looking into it, Max!
No need for thanking, Brian Lee is not listening and I messed it up... :)
OK, should be fixed, release 0.4.53, thanks!
I think we can close this now. I can confirm that the meta.tsv contains the metadata from the input loom file.
I can see when I run cbScanpy on a loom file that it picks out that there is metadata (note the '... storing' lines):
But when you look at the meta.tsv afterward, there's nothing in there but the standard 'louvain cluster', etc. columns. It's like it just disappears. Is there a way to capture this metadata?