Make cross-species profiles available for paper submission

jorvis commented 4 years ago

Way too many conversations and requests about this in Hangouts, Slack, e-mail, etc., so consolidating here.

Profiles:

"Motor Cortex Cross Species"
"Motor Cortex Merged Species"

Current issues:

[x] Pull custom colors from each H5 file and apply them to tSNEs
[x] The bar plots for GABAergic neurons are a little too crowded, since there are the most subtypes to display. Is there a way to adjust the font size on the x-axis so that all the species names fit, or to hide the label on the x-axis and just show a color? (from Seth)
[x] Dataset: Merged Excitatory Neurons from Primary Motor Cortex - Cross Species Comparison (Violin) - primary analysis doesn't work
[x] Dataset: Merged Excitatory Neurons from Primary Motor Cortex - Cross Species Comparison (Violin) - comparison tool doesn't work
[x] Dataset: Primary Motor Cortex from Human (M1) - Cross Species Comparison - comparison tool doesn't show the dataset
[x] Modify the violin/bar plots in these profiles to see if the colors can match the tSNE groupings
[ ] Allow cluster comparison before marker gene identification
[x] Generate permalinks

Permalinks (right click and copy for the shorter version of the link):

brianherb commented 4 years ago

For the merged datasets, you can use the 'subclass_label' column in the .obs for primary analysis

brianherb commented 4 years ago

To fix colors - the columns are named to match _label with _color - so to match the colors in the paper, the Motor Cortex Merged Species profile can use the 'species_color' column in the .obs slot (or subclass_color if we are coloring by cell type). Also, the Motor Cortex Cross Species profile can use the 'subclass_color' column in the .obs slot to match the colors in the paper.
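The `_label`/`_color` pairing described above could be turned into a label-to-color lookup roughly like this. This is only a sketch: the `obs` dict is a hypothetical stand-in for the `.obs` table (column name mapped to per-cell values), the hex codes are made up, and `label_color_map` is an illustrative helper, not part of the existing code.

```python
def label_color_map(obs, label_col):
    """Map each label in `label_col` to the color in its paired `*_color` column."""
    if not label_col.endswith("_label"):
        raise ValueError(f"expected a '*_label' column, got {label_col!r}")
    # 'subclass_label' -> 'subclass_color', 'species_label' -> 'species_color', ...
    color_col = label_col[: -len("_label")] + "_color"
    if color_col not in obs:
        raise KeyError(f"no paired color column {color_col!r}")
    mapping = {}
    for label, color in zip(obs[label_col], obs[color_col]):
        # setdefault keeps the first color seen for each label
        mapping.setdefault(label, color)
    return mapping


# Hypothetical stand-in for .obs; the colors are placeholders, not the paper's.
obs = {
    "subclass_label": ["Sst", "Vip", "Sst"],
    "subclass_color": ["#ff0000", "#00ff00", "#ff0000"],
}
colors = label_color_map(obs, "subclass_label")
```

The resulting dict can then be handed to whatever plotting code draws the tSNE or violin groups.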

hertzron commented 4 years ago

Hi Joshua,

The primary analysis is working but when I use it on the violin plots it does not separate by species (mouse/human/zebrafish) even though the violins show it. Here is a screen shot.

[image: image.png]

Best, Ronna


jorvis commented 4 years ago

The issue with the comparison tool is that all of these datasets have a large number of columns, and when it generates the condition list it assumes these are all valid grouping conditions. Obviously this needs to be refactored to let the user select which columns to include rather than assuming all of them. Later tonight I'll add the columns from these particular datasets to an exclusion list, which will be a temporary fix until that feature is added.

jorvis commented 4 years ago

I fixed the comparison tool for the datasets listed, and added a general check to display an error properly if a dataset has too many obs columns that are not in our exclusion list. For the "Cross Species Comparison (Violin) - comparison tool doesn't work" dataset, I left species and subclass_label so that comparisons are possible at those aggregate levels.
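For anyone following along, the exclusion list plus the "too many columns" check amounts to something like the sketch below. The names, the excluded columns and the limit are all hypothetical placeholders, not the values used in the actual tool.

```python
# Hypothetical values for illustration only.
EXCLUDED_COLUMNS = {"subclass_color", "species_color"}
MAX_CONDITION_COLUMNS = 5


class TooManyColumnsError(ValueError):
    """Raised when too many obs columns remain to build a condition list."""


def condition_columns(obs_columns, excluded=EXCLUDED_COLUMNS,
                      limit=MAX_CONDITION_COLUMNS):
    """Return the obs columns usable as grouping conditions."""
    # Drop anything on the exclusion list, keeping the original order
    candidates = [c for c in obs_columns if c not in excluded]
    if len(candidates) > limit:
        # Surface a clear error instead of building a huge condition list
        raise TooManyColumnsError(
            f"{len(candidates)} candidate columns exceeds the limit of {limit}"
        )
    return candidates


usable = condition_columns(["species", "subclass_label", "subclass_color"])
```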

jorvis commented 4 years ago

Notes on the "Allow cluster comparison before marker gene identification" request. This is tricky.

In the standard flow the user generates a louvain clustering and that key is what is used to differentiate the cells into groups. We got around that in datasets where analyses were precomputed by specifying that the clustering column could be called either 'cell_type' or 'cluster'. I'd see that automatically and use it.

In most of these datasets though there are many, many columns which could be used as the clustering key and, indeed, sometimes it seems you want to use different ones ('subclass_label', 'cross_species_cluster_label', etc.). So this means for any steps which involve displaying information based on clusters we need to modify the UI to show ALL the obs columns and let the user choose which they want in that instance.
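A rough sketch of that lookup, assuming a hypothetical `resolve_cluster_key` helper: it honors an explicit user choice, falls back to the two conventional names, and returns `None` when the UI would have to ask.

```python
DEFAULT_CLUSTER_KEYS = ("cell_type", "cluster")


def resolve_cluster_key(obs_columns, user_choice=None):
    """Pick the obs column used to split cells into groups."""
    if user_choice is not None:
        if user_choice not in obs_columns:
            raise KeyError(f"{user_choice!r} is not an obs column")
        return user_choice
    # Precomputed datasets: use the conventional name if one is present
    for key in DEFAULT_CLUSTER_KEYS:
        if key in obs_columns:
            return key
    # Nothing conventional found: the caller has to prompt the user
    return None


columns = ["subclass_label", "cross_species_cluster_label"]
auto = resolve_cluster_key(columns)
chosen = resolve_cluster_key(columns, user_choice="subclass_label")
```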

@brianherb @carlocolantuoni Can anyone think of another option?

hertzron commented 4 years ago

Is it possible to have the user choose only if there are multiple columns so that not all users will have a complex experience?

Is there a way in the column naming to have some convention so that not all columns appear as options?
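One way both ideas could work together, purely as a sketch: treat a `_label` suffix as the naming convention (an assumption borrowed from the `_label`/`_color` pairing mentioned earlier in the thread) and only show a chooser when more than one column qualifies. The function names are hypothetical.

```python
def grouping_options(obs_columns, suffix="_label"):
    """Keep only the columns that follow the naming convention."""
    return [c for c in obs_columns if c.endswith(suffix)]


def needs_prompt(options):
    """Only ask the user to choose when there is a real choice to make."""
    return len(options) > 1


columns = ["subclass_label", "subclass_color", "cross_species_cluster_label"]
options = grouping_options(columns)
ask_user = needs_prompt(options)
```

With a single qualifying column the tool could use it directly, so users of simpler datasets would never see the extra step.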

Carlo and Seth - I am trying to think of how to smooth the user experience, which may become fatiguing and unintuitive because your datasets have so many columns.

Thanks for your help, Ronna


carlocolantuoni commented 4 years ago

not sure i understand the exact problem and fix proposed here but maybe this could help: perhaps instead of showing indiv columns to select for a comparison, you could offer a list of the col meta data elements that could be used for comparison of groups of samples. the number of elements in the sample/column meta data (whether it be original meta data or calculated clusters) should be many fewer than the number of columns. carlo


jorvis commented 4 years ago

The columns in the datasets with the bars don't match the columns in the tSNE plots, so I can't do a direct coloring. I did manage to work out how to color by the colors given in any data column though, which is nice, and one of them looks like the attached. If someone wants to choose a reference dataset or provide color mappings (@brianherb ?) I can do the rest.

Screenshot from 2020-04-03 21-54-10

jorvis commented 4 years ago

Here's what it looks like after changing the cluster groupings and coloring to be based on subclass_label instead. It should be noted that the colors defined in that column do not appear to match the custom colors sent for the other datasets in the profile. I will work on coloring them manually instead. (and getting rid of the legend which appears for some reason with one hex color code)

Screenshot from 2020-04-07 11-02-53

jorvis commented 4 years ago

I've updated the coloring across the cross-species profile. It just so happens that the last dataset is made up of columns which, with the shared coloring profiles applied, form the most drab, indistinguishable set possible. Not sure if you guys would like to change that, or if that's just how it happens to be for this dataset. Uploading a panel showing several for comparison.

Screenshot from 2020-04-07 12-33-19

hertzron commented 4 years ago

Brian - I believe that these colors are as decided by the consortium? Is that correct? If so - then the profile should be (almost) ready to be public.

Brian - looking at the info boxes a few things should be modified for it to be complete:

Could you please add the bioarchive link to each of the info boxes (currently stating "Additional information is available via manuscript XXX (bioarchive link)")?
For the neuron subclasses, can you please spell out the cell types? One can imagine some people not knowing what SST, IT, etc. are.

Thanks, Ronna


brianherb commented 4 years ago

The colors in the updated profile look good and match the publication.

I added the bioarchive link to the descriptions

With regard to cell type descriptions, let's check this before I change them all:

o Glutamatergic neuron subclasses:

L2/3 IT (Layer 2/3 intratelencephalic)
L5 IT (Layer 5 intratelencephalic)
L5 ET (Layer 5 extratelencephalic-projecting)
L5/6 NP (Layer 5/6 non-projecting pyramidal)
L6b (Layer 6b)
L6 CT (Layer 6 cortico-thalamic)
L6 IT (Layer 6 intratelencephalic)
L6 IT Car3 (Layer 6 intratelencephalic carbonic anhydrase 3 expressing)

o GABAergic neuron subclasses:

Lamp5 (lysosomal associated membrane protein family member 5 expressing)
Pvalb (parvalbumin expressing)
Sncg (synuclein gamma expressing)
Sst (somatostatin expressing)
Sst Chodl (somatostatin and chondrolectin expressing)
Vip (vasoactive intestinal peptide expressing)

o Non-neuronal cell subclasses:

Astro (Astrocyte)
Endo (Endothelial)
Micro-PVM (Microglia / Perivascular Macrophages)
Oligo (Oligodendrocytes)
OPC (Oligodendrocyte Precursor)
VLMC (Vascular and Leptomeningeal Cells)

hertzron commented 4 years ago

This suggestion looks great! Brian - once changed I believe we are ready to make it public, correct?

Best, Ronna


brianherb commented 4 years ago

all info pages have been updated

@jorvis - could you also update the species colors in the "Motor Cortex Merged Species" profile? This profile contains a duplicate set of datasets also displayed in the "Motor Cortex Cross Species" (which has been updated and looks great!)

hertzron commented 4 years ago

Hi Joshua, Can we make these two cross species profiles public and update the permalinks that were used for the paper with the updated datasets (colored violins etc)? Thanks, Ronna


jorvis commented 4 years ago

Both are now public and the permalinks don't need to be updated. The permalinks always link to the profiles, even if they are updated. This allows us to make corrections/improvements without the complication of updating manuscript links.

jorvis commented 4 years ago

Added Ronna's bug report as a new ticket #110

jorvis commented 4 years ago

@brianherb - apologies, I just saw your comment above. To verify, that would change the display from looking like this:

nemo_pre

To instead look like this, losing the cross-organism visualization aspect (no custom color applied yet to the second one):

nemo_post

hertzron commented 4 years ago

The ideal would be to have all three side by side. Is this at all feasible?


nemoarchive / analytics

Make cross-species profiles available for paper submission #108