Closed matthewspeir closed 5 years ago
The Louvain message is a leftover from the original code. I’ll change it.
As far as the error is concerned, I’m afraid that this is another scanpy change again. I’ll look into it tomorrow... unfortunately that’s a bit late for you...
On Tue 13 Aug 2019 at 22:47, Matt Speir notifications@github.com wrote:
When trying to recalculate the markers for a dataset, I get the following error:
INFO:root:Finding top markers for each cluster
Traceback (most recent call last):
  File "/cluster/home/mspeir/miniconda3/bin/cbScanpy", line 10, in <module>
    sys.exit(cbScanpyCli())
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4680, in cbScanpyCli
    adata, params = cbScanpy(matrixFname, metaFname, inCluster, confFname, figDir, logFname)
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4525, in cbScanpy
    sc.tl.rank_genes_groups(adata, clusterField)
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/scanpy/tools/_rank_genes_groups.py", line 119, in rank_genes_groups
    adata, groups_order, groupby)
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/scanpy/utils.py", line 649, in select_groups
    groups_order = adata.obs[key].cat.categories
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 4368, in __getattr__
    return object.__getattribute__(self, name)
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/pandas/core/accessor.py", line 133, in __get__
    accessor_obj = self._accessor(obj)
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2378, in __init__
    self._validate(data)
  File "/cluster/home/mspeir/miniconda3/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2387, in _validate
    raise AttributeError("Can only use .cat accessor with a "
AttributeError: Can only use .cat accessor with a 'category' dtype
Command:
cbScanpy -e exprMatrix.tsv.gz -m meta.tsv -o tms-bat-facs_marker-recalc -n tms-bat-facs --inCluster 'Louvain Cluster'
Files:
/hive/data/inside/cells/datasets/tabula-muris-senis/tms-bat-facs
(Also is it really using the input clusters to recalc the marker genes? I see this in the output which makes me think it's not: INFO:root:Found 22 louvain clusters...)
Thanks for your email reminder. This also happens with my scanpy and I'm looking into it now. It didn't happen for the other datasets... looking...
The problem is the Louvain Cluster was auto-detected to be in "number" format instead of the usual "category" format (e.g. "Cluster 1" is "category", but just "1" is "number"). I'm pretty sure that this used to work, as I know I've had this problem before and even opened an issue over in the scanpy github and it went away. But I'll add the same fix I've used before to force it to category format now.
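The dtype issue described above can be reproduced with plain pandas. This is only a minimal sketch, assuming a metadata column named 'Louvain Cluster' that holds bare integers. The helper name force_categorical is made up for illustration and is not the actual fix that went into cellbrowser:

```python
import pandas as pd


def force_categorical(obs, field):
    """Return a copy of obs with obs[field] converted to string categories."""
    obs = obs.copy()
    # Convert to str first so that 1, 2, 3 become the labels "1", "2", "3".
    obs[field] = obs[field].astype(str).astype("category")
    return obs


# Hypothetical metadata: bare numbers are auto-detected as an integer column.
meta = pd.DataFrame({"Louvain Cluster": [1, 2, 2, 3]})

try:
    meta["Louvain Cluster"].cat.categories
except AttributeError as err:
    # Same kind of failure as in the traceback: .cat needs a category dtype.
    print("before:", err)

fixed = force_categorical(meta, "Louvain Cluster")
print("after:", list(fixed["Louvain Cluster"].cat.categories))
```

Converting through str is a design choice here: it keeps the cluster labels as names rather than numbers, so "1" is treated the same way as "Cluster 1" would be.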
Now that I've fixed the data type of the Louvain Cluster, it's complaining that the mean of some gene is 0. I think this could be considered a bug in scanpy, but let's ignore that for now. I think I need to run highly variable genes first before finding the markers and that function needs to know if the matrix has been log'ed before or not... sigh... not sure what to do.... I'll log the matrix for now I guess, and we'll think about it later...
Thanks, Max!
When you say "it didn't happen for the other datasets...", did you use the 'Louvain Cluster' field as the cluster field? Looking at the FACS bladder dataset, it looks like you used the 'free_annotation' field rather than the 'Louvain Cluster' field. I thought we were going with the Louvain Cluster field for the label, not cell type.
No wait, what I wrote is true in general, but not for this particular dataset, as it's been log'ed already.
The problem I think is cluster 4: it contains only a single cell. That crashes the marker gene step (which is probably another scanpy bug). A cluster with a single cell doesn't make a lot of sense... I'll try something to remove this cell, but we should probably ask Angela if there is something wrong here...
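One way to spot such clusters before the marker step is to count the cells per cluster label. A minimal sketch in plain Python follows. The function name and the threshold of two cells are assumptions for illustration, not what cbScanpy does internally:

```python
from collections import Counter


def split_small_clusters(labels, min_cells=2):
    """Return (kept_indices, dropped_labels) for a list of per-cell cluster labels."""
    counts = Counter(labels)
    # Clusters with fewer than min_cells cells are reported and left out.
    dropped = sorted(lab for lab, n in counts.items() if n < min_cells)
    kept = [i for i, lab in enumerate(labels) if counts[lab] >= min_cells]
    return kept, dropped


# Hypothetical labels: cluster "4" contains a single cell.
labels = ["1", "1", "2", "2", "4", "2"]
kept, dropped = split_small_clusters(labels)
print("dropped clusters:", dropped)
print("cells kept:", kept)
```

The kept indices could then be used to subset both the matrix and the metadata before the markers are calculated.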
OK, I've committed something; it contains a lot of changes. Can you do the --pre upgrade to 0.6a2? It'll break something else, but it should at least get you over this. The Scanpy/Pandas combo is a nightmare for me to work with: they have very different conventions and scanpy still breaks on so many things. It may also not work with your scanpy version, in which case you should probably upgrade your scanpy (it's a bit more mature right now).
Seems to have worked well. Although the cluster names in the output markers file don't seem to match those in the 'Louvain Cluster' input column. The input 'Louvain Cluster' column contains names like 1, 2, 3, etc. whereas my output markers file now has names like '0_B cell'.
Wow that’s a weird bug. What is the name of the field that contains “0_B cell” ? “louvain” ?
I think it's called 'cluster_names'
I can't reproduce this. In hive/data/inside/cells/datasets/tabula-muris-senis/facs/bat, I've run cbScanpy like this:
cbScanpy -e exprMatrix.tsv.gz -m meta.tsv -o tms-bat-facs_marker-recalc -n tms-bat-facs --inCluster 'Louvain Cluster' --copyMatrix
(note that you really need --copyMatrix otherwise the matrix is really small now, as it will only copy the highly variable genes)
the resulting markers.tsv has the numbers of the "Louvain Cluster" field, as expected.
The output looks like this:
cluster_name  gene     z_score
0             Cd79a    80.506996
0             H2-DMb2  41.33267
Regarding your question about whether I used the 'Louvain Cluster' field as the cluster field: oh, right, we've changed the fields. Sorry. So just to confirm that the new version does work as expected, I now ran this:
cbScanpy -e exprMatrix.tsv.gz -m meta.tsv -o tms-bat-facs_marker-recalc -n tms-bat-facs --inCluster 'cluster_names' --copyMatrix
And the resulting markers.tsv does have the new corrected cluster names in the first column:
cluster_name  gene     z_score
0_B cell      Cd79a    80.506996
0_B cell      H2-DMb2  41.33267
0_B cell      Cd79b    28.653046
0_B cell      Faim3    27.186304
So all seems to work fine. There were a few cellbrowser.conf files with the wrong fields in facs/, I've fixed them up now.
Do you want to run a for loop over all directories now to calculate the markers or shall I do it?
It should be a command like this:
for i in `find -maxdepth 1 -type d | cut -c3-`; do cd $i; cbScanpy -e exprMatrix.tsv.gz -m meta.tsv -o recalc -n $i --inCluster 'cluster_names' --copyMatrix; cbMarkerAnnotate recalc/markers.tsv recalc/markers_annot.tsv; cd ..; done
You can then fix up the marker file pointers:
sed -i 's|markers.annotated.tsv|recalc/markers_annotated.tsv|' */cellbrowser.conf
(Actually I just did this, so no need to do this anymore for the facs datasets)
And rebuild all the cell browsers:
cbBuild -r
Also, I just noticed: I should probably rename the "clusterField" setting to "defaultColorField", don't you think? clusterField makes little sense.
(I'll stay backwards compatible, so it'll still look for "clusterField" if "defaultColorField" is not found)
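The backwards-compatible lookup could be sketched roughly like this. It assumes the settings from cellbrowser.conf have already been loaded into a plain dict. The function name is hypothetical and the setting names are just the ones proposed in this message, so the name that ends up in cellbrowser may differ:

```python
def get_color_field(conf):
    """Look up the default coloring field, preferring the new setting name."""
    # Try the new name first, then fall back to the old one for existing configs.
    for key in ("defaultColorField", "clusterField"):
        if key in conf:
            return conf[key]
    raise KeyError("neither defaultColorField nor clusterField is set")


# Hypothetical configs: one old-style, one that sets both names.
old_conf = {"clusterField": "Louvain Cluster"}
new_conf = {"defaultColorField": "cluster_names", "clusterField": "Louvain Cluster"}
print(get_color_field(old_conf))
print(get_color_field(new_conf))
```

Checking the new name first means an old cellbrowser.conf keeps working unchanged, while a file that sets both names uses the new one.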
Never mind about running these commands, I've modified them a little to run them in parallel, that should be a lot faster. I'll report back how this went. It may be useful for other projects.
Yeah, I think 'defaultColorField' makes more sense now, but I agree that it would be nice to keep it backward compatible.
I am accepting defColorField now, just need to update the documentation.
The final commands that I ran were:
for i in `find -type d | cut -c3-`; do echo cbScanpy -e $i/exprMatrix.tsv.gz -m $i/meta.tsv -o $i/recalc -n $i --inCluster 'cluster_names' --copyMatrix; done > commands.txt
parallel --jobs 10 < commands.txt
for i in `find -maxdepth 1 -type d | cut -c3-`; do echo cbMarkerAnnotate $i/recalc/markers.tsv $i/recalc/markers_annot.tsv; done > commands2.txt
parallel --jobs 10 < commands2.txt
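For other projects, the same command list could also be generated from Python, which makes it easier to quote directory names that contain spaces. This is only a sketch under assumptions: the function name is made up, it only looks at directories directly under the given root, and it uses just the cbScanpy options that appear in the commands above:

```python
import shlex
from pathlib import Path


def scanpy_commands(root, cluster_field="cluster_names"):
    """Build one cbScanpy command line per dataset directory directly under root."""
    cmds = []
    for d in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        args = ["cbScanpy",
                "-e", str(d / "exprMatrix.tsv.gz"),
                "-m", str(d / "meta.tsv"),
                "-o", str(d / "recalc"),
                "-n", d.name,
                "--inCluster", cluster_field,
                "--copyMatrix"]
        # shlex.quote protects names that contain spaces or shell metacharacters.
        cmds.append(" ".join(shlex.quote(a) for a in args))
    return cmds


if __name__ == "__main__":
    # Print one command per line, ready to redirect into commands.txt.
    for line in scanpy_commands("."):
        print(line)
```

The printed lines can be redirected to commands.txt and fed to parallel in the same way as above.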
I did a lot of marker gene recalculation now and I think we can close this. Until it breaks again. :-)