maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0

public instance - allen mouse whole cortex - mismatch between cell expression profiles and their metadata #228

Closed dvera closed 3 years ago

dvera commented 3 years ago

All Datasets > Allen Brain Map: Cell Types Database > 10x Whole Cortex & Hippocampus (2020)

This dataset does not have the correct pairing of the expression matrix and metadata in the public browser (cells.ucsc.edu). I had the same issue when loading this data in Seurat. I have not tracked down the exact cause of the mismatch, but it could be related to a failure in CreateSeuratObject's pairing of the expression matrix to the metadata based on cell names (assuming you are using Seurat). This dataset is unique among the Allen Brain Map datasets in that the order of the cells in the metadata TSV does not match the order of the cells in the expression matrix.

maximilianh commented 3 years ago

Oh. That's bad. Thanks for reporting it! Do you know how we can fix this?

dvera commented 3 years ago

I sorted both matrices by cell names in R and it loads fine after that. The set of cells in both tables is identical, just not in the same order.
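A minimal sketch of the fix described above, written in Python with pandas rather than the R that was actually used. The table layout (genes x cells matrix, metadata indexed by cell name), the function name `align_metadata_to_matrix`, and the toy values are assumptions for illustration only.

```python
import pandas as pd


def align_metadata_to_matrix(expr: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Return meta reordered so its rows follow the column order of expr.

    expr: genes x cells, columns are cell names (assumed layout).
    meta: one row per cell, indexed by cell name (assumed layout).
    """
    matrix_cells = pd.Index(expr.columns)
    if matrix_cells.has_duplicates or meta.index.has_duplicates:
        raise ValueError("cell names must be unique in both tables")
    if set(matrix_cells) != set(meta.index):
        raise ValueError("matrix and metadata do not contain the same cells")
    # Label-based lookup: the original row order of meta no longer matters.
    return meta.loc[matrix_cells]


# Toy data: the same three cells, listed in a different order in each table.
expr = pd.DataFrame(
    [[0, 9, 1], [5, 0, 4]],
    index=["Pvalb", "Sst"],
    columns=["cell_b", "cell_a", "cell_c"],
)
meta = pd.DataFrame(
    {"cluster": ["Pvalb", "Sst", "Sst"]},
    index=["cell_a", "cell_b", "cell_c"],
)
aligned = align_metadata_to_matrix(expr, meta)
```

After this call, row i of `aligned` describes the cell in column i of `expr`, so tools that pair the two tables by position see consistent data.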

matthewspeir commented 3 years ago

@maximilianh We didn't use Seurat to make this Cell Browser. Doesn't the Cell Browser sort the matrix/meta before it writes them to the HTML output directory?

dvera commented 3 years ago

If the data is processed with Seurat prior to writing matrices for cb, then Seurat will assign the wrong cell names to object@metadata.

maximilianh commented 3 years ago

@mspeir Yes, our cbBuild script always uses the cell IDs and ignores the order. If we didn't use Seurat objects, then I don't understand how this happened... hmm...

matthewspeir commented 3 years ago

I guess I'm not understanding the issue.

You're having issues loading the data from our site into Seurat? Or you're having issues loading the data from the Allen website into Seurat? Or there are cells on our site that have metadata that doesn't match what's on the Allen site because of this sorting issue?

dvera commented 3 years ago

The cell metadata in the public UCSC Cell Browser does not properly match the cells in the projections. A simple way to see this is to examine the expression of Pvalb. Pvalb is enriched in the Pvalb cluster on the Allen Brain Map cell explorer, but not in the public UCSC Cell Browser instance. The only reason I mention Seurat is that I observed the same mismatch between the expression matrix and metadata when I loaded the Allen Brain Map matrices (the ones that they serve for public download, but also the matrices served from the UCSC public Cell Browser instance) using CreateSeuratObject.
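The Pvalb check described above can be sketched numerically: compute the mean expression of one gene per cluster, once pairing cells by name and once by position. This is only an illustration in Python with pandas; the function name `mean_expression_by_cluster`, the cell names, and the expression values are made up and are not taken from the dataset.

```python
import pandas as pd


def mean_expression_by_cluster(gene_expr: pd.Series, clusters: pd.Series) -> pd.Series:
    """Mean expression of one gene per cluster, pairing cells by name."""
    # Reindex the labels onto the cells of gene_expr so pairing is by label.
    paired = clusters.reindex(gene_expr.index)
    return gene_expr.groupby(paired).mean().sort_values(ascending=False)


# Made-up Pvalb expression per cell, listed in matrix order.
pvalb = pd.Series([8.0, 0.0, 7.0, 1.0], index=["c3", "c1", "c4", "c2"])
# Made-up cluster labels per cell, listed in a different (metadata) order.
clusters = pd.Series(
    ["Sst", "Sst", "Pvalb", "Pvalb"], index=["c1", "c2", "c3", "c4"]
)

by_name = mean_expression_by_cluster(pvalb, clusters)
# Positional pairing, which ignores cell names, for comparison.
by_position = pvalb.groupby(clusters.to_numpy()).mean()
```

With these toy values, pairing by name puts Pvalb expression in the Pvalb cluster, while pairing by position spreads it evenly across both clusters, which is the kind of lost enrichment reported here.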

dvera commented 3 years ago

Note that the y axis is flipped between these two. Thinking about this a little more, this might not be a metadata issue but rather a mismatch between the projection and the expression matrix.

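[image] https://user-images.githubusercontent.com/5902410/133659157-78d4793d-a376-40be-b88a-d9cdfac0adcb.png

[image] https://user-images.githubusercontent.com/5902410/133659296-d9e2b076-3535-49ca-bdbe-d8b050e9fe09.png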

maximilianh commented 3 years ago

@mspeir is it possible that you can cbScanpy on the matrix/meta files...? I wonder if scanpy suffers from the same problem...

matthewspeir commented 3 years ago

Ahh, okay. Now I see what you're talking about.

Hmm, looks like I loaded their matrix and a trimmed version of their metadata into anndata, saved it as an h5ad, and then used cbScanpy to export the transposed matrix and calculate markers. Yeah, anndata will have the same issue if the input files aren't sorted in the same way. I had forgotten I had done something funky to transpose the matrix and recalculate markers.

I could probably transpose the matrix using some other method, sort it, and then it should be fine. Probably would need to redo the markers as well.
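The ordering pitfall described in this comment can be caught before any positional pairing happens. The following is a hedged sketch in plain Python, not part of cbScanpy or anndata; the helper names `first_order_mismatch` and `require_same_order` are hypothetical.

```python
from typing import Optional, Sequence


def first_order_mismatch(
    matrix_cells: Sequence[str], meta_cells: Sequence[str]
) -> Optional[int]:
    """Return the first position where the two cell-name lists differ, or None."""
    if len(matrix_cells) != len(meta_cells):
        raise ValueError("different number of cells in matrix and metadata")
    for pos, (matrix_name, meta_name) in enumerate(zip(matrix_cells, meta_cells)):
        if matrix_name != meta_name:
            return pos
    return None


def require_same_order(matrix_cells: Sequence[str], meta_cells: Sequence[str]) -> None:
    """Raise if the matrix and metadata list their cells in a different order."""
    pos = first_order_mismatch(matrix_cells, meta_cells)
    if pos is not None:
        raise ValueError(
            f"cell order differs at position {pos}: "
            f"{matrix_cells[pos]!r} vs {meta_cells[pos]!r}"
        )
```

Calling `require_same_order` on the matrix cell names and the metadata cell names before building a combined object turns a silent mislabeling into an explicit error.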

maximilianh commented 3 years ago

Hey @dvera, do you happen to have the expression matrix of this project in a Seurat object?

dvera commented 3 years ago

I do. I'll PM a URL

matthewspeir commented 3 years ago

Hey @dvera.

Thanks for sharing those files. Super helpful. I've updated our public instance with your fixed matrix: https://cells.ucsc.edu/?ds=allen-celltypes+mouse-cortex+mouse-cortex-2020&gene=Pvalb.

Feel free to re-open if you think this still isn't resolved.

maximilianh commented 3 years ago

Great, many thanks @dvera and @matthewspeir!