hubmapconsortium / hra-workflows-runner

hra-workflows-runner: Pipeline to download, annotate, and summarize experimental data using hra-workflows
MIT License

popV crosswalk for heart lists hepatocyte (liver) #37

Open · andreasbueckle opened this issue 1 month ago

andreasbueckle commented 1 month ago

See https://github.com/hubmapconsortium/hra-workflows-runner/blob/main/crosswalking-tables/popv.csv#L88

See:

| | Organ_ID | Annotation_Label | Annotation_Label_ID | CL_Label | CL_ID | CL_Match |
|---|---|---|---|---|---|---|
| heart | UBERON:0000948 | hepatocyte | PV:0000077 | hepatocyte | CL:0000182 | skos:exactMatch |
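Rows like this one could be caught automatically by scanning the crosswalk for cell types mapped to organs where they should not occur. The sketch below is a hypothetical check, not part of hra-workflows-runner. It assumes the CSV has the `Organ_ID` and `CL_ID` columns quoted above. The `ORGAN_RESTRICTED` mapping (hepatocyte CL:0000182 to liver UBERON:0002107) is an illustrative assumption that would need curation. As the discussion below points out, most cell types (e.g. immune cells) cannot be restricted this way.

```python
import csv
import io

# Hypothetical curated map: cell type (CL ID) -> the only organs (UBERON IDs)
# where it is expected. Hepatocyte (CL:0000182) -> liver (UBERON:0002107).
ORGAN_RESTRICTED = {
    "CL:0000182": {"UBERON:0002107"},
}


def find_misplaced_rows(csv_text, restricted=ORGAN_RESTRICTED):
    """Return crosswalk rows whose CL_ID is restricted to other organs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row in reader:
        allowed = restricted.get(row["CL_ID"])
        # Only cell types listed in `restricted` are checked; all others pass.
        if allowed is not None and row["Organ_ID"] not in allowed:
            flagged.append(row)
    return flagged


# Hypothetical sample: the first row mirrors the quoted popv.csv row (organ
# name column omitted); the second is an illustrative liver row.
SAMPLE = (
    "Organ_ID,Annotation_Label,Annotation_Label_ID,CL_Label,CL_ID,CL_Match\n"
    "UBERON:0000948,hepatocyte,PV:0000077,hepatocyte,CL:0000182,skos:exactMatch\n"
    "UBERON:0002107,hepatocyte,PV:0000077,hepatocyte,CL:0000182,skos:exactMatch\n"
)

for row in find_misplaced_rows(SAMPLE):
    print(row["Organ_ID"], row["CL_Label"])  # prints: UBERON:0000948 hepatocyte
```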

This has ramifications for these two datasets from the same team:

axdanbol commented 1 month ago

I did a quick investigation and discovered that hepatocyte is present in some of the popV algorithm's heart models, so this appears to be a problem with popV itself rather than with our processing.

emquardokus commented 1 month ago

Working with Supriya on the 2D FTU manuscript, I asked to see which organs had cells in common with each other. For the most part, the common cells were immune cells (B cells, T cells, etc.), connective tissue cells (fibroblasts), and, where nested FTUs exist, cells shared along nephron<--glomerulus and nephron<--tubules (this includes all the parts of the tubules that we have as separate 2D FTUs). I then noticed this outlier of hepatocyte in both heart and liver, which was WRONG. Tracking it back, I found it in the popV crosswalk file, which was derived from a list provided by Bruce; I'm not sure where he obtained his list. I will upload a new popV crosswalk to fix this mistake.

Reviewing the file filtered-CTs-with-datasets-with-organ.csv, I saw "hepatocyte associated with heart", which is clearly incorrect, and was able to determine from this file that it was enriched from the popV crosswalk. This crosswalk was published in the 7th release in HRA-KG --> digital objects --> ctann, and is also used in the hra-workflows-runner GitHub repo for CTAnn/HRApop work.
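The cross-organ comparison that surfaced this outlier amounts to a pairwise set intersection: collect the cell types per organ, then intersect every pair of organs. The sketch below illustrates that idea under stated assumptions. It is not the analysis Supriya actually ran. The function name `shared_cell_types` and the input tuples are hypothetical, though the input could be taken from the `Organ_ID` and `CL_ID` columns of a crosswalk.

```python
from collections import defaultdict
from itertools import combinations


def shared_cell_types(pairs):
    """Map each organ pair to the cell types (CL IDs) both organs list.

    `pairs` is an iterable of (organ, cell_type) tuples.
    """
    by_organ = defaultdict(set)
    for organ, cell_type in pairs:
        by_organ[organ].add(cell_type)
    shared = {}
    # Sort organ names so each pair key has a stable order.
    for a, b in combinations(sorted(by_organ), 2):
        common = by_organ[a] & by_organ[b]
        if common:
            shared[(a, b)] = common
    return shared


# Hypothetical input chosen for illustration.
EXAMPLE = [
    ("heart", "CL:0000182"),   # hepatocyte, the erroneous mapping
    ("heart", "CL:0000236"),   # B cell
    ("liver", "CL:0000182"),   # hepatocyte
    ("liver", "CL:0000236"),   # B cell
    ("kidney", "CL:0000236"),  # B cell
]

for (a, b), common in sorted(shared_cell_types(EXAMPLE).items()):
    print(a, b, sorted(common))
```

In this toy input, every organ pair shares the B cell, which is expected. Only the heart/liver pair additionally shares hepatocyte (CL:0000182), and that is the kind of outlier that stands out on review.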

andreasbueckle commented 5 days ago

Now also captured in https://github.com/x-atlas-consortia/hra-pop/issues/106

emquardokus commented 3 days ago

The main reason we can NOT have a table answering "Need list of cell types that are not supposed to be in organ model" is that some cell types, such as various immune cells, will always appear in all organs, just in different numbers or as different specific immune cells. The hepatocyte example is the only one we found because it is such an obvious mistake: hepatocytes should only be in liver, not heart or lung. The other possible issue is a sample block that contains contaminating tissue from surrounding organs; for example, the liver is in close proximity to both heart and lung. In this case, one would expect the number of these contaminating cell types in a dataset to fall below the threshold expected for cells that natively exist in the primary tissue. This is one of the preprocessing steps for single-cell RNA sequencing analysis: remove cell types with 3 or fewer cells. This threshold can be modified during the analysis.
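The count threshold described above can be sketched as follows. This sketch assumes per-cell type labels for one dataset are available as a plain list. The function name `drop_rare_cell_types` and the example dataset are hypothetical, and real pipelines would typically apply this filter to their own data structures. The threshold is a parameter so that it can be modified during analysis.

```python
from collections import Counter


def drop_rare_cell_types(labels, min_count=3):
    """Return the labels with rare cell types removed.

    A cell type is dropped when it has `min_count` or fewer cells in the
    dataset (the "3 or below" rule by default).
    """
    counts = Counter(labels)
    return [label for label in labels if counts[label] > min_count]


# Hypothetical heart dataset: 3 hepatocyte calls sit at the threshold.
heart_labels = ["cardiomyocyte"] * 50 + ["fibroblast"] * 20 + ["hepatocyte"] * 3
kept = drop_rare_cell_types(heart_labels)
print(Counter(kept))
```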