andreasbueckle commented 1 month ago

Goal

Create a scatter graph in which each dot is one of the 553 datasets:

Plot as 2D graph with RUI correctness on y-axis and CTann correctness on x-axis. Each dot is one of the 553 Atlas datasets for which we do have “gold standard” data but we pretend not to have it.

RUI-based spatial correctness on the y-axis is computed via %containment, followed by a weighted cosine similarity between the original %AS and the predicted %AS, including nowhereland (the part of a registration that sticks out into empty space). If predicted = original, the result is 1.

CTann correctness on the x-axis is computed via a weighted cosine similarity between the original CTann and the predicted CTann. If predicted = original, the result is 1.

For each of the 553, we get both values (each between 0 and 1).
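A minimal sketch of the similarity measure, to make the "identical vectors give 1" property concrete. The exact weighting scheme is not specified in this issue, so the per-dimension `weights` argument is an assumption; with uniform weights this reduces to plain cosine similarity. The function name is hypothetical.

```python
import math


def weighted_cosine(a, b, weights=None):
    """Weighted cosine similarity between two equal-length vectors.

    ASSUMPTION: `weights` holds one non-negative weight per dimension.
    If omitted, all weights are 1 (ordinary cosine similarity).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(a)
    if len(weights) != len(a):
        raise ValueError("weights must match the vector length")

    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Identical percentage vectors score (numerically) 1, disjoint ones score 0.
same = weighted_cosine([0.2, 0.8], [0.2, 0.8])
disjoint = weighted_cosine([1.0, 0.0], [0.0, 1.0])
```

Because all percentages are non-negative, the result always falls between 0 and 1.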

Legend:

Color 553 dots by organ.
Size code dots by #Datasets used to make the CTann prediction (differs by AS)
Add labels to the best RUI and CTann prediction dots, worst CTann prediction dot, and worst RUI prediction dot.

Data products needed

Need a long dataset with columns:

dataset ID	CTann sim	RUI sim	organ	tool	sex
https://entity.api.hubmapconsortium.org/ancestors/d6e6c8e452ed628425e9e928306a6db0	0.78	0.98	heart	azimuth	male

Steps

To compute CTann sim:

Get true CTann via https://lod.humanatlas.io/graph/hra-pop/v0.10.2/assets/atlas-enriched-dataset-graph.jsonld
Get predicted CTann via https://apps.humanatlas.io/api/#post-/hra-pop/rui-location-cell-summary to get a cell summary for the extraction site of the dataset (see exemplary usage of the API endpoint in https://github.com/x-atlas-consortia/hra-apps/blob/main/applications/us1-spatial-to-cell/main.js)
Calculate weighted cosine sim between the predicted CTann (API response) and true CTann

To compute RUI sim:

Get true RUI from https://lod.humanatlas.io/graph/hra-pop/v0.10.2/assets/atlas-enriched-dataset-graph.jsonld
Get predicted RUI by using https://apps.humanatlas.io/api/#post-/hra-pop/cell-summary-report
Potentially: use https://github.com/hubmapconsortium/hra-tissue-block-annotation to get mesh-based AS tags for predicted RUI
Get mesh-based AS tags for true RUI from https://lod.humanatlas.io/graph/hra-pop/v0.10.2/assets/atlas-enriched-dataset-graph.jsonld
Calculate weighted cosine sim between the original %AS and the predicted %AS from the RUI location

Get tool, organ, sex from https://lod.humanatlas.io/graph/hra-pop/v0.10.2/assets/atlas-enriched-dataset-graph.jsonld.

andreasbueckle commented 1 month ago

Use https://github.com/hubmapconsortium/hra-glb-mesh-collisions to get distances between corridors?

andreasbueckle commented 1 month ago

With @bherr2, let's write a query to get:

dataset_id
organ_id
organ_label
sex
tool (could also be sc-proteomics)
CTann sim (datasetVsRuiSim)
RUI sim (%AS tags in true RUI vs %AS tags in predicted RUI [take the cell summary of the atlas dataset and return a list of RUI locations; use the one with the highest cosine sim based on CTann]) (ruiVsTopPredictedDatasetSim)
RUI sim (CTann of RUI location of input dataset vs predicted RUI location) (ruiVsTopPredictedRuiSim)
RUI sim (CTann of input dataset itself vs predicted RUI location) (datasetVsTopPredictedRuiSim)

Later:

RUI sim: Euclidean distance between the true RUI location of the input dataset and the predicted RUI location, but need to check for containment (NEEDS MORE SPECIFICATION)

How to define most similar RUI location: Given the CTann of the input dataset, which of the 282 atlas RUI locations has the highest cosine sim when comparing its CTann to the one from the input dataset?

How to define most similar dataset: Given the CTann of the input dataset, which of the 553 atlas datasets (if sex, organ, tool are the same) has the highest cosine sim when comparing its CTann to the one from the input dataset?
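A rough sketch of both "most similar" definitions, assuming each CTann is available as a mapping from cell type ID to percentage. The record layout (`id`, `ctann`, `sex`, `organ`, `tool`) and the function names are hypothetical, and skipping the input dataset itself is an assumption based on "we pretend not to have it".

```python
import math


def cosine_sim(a, b):
    """Cosine similarity between two sparse vectors (dict: cell type -> %)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, candidates, match_keys=()):
    """Return (best_candidate, similarity), or (None, None) if nothing qualifies.

    match_keys: metadata fields that must equal the query's, e.g.
    ("sex", "organ", "tool") for datasets, or () for RUI locations.
    """
    best, best_sim = None, None
    for cand in candidates:
        if cand["id"] == query["id"]:
            continue  # ASSUMPTION: never match the input against itself
        if any(cand.get(k) != query.get(k) for k in match_keys):
            continue
        sim = cosine_sim(query["ctann"], cand["ctann"])
        if best_sim is None or sim > best_sim:
            best, best_sim = cand, sim
    return best, best_sim


# Made-up records for illustration only.
query = {"id": "q", "sex": "male", "organ": "heart", "tool": "azimuth",
         "ctann": {"cell_type_A": 0.8, "cell_type_B": 0.2}}
candidates = [
    {"id": "d1", "sex": "male", "organ": "heart", "tool": "azimuth",
     "ctann": {"cell_type_A": 0.7, "cell_type_B": 0.3}},
    {"id": "d2", "sex": "female", "organ": "heart", "tool": "azimuth",
     "ctann": {"cell_type_A": 0.8, "cell_type_B": 0.2}},
]
best_dataset, sim = most_similar(query, candidates, ("sex", "organ", "tool"))
```

With the metadata filter, `d1` is chosen even though `d2` has the identical CTann, because `d2` differs in sex.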

andreasbueckle commented 1 month ago

@bherr2 Let's place this into reports/atlas/validation-v7-ctann-rui

andreasbueckle commented 1 month ago

@andreasbueckle Move all corridor GLBs into 1 scene, then export as 1 GLB. Then associate sceneNodes with the RUI location ID (make a look-up?)

To keep name: load with Blender API, then rename sceneNode with filename (=rui location id?). May need prefix so sceneNode does not start with number

andreasbueckle commented 1 month ago

@bherr2

M and F are right on top of each other, which will mess with the collisions. Although we can probably filter that out in post (i.e., once we know if the RUI is M/F, filter out irrelevant collisions)

I'm gonna load into blender and save as GLTF (json) to see how the scene is composed. May be a problem there. Just the JSON part

{
    "asset":{
        "generator":"Khronos glTF Blender I/O v3.6.27",
        "version":"2.0"
    },
    "scene":0,
    "scenes":[
        {
            "name":"Scene",
            "nodes":[
                491
            ]
        }
    ],
    "nodes":[
        {
            "mesh":0,
            "name":"00087766-0287-467c-9060-b52773db3dce.glb"
        },
        {
            "mesh":1,
            "name":"0016badc-9917-402c-b950-257d77c50b3d.glb"
        },
        {
            "mesh":2,
            "name":"007eb4d9-1694-4380-99e1-4aba832d9227.glb"
        },
        {
            "mesh":3,
            "name":"00f945be-8604-4382-834d-707a37498a9a.glb"
        },
        {
            "mesh":4,
            "name":"016e1d91-9c07-46b7-8441-2975df328fb3.glb"
        },
        {
            "mesh":5,
            "name":"026751c5-ef86-4f35-a810-5f8adc2887a5.glb"
        },
        {

Looks like you need to change the mesh name, not just the scene node name

Also you should probably strip off the .glb in the name
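One possible way to apply both suggestions outside Blender is to post-process the exported glTF JSON directly. This is only a sketch: it assumes the file was exported as `.gltf` (JSON), that each corridor node references its mesh via the standard `mesh` index, and that a `rui_` prefix is acceptable; the function name and prefix are hypothetical.

```python
import json


def rename_nodes_and_meshes(gltf, prefix="rui_"):
    """Strip '.glb' from node names, add a prefix, and copy the name to the mesh.

    Returns a look-up from the new node name to the RUI location ID.
    Nodes without a mesh (e.g. a scene root) are left untouched.
    """
    lookup = {}
    meshes = gltf.get("meshes", [])
    for node in gltf.get("nodes", []):
        name = node.get("name")
        mesh_index = node.get("mesh")
        if not name or mesh_index is None:
            continue
        rui_id = name[: -len(".glb")] if name.endswith(".glb") else name
        new_name = prefix + rui_id  # prefix keeps names from starting with a digit
        node["name"] = new_name
        if 0 <= mesh_index < len(meshes):
            meshes[mesh_index]["name"] = new_name
        lookup[new_name] = rui_id
    return lookup


# Tiny stand-in for the exported scene (only the fields used here).
gltf = {
    "nodes": [{"mesh": 0, "name": "00087766-0287-467c-9060-b52773db3dce.glb"}],
    "meshes": [{"primitives": []}],
}
lookup = rename_nodes_and_meshes(gltf)
print(json.dumps(gltf["nodes"], indent=2))
```

The returned `lookup` doubles as the sceneNode-to-RUI-location table mentioned earlier in the thread.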

andreasbueckle commented 1 month ago

@bherr2 shares new report with:

rui_location (input dataset) -> gets omitted when returning (PURLs)
dataset (input)
similar_rui_location (output with highest cosine sim) (PURLs)
similarity
tool
sex

Then Andi computes %AS tag similarity between input and predicted

bherr2 commented 4 weeks ago

X axis: https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-x-axis.csv
Y axis: https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-y-axis.csv

andreasbueckle commented 3 weeks ago

🚧 @andreasbueckle uses https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-x-axis.csv for x-axis
🚧 @andreasbueckle computes %AS tag similarity between rui_location and predicted_rui in https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-y-axis.csv
🚧 @andreasbueckle draws both as scatter graph, color by organ, facet by tool (and sex?)

andreasbueckle commented 3 weeks ago

Update 8/20/24:

It's OK if cosine sim is 1.0.
Add the "no man's land" of each RUI location, i.e., the part that is not within any AS.
Handle cases where total(as-Intersection) > 1.0

For containment, AGREED ON:

Step 1: compute % of orig loc TB per all AS and 1 nowhereland.
Step 2: compute % of predicted loc TB/corridor per all AS and 1 nowhereland.
Step 3: use % in weighted cosine for original vs. predicted vector
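Steps 1 and 2 as agreed on 8/20/24 could look roughly like this, assuming we already have the intersection volume of a tissue block (or corridor) with each AS. The input layout and function name are hypothetical, and rescaling when the AS fractions sum to more than 1.0 is only one possible way to handle that case, not a decision made in this thread.

```python
def as_percentage_vector(intersection_volumes, block_volume):
    """Turn per-AS intersection volumes into a %AS vector plus nowhereland.

    intersection_volumes: dict of AS ID -> volume of the block inside that AS.
    block_volume: total volume of the tissue block or corridor.
    """
    if block_volume <= 0:
        raise ValueError("block_volume must be positive")

    vector = {as_id: vol / block_volume
              for as_id, vol in intersection_volumes.items()}
    total = sum(vector.values())

    if total > 1.0:
        # ASSUMPTION: overlapping AS meshes can push the sum above 1.0;
        # rescale so the AS fractions sum to exactly 1.
        vector = {as_id: frac / total for as_id, frac in vector.items()}
        total = 1.0

    # Whatever is not inside any AS counts as nowhereland.
    vector["nowhereland"] = max(0.0, 1.0 - total)
    return vector


# Made-up volumes: half the block in AS1, a quarter in AS2, the rest outside.
original = as_percentage_vector({"AS1": 2.0, "AS2": 1.0}, block_volume=4.0)
```

Running the same function on the predicted location gives the second vector for Step 3.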

andreasbueckle commented 3 weeks ago

Update 8/22/24:

Cannot use nowhereland, because it implies similarity between extraction sites where there might be none, e.g., rui 1 sticks out of kidney and rui 2 sticks out of heart
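A small worked example with made-up numbers: suppose rui 1 is 50% inside a kidney AS and 50% nowhereland, while rui 2 is 50% inside a heart AS and 50% nowhereland. Over the dimensions (kidney AS, heart AS, nowhereland), plain cosine similarity gives:

```latex
\begin{aligned}
a &= (0.5,\ 0,\ 0.5), \qquad b = (0,\ 0.5,\ 0.5) \\
\cos(a, b) &= \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
            = \frac{0.5 \cdot 0.5}{\sqrt{0.5}\,\sqrt{0.5}}
            = \frac{0.25}{0.5} = 0.5
\end{aligned}
```

Dropping the nowhereland dimension leaves a = (0.5, 0) and b = (0, 0.5), so the similarity is 0, which matches the fact that the two sites share no AS.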

Instead, let's create a report to capture nowhereland: https://github.com/x-atlas-consortia/hra-pop/issues/105

andreasbueckle commented 2 weeks ago

I committed an updated notebook with 2 scattergraphs, one for each RUI sim measurement: https://github.com/cns-iu/hra-cell-type-populations-supporting-information/blob/main/validations/rui_ctann/rui_ctann_validation.ipynb

x-atlas-consortia / hra-pop

Prepare data for validation #100
