x-atlas-consortia / hra-pop

HRApop
MIT License

Prepare data for validation #100

Open andreasbueckle opened 1 month ago

andreasbueckle commented 1 month ago

Goal

Create a scatter graph where each dot is one of the 553 datasets:

[image]

Plot as 2D graph with RUI correctness on y-axis and CTann correctness on x-axis. Each dot is one of the 553 Atlas datasets for which we do have “gold standard” data but we pretend not to have it.

RUI-based spatial correctness on the y-axis is computed via % containment: a weighted cosine similarity between the original %AS and the predicted %AS, each including nowhereland (empty space in the registration, i.e., the part sticking out). If predicted = original, the result is 1.

CTann correctness on the x-axis is computed via weighted cosine similarity between the original CTann and the predicted CTann. If predicted = original, the result is 1.

For each of the 553 datasets, we get both values (each between 0 and 1).
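A minimal sketch of the similarity measure described above, assuming that "weighted cosine" means plain cosine similarity over vectors whose entries are the percentages (the weights). The cell type names and values are made up for illustration; the actual HRApop implementation may differ.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two sparse vectors given as {key: weight} dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty vectors (assumption)
    return dot / (norm_a * norm_b)

# Hypothetical cell type percentages for one dataset
original = {"cardiomyocyte": 0.6, "fibroblast": 0.3, "endothelial cell": 0.1}
predicted = {"cardiomyocyte": 0.5, "fibroblast": 0.4, "endothelial cell": 0.1}

print(cosine_sim(original, original))   # identical vectors give (numerically) 1
print(cosine_sim(original, predicted))  # somewhat below 1
```

Because all percentages are non-negative, the result always falls between 0 and 1, which matches the axis range stated above.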

Legend:

Data products needed

Need a long dataset with columns:

| dataset ID | CTann sim | RUI sim | organ | tool | sex |
| --- | --- | --- | --- | --- | --- |
| https://entity.api.hubmapconsortium.org/ancestors/d6e6c8e452ed628425e9e928306a6db0 | 0.78 | 0.98 | heart | azimuth | male |

Steps

To compute CTann sim:

To compute RUI sim:

Get tool, organ, sex from https://lod.humanatlas.io/graph/hra-pop/v0.10.2/assets/atlas-enriched-dataset-graph.jsonld.

andreasbueckle commented 1 month ago

Use https://github.com/hubmapconsortium/hra-glb-mesh-collisions to get distances between corridors?

andreasbueckle commented 1 month ago

With @bherr2, let's write a query to get:

Later:

How to define most similar RUI location: Given the CTann of the input dataset, which of the 282 atlas RUI locations has the highest cosine sim when comparing its CTann to the one from the input dataset?

How to define most similar dataset: Given the CTann of the input dataset, which of the 553 atlas datasets (if sex, organ, tool are the same) has the highest cosine sim when comparing its CTann to the one from the input dataset?
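The two definitions above amount to an argmax over cosine similarity, with an optional filter on sex, organ, and tool. A rough sketch follows; the record layout (`id`, `ctann`, `organ`, `sex`, `tool` keys), the IDs, and the values are all hypothetical and not taken from the HRApop data.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two sparse {key: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(query, candidates, match_keys=()):
    """Return (id, sim) of the candidate whose CTann is closest to the query.

    Candidates that differ from the query on any key in match_keys are skipped.
    """
    best_id, best_sim = None, -1.0
    for cand in candidates:
        if any(cand[k] != query[k] for k in match_keys):
            continue
        sim = cosine_sim(query["ctann"], cand["ctann"])
        if sim > best_sim:
            best_id, best_sim = cand["id"], sim
    return best_id, best_sim

# Hypothetical atlas entries
atlas = [
    {"id": "ds-heart-a", "organ": "heart", "sex": "male", "tool": "azimuth",
     "ctann": {"cardiomyocyte": 0.7, "fibroblast": 0.3}},
    {"id": "ds-heart-b", "organ": "heart", "sex": "male", "tool": "azimuth",
     "ctann": {"cardiomyocyte": 0.2, "fibroblast": 0.8}},
    {"id": "ds-kidney-a", "organ": "kidney", "sex": "male", "tool": "azimuth",
     "ctann": {"cardiomyocyte": 0.65, "fibroblast": 0.35}},
]
query = {"organ": "heart", "sex": "male", "tool": "azimuth",
         "ctann": {"cardiomyocyte": 0.65, "fibroblast": 0.35}}

# "Most similar dataset": restricted to same sex, organ, tool
filtered_id, _ = most_similar(query, atlas, match_keys=("sex", "organ", "tool"))
# "Most similar RUI location": no such restriction in the definition above
unfiltered_id, _ = most_similar(query, atlas)
print(filtered_id, unfiltered_id)
```

The same function covers both cases: pass the 282 RUI locations as `candidates` with no `match_keys`, or the 553 datasets with the three match keys.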

andreasbueckle commented 1 month ago

@bherr2 Let's place this into reports/atlas/validation-v7-ctann-rui

andreasbueckle commented 1 month ago

@andreasbueckle Move all corridor GLBs into 1 scene, then export as 1 GLB. Then associate sceneNodes with RUI location IDs (make a look-up?)

To keep name: load with Blender API, then rename sceneNode with filename (=rui location id?). May need prefix so sceneNode does not start with number
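The naming and look-up idea could be sketched as below, in plain Python without the Blender API. It assumes the GLB filename stem is the RUI location ID (the comment above leaves this open), and the `rui_` prefix is a hypothetical choice so that no name starts with a digit.

```python
from pathlib import Path

PREFIX = "rui_"  # hypothetical prefix; any string starting with a letter would do

def node_name_for(glb_path):
    """Scene node name derived from a corridor GLB file name (extension dropped)."""
    return PREFIX + Path(glb_path).stem

def build_lookup(glb_paths):
    """Map scene node name -> RUI location ID (assumed to be the file name stem)."""
    return {node_name_for(p): Path(p).stem for p in glb_paths}

files = [
    "00087766-0287-467c-9060-b52773db3dce.glb",
    "0016badc-9917-402c-b950-257d77c50b3d.glb",
]
lookup = build_lookup(files)
print(lookup)
```

Applying the prefix to every name, not only to those that start with a digit, keeps the mapping back to the ID uniform.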

andreasbueckle commented 1 month ago

@bherr2

[image]

M and F are right on top of each other, which will mess with the collisions. Although we can probably filter that out in post (i.e., once we know if the RUI is M/F, filter out irrelevant collisions).

I'm gonna load into Blender and save as glTF (JSON) to see how the scene is composed. May be a problem there. Just the JSON part:

{
    "asset":{
        "generator":"Khronos glTF Blender I/O v3.6.27",
        "version":"2.0"
    },
    "scene":0,
    "scenes":[
        {
            "name":"Scene",
            "nodes":[
                491
            ]
        }
    ],
    "nodes":[
        {
            "mesh":0,
            "name":"00087766-0287-467c-9060-b52773db3dce.glb"
        },
        {
            "mesh":1,
            "name":"0016badc-9917-402c-b950-257d77c50b3d.glb"
        },
        {
            "mesh":2,
            "name":"007eb4d9-1694-4380-99e1-4aba832d9227.glb"
        },
        {
            "mesh":3,
            "name":"00f945be-8604-4382-834d-707a37498a9a.glb"
        },
        {
            "mesh":4,
            "name":"016e1d91-9c07-46b7-8441-2975df328fb3.glb"
        },
        {
            "mesh":5,
            "name":"026751c5-ef86-4f35-a810-5f8adc2887a5.glb"
        },
        {

Looks like you need to change the mesh name, not just the scene node name.

[image]

Also you should probably strip off the .glb in the name.

[image]
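One way to check and fix both points directly on the exported glTF JSON is sketched here. In glTF 2.0 each node refers to its mesh by index into the top-level `meshes` array, and both nodes and meshes carry an optional `name`. The mesh names `Mesh.001` and `Mesh.002` below are invented for illustration, since the JSON excerpt above does not show the `meshes` array.

```python
def strip_glb(name):
    """Drop a trailing '.glb' from a name, if present."""
    return name[:-4] if name.endswith(".glb") else name

def node_mesh_names(gltf):
    """List (node name, mesh name) pairs for every node that references a mesh."""
    meshes = gltf.get("meshes", [])
    return [
        (node.get("name"), meshes[node["mesh"]].get("name"))
        for node in gltf.get("nodes", [])
        if "mesh" in node
    ]

def rename_meshes_to_nodes(gltf):
    """Strip '.glb' from node names and copy the cleaned name onto each node's mesh.

    If several nodes shared one mesh, the last node's name would win.
    """
    for node in gltf.get("nodes", []):
        if "mesh" in node and node.get("name"):
            clean = strip_glb(node["name"])
            node["name"] = clean
            gltf["meshes"][node["mesh"]]["name"] = clean
    return gltf

# Two nodes from the excerpt above, with hypothetical mesh names
gltf = {
    "nodes": [
        {"mesh": 0, "name": "00087766-0287-467c-9060-b52773db3dce.glb"},
        {"mesh": 1, "name": "0016badc-9917-402c-b950-257d77c50b3d.glb"},
    ],
    "meshes": [{"name": "Mesh.001"}, {"name": "Mesh.002"}],
}
before = node_mesh_names(gltf)
after = node_mesh_names(rename_meshes_to_nodes(gltf))
print(before)
print(after)
```

This only edits the JSON part of a `.gltf` file; doing the rename inside Blender before export would be the alternative.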

andreasbueckle commented 1 month ago

@bherr2 shares new report with:

bherr2 commented 4 weeks ago

X axis: https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-x-axis.csv

Y axis: https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-y-axis.csv

andreasbueckle commented 3 weeks ago

🚧 @andreasbueckle uses https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-x-axis.csv for x-axis

🚧 @andreasbueckle computes %AS tag similarity between rui_location and predicted_rui in https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-y-axis.csv

🚧 @andreasbueckle draws both as scatter graph, color by organ, facet by tool (and sex?)

andreasbueckle commented 3 weeks ago

Update 8/20/24:

For containment: AGREED ON:

Step 1: compute % of orig loc TB per all AS and 1 nowhereland.

Step 2: compute % of predicted loc TB/corridor per all AS and 1 nowhereland.

Step 3: use % in weighted cosine for original vs. predicted vector
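The three steps might look roughly like this. The sketch assumes that the intersection volume of a tissue block (or corridor) with each AS is already known, that the AS do not overlap each other, and that nowhereland is whatever share of the volume is left over. All AS names and volumes are invented.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two sparse {key: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def percent_vector(total_volume, volume_in_as):
    """Steps 1 and 2: share of the volume inside each AS, plus one nowhereland entry."""
    pct = {as_id: vol / total_volume for as_id, vol in volume_in_as.items()}
    # Remainder not inside any AS (assumes non-overlapping AS)
    pct["nowhereland"] = max(0.0, 1.0 - sum(pct.values()))
    return pct

# Step 1: original tissue block (hypothetical volumes)
original = percent_vector(1000.0, {"renal cortex": 600.0, "renal medulla": 300.0})
# Step 2: predicted tissue block or corridor (hypothetical volumes)
predicted = percent_vector(8000.0, {"renal cortex": 6000.0, "renal medulla": 1200.0})
# Step 3: cosine between the two percentage vectors
rui_sim = cosine_sim(original, predicted)
print(original, predicted, rui_sim)
```

Using shares instead of raw volumes keeps a small tissue block and a much larger corridor comparable.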

andreasbueckle commented 3 weeks ago

Update 8/22/24: Cannot use nowhereland, because it implies similarity between extraction sites where there might be none, e.g., RUI 1 sticks out of kidney and RUI 2 sticks out of heart.
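A small worked example of the problem, with made-up numbers: two extraction sites in different organs, each half outside its organ. With a shared nowhereland dimension, their cosine similarity comes out at about 0.5 even though they have no AS in common; without it, the similarity is 0.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two sparse {key: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def without_nowhereland(vec):
    """Copy of a %AS vector with the nowhereland entry removed."""
    return {k: v for k, v in vec.items() if k != "nowhereland"}

# Hypothetical %AS vectors: each site is half inside one AS, half sticking out
rui_1 = {"kidney cortex": 0.5, "nowhereland": 0.5}
rui_2 = {"heart left ventricle": 0.5, "nowhereland": 0.5}

with_nl = cosine_sim(rui_1, rui_2)
without_nl = cosine_sim(without_nowhereland(rui_1), without_nowhereland(rui_2))
print(with_nl, without_nl)
```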

Instead, let's create a report to capture nowhereland: https://github.com/x-atlas-consortia/hra-pop/issues/105

andreasbueckle commented 2 weeks ago

I committed an updated notebook with 2 scattergraphs, one for each RUI sim measurement: https://github.com/cns-iu/hra-cell-type-populations-supporting-information/blob/main/validations/rui_ctann/rui_ctann_validation.ipynb