adamjtaylor opened 6 months ago
@inodb let's have a quick think about what mapping file setup would be best, and about any backend changes needed to enable this. I am hoping this is simply a join operation between the mapping file and the master JSON.
One option would be a mapping file like this:
```json
{
  "syn1234": ["Target1", "Target2"],
  "syn53284675": ["DNA", "CD8", "CD45", "CD4", "Ki-67"]
}
```
This seems extensible enough to start with the original, as-provided target names and switch to harmonized ones in due course.
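To make the "simple join" idea concrete, here is a minimal sketch of a left join between the mapping file above and the portal's master JSON. It assumes the master JSON is a list of file records keyed by a `synapseId` field and that the joined names go into a new `channelNames` field; both field names (and `assayName` in the sample data) are hypothetical, not the portal's actual schema.

```python
import json

# Mapping file in the format proposed above: synapse ID -> list of channel names.
MAPPING_JSON = """
{
  "syn1234": ["Target1", "Target2"],
  "syn53284675": ["DNA", "CD8", "CD45", "CD4", "Ki-67"]
}
"""


def attach_channel_names(files, mapping, id_key="synapseId"):
    """Left-join channel names onto file records.

    `files` stands in for the master JSON (a list of dicts). The `synapseId`
    key and the `channelNames` output field are hypothetical names. Records
    with no mapping entry get an empty list; input records are not mutated.
    """
    joined = []
    for record in files:
        names = mapping.get(record.get(id_key), [])
        joined.append({**record, "channelNames": list(names)})
    return joined


mapping = json.loads(MAPPING_JSON)
# Illustrative records only; not real portal data.
files = [
    {"synapseId": "syn53284675", "assayName": "CyCIF"},
    {"synapseId": "syn9999", "assayName": "H&E"},
]
for rec in attach_channel_names(files, mapping):
    print(rec["synapseId"], rec["channelNames"])
```

Because it is a left join, files without channel metadata still appear, just with no names, which keeps the change additive for the rest of the portal.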
The following BigQuery query gets us a table close to what we need:
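```sql
-- One row per entity: its channel metadata file and a comma-separated list of
-- channel names, excluding channels named Red, Green, or Blue.
-- Note: STRING_AGG does not guarantee name order without an ORDER BY clause.
SELECT
  e.entityId,
  cm.Channel_Metadata_ID,
  STRING_AGG(attribute.attributeValue, ", ") AS channel_names
FROM
  `htan-dcc.ISB_CGC_r5.channel_metadata` cm
  CROSS JOIN UNNEST(cm.channel_attributes) AS attribute
INNER JOIN
  `htan-dcc.released.entities_v5_1` e
  ON cm.Channel_Metadata_ID = e.channel_metadata_synapseId
WHERE
  attribute.attributeName = 'Channel Name'
  AND attribute.attributeValue NOT IN ('Red', 'Green', 'Blue')
GROUP BY
  cm.Channel_Metadata_ID, e.entityId
```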
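A sketch of turning the query's output into the mapping-file format, assuming each result row arrives as a dict with `entityId` and `channel_names` (the `", "`-joined string from `STRING_AGG`). The sample row, including its `Channel_Metadata_ID` value, is illustrative only.

```python
import json


def rows_to_mapping(rows):
    """Convert query result rows into {entityId: [channel names]}.

    Assumes `channel_names` is the ", "-joined string produced by STRING_AGG.
    If an entityId appears in more than one row, the last row wins.
    """
    mapping = {}
    for row in rows:
        # Split on commas and drop surrounding whitespace and empty pieces.
        names = [n.strip() for n in row["channel_names"].split(",") if n.strip()]
        mapping[row["entityId"]] = names
    return mapping


# Illustrative row shaped like the query output above.
rows = [
    {
        "entityId": "syn53284675",
        "Channel_Metadata_ID": "syn0000001",
        "channel_names": "DNA, CD8, CD45, CD4, Ki-67",
    },
]
print(json.dumps(rows_to_mapping(rows), indent=2))
```

Splitting a joined string breaks if a channel name itself contains a comma; aggregating with `ARRAY_AGG` instead of `STRING_AGG` in the query would return a list directly and avoid that ambiguity.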
@inodb I'd like to move forward with discussing how to implement this on the portal side, so I can ensure the outputs are prepared correctly.
@adamjtaylor the BigQuery table looks good to me! We already have a way to pull from BigQuery directly and store the result, so I don't think you need to provide anything else.
OK. I will push a new table back to BQ that has `entityId`, `Channel_Metadata_ID`, and a new column, `harmonized_channel_names`. I'll point you to it once it's complete.
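Since the plan is to start with the as-provided names and switch to harmonized ones over time, one possible approach is to prefer a harmonized entry when it exists and fall back to the original otherwise. This sketch assumes both sources have already been reduced to `entityId -> list of names` dicts; the function name and the sample values are hypothetical.

```python
def build_display_mapping(original, harmonized):
    """Prefer harmonized channel names, falling back to the original names.

    Both arguments map entityId -> list of names. An entity whose harmonized
    entry is missing or empty keeps its original names.
    """
    display = {}
    # Union of IDs from both sources, sorted for deterministic output.
    for entity_id in sorted(original.keys() | harmonized.keys()):
        names = harmonized.get(entity_id) or original.get(entity_id, [])
        display[entity_id] = list(names)
    return display


# Illustrative values only.
original = {"syn1": ["Ki67", "cd45"], "syn2": ["DNA"]}
harmonized = {"syn1": ["Ki-67", "CD45"]}
print(build_display_mapping(original, harmonized))
```

A per-entity fallback like this lets harmonized names roll out incrementally without the portal showing gaps for files that have not been processed yet.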
Objective:
Implement a feature on the HTAN portal to display harmonized target names for multiplexed tissue imaging data. This will help researchers locate and identify datasets containing specific antibody markers.
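As an illustration of the "locate datasets with specific markers" goal, here is a minimal sketch of filtering the mapping by requested markers. Case-insensitive, whole-name matching is an assumed design choice for the example, not a description of how the portal's search works; the function name is hypothetical.

```python
def find_datasets_with_markers(mapping, markers):
    """Return sorted IDs whose channel names include every requested marker.

    Matching is case-insensitive and on whole names (so "CD8" does not match
    "CD8a"). An empty `markers` list matches every ID.
    """
    wanted = {m.casefold() for m in markers}
    return sorted(
        entity_id
        for entity_id, names in mapping.items()
        if wanted <= {n.casefold() for n in names}
    )


# Mapping in the format proposed earlier in this issue.
mapping = {
    "syn1234": ["Target1", "Target2"],
    "syn53284675": ["DNA", "CD8", "CD45", "CD4", "Ki-67"],
}
print(find_datasets_with_markers(mapping, ["cd8", "Ki-67"]))
```

Whole-name matching is where harmonization pays off: with raw names, "Ki67" and "Ki-67" would count as different markers.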
User stories:
Background:
Currently, channel metadata is not easily exposed to or searchable by users. Additionally, it was not validated at ingestion, so it is poorly structured. @adamjtaylor is exploring a promising LLM-based approach, using Llama 3, to harmonize target names. To support this work and provide an MVP solution for users, this issue focuses on creating a method to display these names effectively on the portal.
For the MVP:
Looking Ahead:
Eventually, we want to incorporate these target names directly into the dataset metadata. Starting with this simpler display feature will help us lay the groundwork for future enhancements.