hdmf-dev / hdmf

The Hierarchical Data Modeling Framework
http://hdmf.readthedocs.io

[Feature]: Support annotation of substrings with HERD or another system #1092

Open rly opened 5 months ago

rly commented 5 months ago

What would you like to see added to HDMF?

Use case 1: HED tags are strings that can contain multiple keys, separated by commas, in any order. A DynamicTable may have a column of HED tags. We want to associate these keys with persistent identifiers in the HED schema, but I'm not 100% sure that is necessary. HED already provides tools for processing the HED tags and linking them to the HED schema.

Use case 2: HDMF-ML permits the storage of a PyTorch model output as a long text field. We want to be able to annotate terms within this output with the AI Ontology. A similar hypothetical use case is if a user wants to store text from a scientific paper, device configuration file, or software output in HDMF and associate terms within these strings to external resources.

A single string may not be the ideal representation for these data, but sometimes that is what we have to work with.

In use case 1, the key can be anywhere in any string in the one-dimensional VectorData. In use case 2, we want to annotate a particular substring of a scalar text field. Because the same substring may appear multiple times with different meanings (though this is rare), it would be important to store the starting index of the substring. These probably require different solutions.
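To make the difference between the two cases concrete, here is a rough sketch in plain Python. It does not use any HDMF or HED API; `find_key_rows` and `find_occurrences` are hypothetical helper names, and the sample strings are made up for illustration. It also assumes a naive split on commas, which ignores any grouping syntax that real HED strings may use, so the HED tools would be the right choice for actual parsing.

```python
import re


def find_key_rows(column, key):
    """Use case 1 (hypothetical helper): locate a key in a column of
    comma-separated strings. Returns (row, position) pairs, where position
    is the index of the key within that row's comma-separated items."""
    hits = []
    for row, value in enumerate(column):
        # Naive split on commas; an assumption made for this sketch only.
        items = [item.strip() for item in value.split(",")]
        for pos, item in enumerate(items):
            if item == key:
                hits.append((row, pos))
    return hits


def find_occurrences(text, term):
    """Use case 2 (hypothetical helper): return the starting character
    index of every occurrence of `term` in a scalar text field."""
    return [m.start() for m in re.finditer(re.escape(term), text)]


# Use case 1: the key may be anywhere in any row, so matching by key is enough.
hed_column = [
    "Sensory-event, Visual-presentation",
    "Agent-action, Sensory-event",
]
key_hits = find_key_rows(hed_column, "Sensory-event")

# Use case 2: the same term appears twice, so an annotation would need the
# starting index to say which occurrence it refers to.
model_output = "loss decreased; validation loss increased"
offsets = find_occurrences(model_output, "loss")
```

In the first case the annotation can hang off the key itself, while in the second case the term alone is ambiguous and only the offset pins down the occurrence.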

It may also be useful to have a way to refer to substrings in general for annotation, like DynamicTableRegion for row slicing of tables and TimeIntervals for annotating time series in time.
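One possible shape for such a general substring reference is sketched below in plain Python. This is not an existing HDMF type: `SubstringRegion`, its fields, and the `example.org` URI are all assumptions made up for illustration, loosely modeled on how a DynamicTableRegion points at rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstringRegion:
    """Hypothetical reference to a substring within one row of a string column."""

    row: int         # index of the string within the column
    start: int       # starting character index (inclusive)
    stop: int        # ending character index (exclusive)
    entity_uri: str  # external resource the substring is associated with

    def resolve(self, column):
        """Return the referenced substring from the given column of strings."""
        return column[self.row][self.start:self.stop]


# Made-up example data: annotate the first "gain" but not the one in row 1.
notes = ["amplifier gain set to 10", "gain unchanged"]
region = SubstringRegion(
    row=0,
    start=10,
    stop=14,
    entity_uri="https://example.org/term/gain",  # placeholder URI
)
annotated_text = region.resolve(notes)
```

Storing `start` and `stop` rather than the substring itself keeps the reference unambiguous when the same text occurs more than once, at the cost of becoming stale if the underlying string is edited.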

I'm open to ideas. Just wanted to start a discussion.

What solution would you like?

^

Do you have any interest in helping implement the feature?

Yes.

mavaylon1 commented 3 months ago

Focusing on case 2, what does it mean to store a PyTorch model output as a long text field? If I had a model that does semantic segmentation and I predicted a segmented image, would the matrix be stored as a string?

VisLab commented 1 month ago

@rly with the release of HED version 8.3.0, HED now has persistent identifiers for each HED tag (and for auxiliary items such as unit classes). HED now also has an associated ontology (see https://bioportal.bioontology.org/ontologies/HED).

Is there any more documentation on the roadmap for HERD and the needed support?

mavaylon1 commented 1 month ago

@VisLab Hi there. As the main developer of HERD, I can say that the next planned stage is to continue building user-facing tools that make it easier to automate term validation and HERD population when writing the file.

We also have some ideas that go beyond user-facing tools, but they have not been formalized in a community-facing roadmap.

That being said, the team and I are more than happy to discuss expanding HERD. I can talk with the team next week, and then get back to you.