openMetadataInitiative / openMINDS_SANDS

A metadata model capturing Spatial Anchoring of Neuroscience Data Structures, including brain atlas definitions.

MIT License

brainSlice schemas #81

Closed blixilla closed 3 years ago

blixilla commented 4 years ago

Thanks for your input in the meeting yesterday. As we discussed, I am now proposing a starting point for two new schemas to accommodate our 2D image series; they may or may not be included in the next release of SANDS. I'll keep working on this use case and hope to bring a more thorough suggestion in due course.

sliceSeries.schema.json
1.1) @ type [string, constant: sliceSeries.schema.json, 1-1]
1.2) @ id [string, free text, 1-1]
1.3) brainSlice(s) [object, type: brainSlice, 2-n]
1.4) identifier [string, free text, 1-1]

brainSlice.schema.json
2.1) @ type [string, constant: brainSlice.schema.json, 1-1]
2.2) @ id [string, free text, 1-1]
2.3) image [object, type: image.schema.json]
2.4) definedIn [object, type: File, 0-n]
2.5) identifier [string, free text, 1-1]
2.6) derivedData [object, type: File, 0-n]
2.7) positionInAtlas [object, type: coordinatePoint.schema.json, 0-n]
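A minimal sketch of how the two proposed schemas and their cardinalities could be expressed and checked, assuming plain Python dataclasses. The class and field names below are illustrative only and are not part of SANDS; references to images and files are modelled as plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BrainSlice:
    id: str                      # 2.2) @id, exactly one
    identifier: str              # 2.5) identifier, exactly one
    image: Optional[str] = None  # 2.3) image reference; the proposal gives no cardinality
    defined_in: List[str] = field(default_factory=list)         # 2.4) File, 0-n
    derived_data: List[str] = field(default_factory=list)       # 2.6) File, 0-n
    position_in_atlas: List[str] = field(default_factory=list)  # 2.7) coordinatePoint, 0-n
    type: str = "brainSlice.schema.json"                        # 2.1) constant


@dataclass
class SliceSeries:
    id: str                        # 1.2) @id, exactly one
    identifier: str                # 1.4) identifier, exactly one
    brain_slices: List[BrainSlice]  # 1.3) brainSlice(s), 2-n
    type: str = "sliceSeries.schema.json"  # 1.1) constant

    def __post_init__(self) -> None:
        # Enforce the 2-n cardinality from item 1.3.
        if len(self.brain_slices) < 2:
            raise ValueError("a sliceSeries needs at least two brainSlices (2-n)")


# Usage: three slices grouped into one series (made-up identifiers).
slices = [BrainSlice(id=f"slice-{i}", identifier=f"s{i:03d}") for i in range(1, 4)]
series = SliceSeries(id="series-1", identifier="series-A", brain_slices=slices)
```

Constructing a `SliceSeries` with fewer than two slices raises a `ValueError`, which mirrors the 2-n constraint in item 1.3.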

We talked about histologicalSlice, but I changed it to brainSlice for now as we have cases of 2D brain section series without the use of histological techniques. Perhaps we could switch slice with section as well, as this is what we normally call this piece of brain material, at least here in Oslo?

What do you think @lzehl, @skoehnen, @UlrikeS91, @aeidi89, @GMattheisen?
As with the electrode schema discussion, we can refer directly to individual properties by using the numbers in the lists :)

Majpuc commented 4 years ago

Hi Camilla, Very good. I have a question: are metadata specific to the slice series and the individual slices going to be captured by SANDS, by openMINDS, or by both? Specifically, I am thinking about sampling frequency, section thickness, bregma level, naming convention, ... Best, Maja

xgui3783 commented 4 years ago

Just a few comments (I wasn't privy to the discussion, so it might have been discussed, and for that I apologise in advance):

brainSlice.schema.json: I see a great parallel (but not a complete fit) to image.schema.json, perhaps a super class? Indeed image is a linked property.
The positionInAtlas property has me a bit puzzled:
- if no other transformation information (e.g. coronal plane, or non-linear transformation field) is provided, it would not be very useful for placing the brain slice in the atlas
- the image object, in principle, should have a transformation activity registered to an atlas, which should provide the necessary information to place it in an atlas?

I am also unclear what information definedIn and derivedData should contain.

apdavison commented 4 years ago

Sorry if I'm repeating something that was already said in the meeting, but we have Tier 3 schemas for Slice and BrainSlicingActivity (see https://github.com/HumanBrainProject/fairgraph/blob/master/fairgraph/electrophysiology.py)

I'm happy to adjust these to fit with the needs of SANDS; my main concern is to keep the separation of Entity (Slice) and Activity (BrainSlicing) so we maintain compatibility with the PROV data model.

UlrikeS91 commented 4 years ago

It's a good start, but I'm also missing some pieces. As @Majpuc suggested, we should capture specific things to describe brain sections in SANDS, while the protocol(s) describing how the sampling has been performed should be part of openMINDS.

For a brain section (or whatever we may want to call this later), I would at least include the section thickness, the laterality (do I see a full brain slice or only half? Which half then?), an anatomical orientation could be of interest (coronal, sagittal, horizontal, custom, etc.). Of course, it should also have image, but then definedIn is not necessary given that a brain section has to be an image file (right?). And image has definedIn, which should be enough. I'm also wondering if an identifier is needed here? And if so, for what? @blixilla Could you explain what derivedData is intended for (basically the same question as @xgui3783 raised as well)? I think positionInAtlas (if we ignore that the name might need to be changed) should probably rather point to a new spatial object that describes the position of the image in space. This probably needs to be different for 2D image vs. 3D image?

For section series, an identifier would be nice given that a dataset could contain more than one series. Pointing to brainSlices is of course necessary. In relation to Maja's comment: Would it be useful to define by what pattern the series is organised (by numbers, alphabetical, etc.)? I'm wondering if it would be useful to point to the subject (in openMINDS) that the brain section series is related to? For example, I could have used one subject following a specific protocol (execution) to produce the brain sections. Then I used one half of the sections to stain for PV and the other half for Nissl. This would give me 2 different series, but both from the same brain. In general, this could be covered by the provenance and the relation between files and subjects in openMINDS, but this would also mean that for datasets with brain section series, file bundles for each of these cases need to be created. This may not always be the case, though, correct?

Additionally, we could include a field on the brain section series pointing to an openMINDS protocol (execution) that would capture what happened to the brain section series. This could cover BrainSlicingActivity as requested by @apdavison, but I'm not sure if this would be the best solution. openMINDS would cover the process of slicing brains without SANDS, while SANDS would tell you which sections from the sliced brain belong to one series. Does that make sense?

blixilla commented 4 years ago

Thank you for all your questions, suggestions and prompt feedback. I have tried to summarize from my perspective:

sliceSeries.schema.json assumptions:

- is a series of brain sections from one brain
- could point to brainSlice to indicate how brain sections relate and belong to the same series
- could have an identifier as a dataset can contain more than one series

@ type [string, constant: sliceSeries.schema.json, 1-1]
@ id [string, free text, 1-1]
brainSlice(s) [object, type: brainSlice, 2-n]
identifier [string, free text, 1-1]

brainSlice.schema.json assumptions:

- represents one physical brain section
- could point to image, which in turn would be definedIn the image file of the brain section @xgui3783 @UlrikeS91
- could have sectionThickness to aid visualization in 3D
- could have derivedData (or perhaps derivedStructures) to indicate files containing e.g. segmentations of cells from the brain section (output from the QUINT workflow) @xgui3783 @UlrikeS91
- could have sectionNumber to indicate relative distance to other brain sections (this often resides in the file name, though)
- we need more information to represent the slice in space

@ type [string, constant: brainSlice.schema.json, 1-1]
@ id [string, free text, 1-1]
image [object, type: image.schema.json]
sectionThickness [string, pattern: integer unit, 0-1]
sectionNumber [integer, 0-1]
derivedStructures [object, type: File, 0-n]
more?
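The "integer unit" pattern for sectionThickness is not spelled out above. A minimal sketch of one possible reading, assuming the value is a whole number followed by a unit string (for example "50 µm"); the regular expression and the function name are hypothetical and not part of any SANDS schema:

```python
import re
from typing import Tuple

# Assumed reading of "integer unit": digits, optional whitespace, then a unit
# made of non-digit, non-space characters.
THICKNESS_PATTERN = re.compile(r"^(\d+)\s*([^\d\s]+)$")


def parse_section_thickness(value: str) -> Tuple[int, str]:
    """Split a sectionThickness string into (integer value, unit)."""
    match = THICKNESS_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not an 'integer unit' string: {value!r}")
    return int(match.group(1)), match.group(2)


# Usage with a made-up value.
thickness, unit = parse_section_thickness("50 µm")
```

Keeping the number and the unit in one string is compact, but every consumer then has to parse it; separate value and unit fields would avoid that at the cost of a slightly larger schema.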

Then, some questions:

- Is it relevant to point sliceSeries to a subject in openMINDS? Or are they already related via the actual Files?
- Would sliceSeries pattern, brainSlice section number, laterality and anatomical orientation actually be necessary to anchor an image in space? Or more 'nice to have'?
- How does transformation activity relate to image? Where would we insert and define the output of the transformation?
- Would we want to define an image's position in space directly in the image schema? (For QuickNII output this would be 3x3 vector coordinates for the corner of an image.) @xgui3783
- For SANDS, we would need to capture some specific metadata (e.g. section thickness) to be able to spatially anchor and visualize it. How would you then think brainSlice in SANDS and Slice for Tier 3 would overlap? @apdavison @Majpuc
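To make the 3x3 anchoring question concrete, here is a minimal sketch of how nine numbers could place a 2D image in atlas space. It assumes the numbers are read as one origin corner `o` followed by two edge vectors `u` and `v` spanning the image width and height; that layout, the function name and the example values are assumptions for illustration and should be checked against the actual QuickNII output.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_atlas(anchoring: Sequence[float], x: float, y: float,
                   width: float, height: float) -> Vec3:
    """Map image pixel (x, y) to atlas space using o + (x/width)*u + (y/height)*v."""
    if len(anchoring) != 9:
        raise ValueError("anchoring must hold 9 numbers: o, u and v")
    o, u, v = anchoring[0:3], anchoring[3:6], anchoring[6:9]
    fx, fy = x / width, y / height  # fractional position within the image
    return (
        o[0] + fx * u[0] + fy * v[0],
        o[1] + fx * u[1] + fy * v[1],
        o[2] + fx * u[2] + fy * v[2],
    )


# Usage with made-up numbers: a section lying in a plane of constant y.
anchoring = [0, 100, 0, 200, 0, 0, 0, 0, -150]
centre = pixel_to_atlas(anchoring, 500, 250, width=1000, height=500)
```

With such a representation, section number, laterality and orientation are not needed to compute the position; they would remain descriptive metadata.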

xgui3783 commented 4 years ago

@blixilla re: brainSlice.schema.json most of your revision makes sense to me. Thanks for the clarification!

re: sectionThickness, I wondered if we could make adjustments to image.schema.json#voxelSize and allow the voxelSize of z to be defined as the thickness of a 2D image? (This is a point I wanted to raise with @UlrikeS91, @lzehl and @skoehnen during the meeting we had, but I wanted to investigate further.)

In my mind, the advantage is that, for a 2D image, the z voxelSize can always be set to 0 if the thickness is unknown or doesn't make sense in the context of the image, but it can be set to a non-zero value should the thickness of the image in its context be known.

There are a few issues that I can see with my proposed approach:

- thickness makes much more sense as a property of a brainSlice, not an image
- could there be a scenario where the voxel resolution of an image is unknown, but the physical thickness is known? This may create problems when trying to define a unit for the coordinateSpace for the said image

If I have to choose right now, I think your approach (brainSlice.schema.json with sectionThickness -> image.schema.json with voxelSize.z === 0) is a more sensible approach.
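A minimal sketch of the preferred split, assuming hypothetical Python classes: the image stays generic with a z voxel size of 0 for 2D images, and the physical thickness lives on the brain slice. Names and units are illustrative and not taken from the SANDS schemas.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Image:
    # (x, y, z) voxel size; by the assumed convention z == 0 marks a 2D image.
    voxel_size: Tuple[float, float, float]

    @property
    def is_2d(self) -> bool:
        return self.voxel_size[2] == 0


@dataclass
class BrainSlice:
    image: Image
    # Physical thickness of the tissue section, kept off the image on purpose.
    section_thickness_um: Optional[float] = None

    def __post_init__(self) -> None:
        if not self.image.is_2d:
            raise ValueError("a brainSlice is expected to point to a 2D image")


# Usage with made-up values.
section_image = Image(voxel_size=(0.5, 0.5, 0.0))
brain_slice = BrainSlice(image=section_image, section_thickness_um=50.0)
```

This keeps "depth of the image" and "thickness of the tissue" as two separate facts, so an unknown thickness never has to be encoded as a misleading voxel size.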

Majpuc commented 4 years ago

@xgui3783 The challenge with section thickness is that the known value is the one chosen for cutting the brain tissue, e.g. 50 µm. But since brain tissue usually shrinks when stained and/or mounted onto a microscope slide, the actual value differs a bit. Then, imaging of those sections sometimes captures the surface only, or could represent a certain depth of tissue (e.g. 10-20 micrometers) with flattening, depending on the imaging method... So the depth of a 2D image is... I am not sure how to assign a voxel size. We could decide on an internal convention... I would ask Gergely and Dmitri for advice. But section thickness is metadata of the original tissue section, isn't it?

UlrikeS91 commented 4 years ago

@xgui3783, I agree with your own conclusion. I don't think it would be good to include the slice thickness as the third dimension of the voxelSize of an image. I think there is a big difference between the depth of the content of an image and the depth of the image itself. Even though brain sections usually have a thickness (and therefore are 3D), the image I take of the section is not a volume. If we were to allow z of the voxelSize to represent the thickness of the content, this would cause confusion as to when an image is in fact a 3D image and when we are only giving additional information about the content of a 2D image.

I would like to keep image.schema.json as generic as possible, so that it applies to anything that is an image.

@blixilla, Thank you for explaining what you meant by derivedData/derivedStructures. This makes the purpose of the proposed property much clearer :) It absolutely makes sense to capture this provenance. I'm just wondering to what degree we should do this and where/how exactly. I can see a few options which could be discussed :)

blixilla commented 4 years ago

Thanks for the comments, @xgui3783, @Majpuc, @UlrikeS91 and @apdavison! I'll try to take these into consideration and come up with a new suggestion for us to discuss further. Let me know if you think of any solutions in the meantime :)

lzehl commented 4 years ago

Dear all, I want to come back to what @apdavison wrote: that there are already two schemas available in Tier 3 (extracted from https://github.com/HumanBrainProject/fairgraph/blob/master/fairgraph/electrophysiology.py):

Slice

name (expects: free text)
subject/wasDerivedFrom (expects: Subject [tier 3 schema])
providerID (expects: free text)
brain_slicing_activity (expects: electrophysiology.BrainSlicingActivity [tier 3 schema])
activity (expects: electrophysiology.PatchClampActivity OR optophysiology.TwoPhotonImaging [tier 3 schemas])

BrainSlicingActivity

subject/used (expects: Subject [tier 3 schema])
slices/generated (expects: Slice [tier 3 schema])
brain_location (expects: BrainRegion [tier 3 schema])
hemisphere (expects: controlledTerm [left, right])
slicing_plane (expects: controlledTerm [sagittal, para-sagittal, coronal, horizontal])
slicing_angle (expects: float)
cutting_solution (expects: ??)
cutting_thickness (expects: QuantitativeValueRange OR QuantitativeValue [tier 3 schema])
start_time (expects: datetime)
end_time (expects: datetime)
people/wasAssociatedWith (expects: Person [tier 3 schema])
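A minimal sketch of the Entity/Activity separation that the two Tier 3 schemas above follow, using plain Python dataclasses. This is not the fairgraph API: the classes are hypothetical stand-ins that keep only a few of the listed fields, and subjects are referenced by a plain string.

```python
from dataclasses import dataclass
from typing import List

# Controlled terms for slicing_plane as listed above.
ALLOWED_PLANES = {"sagittal", "para-sagittal", "coronal", "horizontal"}


@dataclass
class Slice:
    """Entity: the piece of tissue itself."""
    name: str
    was_derived_from: str  # subject identifier


@dataclass
class BrainSlicingActivity:
    """Activity: the act of slicing, which uses a subject and generates slices."""
    used: str                # subject identifier
    generated: List[Slice]
    slicing_plane: str
    cutting_thickness_um: float

    def __post_init__(self) -> None:
        if self.slicing_plane not in ALLOWED_PLANES:
            raise ValueError(f"unknown slicing plane: {self.slicing_plane!r}")
        for tissue_slice in self.generated:
            # Every generated slice should trace back to the subject that was used.
            if tissue_slice.was_derived_from != self.used:
                raise ValueError(f"{tissue_slice.name} does not derive from {self.used}")


# Usage with made-up identifiers.
tissue_slices = [Slice(name=f"slice-{i}", was_derived_from="subject-1") for i in range(3)]
slicing = BrainSlicingActivity(used="subject-1", generated=tissue_slices,
                               slicing_plane="coronal", cutting_thickness_um=50.0)
```

Properties of the cutting process (plane, angle, thickness) sit on the activity, while the slice only records what it was derived from, which is the separation the PROV data model relies on.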

In parallel I realized that we also have the schemas for tissueSample and tissueSampleState in openMINDS (both still need to be fully defined and discussed). Current assumption:

tissueSample

alias (expects: free text, count: 1)
organOfOrigin (expects: controlledTerm, count: 1)
pathology (expects: controlledTerm, count: 1) [e.g. healthy]
quantity (expects: integer [default: 1], count: 1)
stateStudied (expects: tissueSampleState, count: 1 - N)
type (expects: controlledTerm, count: 1 - N) [e.g. slice, sliceSeries, whole brain]
wasUsedIn (expects: datasetVersion.schema.json, count: 1 - N)

tissueSampleState

quality (expects: free text, count: 0 - 1)
weight (expects: number, count: 0 - 1)
wasUsedFor (expects: protocolExecution.schema.json, count: 1 - N)

With all this (and I'm sorry that I did not think of this before), I'm wondering whether we need to additionally define brainSlice and brainSliceSeries schemas, or whether they should instead emerge from openMINDS (tissueSample and protocol/protocolExecution) plus tier-3 specifications where they exist.

@apdavison & @blixilla what do you think?

apdavison commented 4 years ago

Dear @lzehl, thanks for bringing this up. As you know, I've started thinking about how to make the Tier 3 schemas articulate better with the openMINDS and SANDS schemas, and had the idea of openMINDS "extensions", like openMINDS/electrophysiology, for example, for Tier 3 metadata.

This case is an example of where we can't just add new schemas, but need to add additional fields to existing ones. I think the best way to handle this is through the ideas of inheritance/polymorphism from object-oriented programming. In other words, sands/brainSlice would inherit from openMINDS/tissueSample (in JSON-LD terms, it would have something like @type = ["openMINDS:tissueSample", "sands:brainSlice"]), and then ephys/brainSlice would inherit from sands/brainSlice. Perhaps the cleanest way to handle this would be to replace the parent node in the graph with the relevant child node.
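A minimal sketch of what such a multi-typed JSON-LD node could look like, written as a Python dictionary. The namespace URLs, the instance identifier and the property values are placeholders, and the helper function is hypothetical; only the `@type` list comes from the suggestion above.

```python
import json
from typing import Any, Dict

# Placeholder context: the real openMINDS and SANDS namespaces are not assumed here.
node: Dict[str, Any] = {
    "@context": {
        "openMINDS": "https://example.org/openMINDS/",
        "sands": "https://example.org/sands/",
    },
    "@id": "https://example.org/instances/brainSlice-001",
    # The node is both a generic tissue sample and a SANDS brain slice.
    "@type": ["openMINDS:tissueSample", "sands:brainSlice"],
    "openMINDS:alias": "section 12",
    "sands:sectionThickness": "50 µm",
}


def has_type(candidate: Dict[str, Any], type_name: str) -> bool:
    """Return True if the node declares type_name; @type may be a string or a list."""
    types = candidate.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return type_name in types


# Usage: a consumer that only knows openMINDS can still treat the node as a tissueSample.
serialized = json.dumps(node, ensure_ascii=False, indent=2)
```

Because every parent type is listed explicitly, a tool that does not know about SANDS can still select the node by its openMINDS type, which is the practical effect of inheritance here.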

I think this question needs an issue of its own, and input from Oli Schmid and Tom Gillespie will be needed.

lzehl commented 4 years ago

@apdavison this is a very good idea and a good point. I've started talking with @olinux about a similar approach for the research product schemas. We should continue this discussion, but for now push for the release of the next versions of SANDS and openMINDS regardless. We should only intervene if the current model or the schemas themselves would cause major conflicts with this approach.

@blixilla for the brainSlices: I'd like to push this to after the release for now. But we should continue the discussion, in particular with @apdavison, @olinux and @tbugs.

lzehl commented 3 years ago

I'll close this issue for now. We can reopen it later if necessary.