Closed connorcoley closed 4 years ago
Merging #478 into main will not change coverage. The diff coverage is
n/a
.
@@ Coverage Diff @@
## main #478 +/- ##
=======================================
Coverage 75.94% 75.94%
=======================================
Files 22 22
Lines 2029 2029
Branches 459 459
=======================================
Hits 1541 1541
Misses 347 347
Partials 141 141
I did think about that. My inclination is that it would be worse to have a Data map for Compounds and a Data map for ReactionProducts. I'm not strictly opposed to having it in both places. If we do that, then we're basically using Compound product
solely for its repeated CompoundIdentifier identifiers
and amount
fields?
Re: compounds, a separate idea is that we could potentially remove Features from Compounds now since those could be considered a subset of Data.
Is product-level data not already in multiple places? The ReactionProduct
has Percentage yield
, its Compound
has amount
and identifiers
, and the outcomes.analyses
has processed_data
.
Think I added this on #466, but I think it would be nice to have Data
as part of ReactionProduct
. Not sure what that should do to the other messages.
This raises some interesting points. What's your reaction to the following?
Compound
and ReactionProduct
features
from Compound
Compound
from ReactionProduct
and replace it with repeated CompoundIdentifier
and amount
Essentially this makes Compound
and ReactionProduct
counterparts for inputs and outputs, respectively.
Remove Compound from ReactionProduct and replace it with repeated CompoundIdentifier and amount
Removing the Compound from ReactionProduct will break backwards compatibility and require us to revise all of the current examples and datasets. It will also make require some substantial changes to the editor in terms of how the Compound fieldset is reused in multiple places. I'm hesitant to make that change.
Removing the Compound from ReactionProduct will break backwards compatibility and require us to revise all of the current examples and datasets. It will also make require some substantial changes to the editor in terms of how the Compound fieldset is reused in multiple places. I'm hesitant to make that change.
Understood. It would be a pain in the short-term. But now is the time to make this change if we think it would dramatically improve usability of the schema. If the long-term benefit seems modest, we can skip it.
As I think more about this specific question, I'm leaning toward not trying to rework the usage of Compound
. I'm still in favor of adding a Data
map to ReactionProduct
so that product-level info is grouped at the same level.
Still, let's keep in mind that alpha testing is likely our last chance to make larger-scale, disruptive changes to the schema if we find compelling reasons to do so based on user feedback.
As I think more about this specific question, I'm leaning toward not trying to rework the usage of
Compound
. I'm still in favor of adding aData
map toReactionProduct
so that product-level info is grouped at the same level.Still, let's keep in mind that alpha testing is likely our last chance to make larger-scale, disruptive changes to the schema if we find compelling reasons to do so based on user feedback.
I'll add that field to this PR.
And yes, point taken. @michaelmaser any opinion about the idea of replacing Compound compound
in the ReactionProduct message with separate constituent fields identifiers
and amount
?
And yes, point taken. @michaelmaser any opinion about the idea of replacing
Compound compound
in the ReactionProduct message with separate constituent fieldsidentifiers
andamount
?
Yes, I like this. I agree would avoid re-working all of Compound
usage, which works well for the rest of reaction components.
I have been thinking about and am still wondering if/what sort of specific data fields should be included in ReactionProduct
based on how they will be accessed downstream. Right now there is Percentage compound_yield
, Percentage purity
, and Selectivity selectivity
. I like the addition of identifiers
and amount
.
If arbitrary Data
maps are added, they would be directly linked to the product species, but the data would still have to be called by assigned name, right? I'm wondering if there is value in defining any structured/specific data fields within the product message to hopefully enforce consistency between submissions, like a peak_area_count
message with enumerated methods, for example. Thinking mostly of HTE examples that commonly seem to report peak areas only...so if these are only recorded in arbitrary analyses
, then there's no quantitative data linked to the product species. Thoughts? Could be a separate issue.
Aside; looking at the ReactionProduct
message, I am realizing Selectivity selectivity
might possibly need to be a repeated field...since a product could, e.g., be formed both enantio- and diastereoslectively. Thoughts?
@michaelmaser Just to clarify, I think that if we avoid reworking Compound
we will keep identifiers
and amount
in Compound
rather than adding them to ReactionProduct
. Does that make sense?
I like the idea of adding some additional specific fields to ReactionProduct
to encourage consistency across submissions.
Resolves #466
In addition to allowing arbitrary data to be defined for products (as suggested in issue #466), I thought it would be useful to also allow arbitrary data for compounds. I could imagine attaching the results of some QC to a starting material. For me, the most natural place to add a
map<string, Data>
field is to the Compound message so it can do double duty for starting materials and product compounds.