Closed kevinschaper closed 2 years ago
The GO annotations are turning out to be more challenging.
functional annotation uses the gene ontology class
mixin to set the subject, which is nicely nested as a class when I'm using https://github.com/linkml/linkml/pull/714
functional association:
...
object:
range: gene ontology class
However, for the subject, the range is defined as molecular activity
macromolecular machine to molecular activity association:
slot_usage:
object:
range: molecular activity
which is a class, and I'm still getting a string in my generated pydantic classes
@dataclass(config=PydanticConfig)
class MacromolecularMachineToMolecularActivityAssociation(MacromolecularMachineToEntityAssociationMixin, FunctionalAssociation):
subject: MacromolecularMachineMixin = Field(None)
predicate: str = Field(None)
object: str = Field(None)
That's controlled by this bit of logic in pydanticgen, which I'm not totally sure about.
if s.range in sv.all_classes():
range_cls = sv.get_class(s.range)
#pyrange = f'"{camelcase(s.range)}"'
if range_cls.class_uri == "linkml:Any":
pyrange = 'Any'
elif s.inlined or sv.get_identifier_slot(range_cls.name) is None:
pyrange = f'{camelcase(s.range)}'
if sv.get_identifier_slot(range_cls.name) is not None and not s.inlined_as_list:
#collection_type = sv.get_identifier_slot(range_cls.name).range
collection_type = 'str'
else:
pyrange = 'str'
In this case, the molecular activity range is not inlined, and does have an identifier slot, the range is a string, rather than the class name. It seems like s.inlined
here is reversed (inlined means use a string, I think?), but also, this seems to be shooting for always using a string when the range is a class with an ID, and I think we don't want that?
@kevinschaper Please review where linkml and biolink PRs are for this.
It turns out that with the linkml pedanticgen we're going to be just fine with using IDs for subject & object and we won't need to inline them - so I'm going to close this.
The changes for this ticket will go into the biolink model repo, but I want to write the details down here.
At the root level of Association, subject & object are defined as being just strings, but in some of the more specific associations either the subject or object is defined as pointing to a class or mixin. For example, right now GeneToDiseaseAssociation has the subject defined as a GeneOrGeneProduct, but the object is still defined as a string.
When we move to linkml pydanticgen, we're not going to be able to use strings as the subject or object when the slot is defined with a class, which means that right now creating GeneToDiseaseAssociation will look like:
Ideally, as we move to this format, we should update the biolink model to use classes on both sides of all of our associations, giving us:
This is the right approach for the model & the code generation, but it's also good for us, because it's an opportunity to validate the subject & object as more than a string.
These are actually defined properly using a mixing to set the subject, but pydanticgen isn't setting the slot correctly