Closed cmgrote closed 3 years ago
@cmgrote - I thought we had raised an issue about that but I cannot find it.
This would be a nice enhancement. My original thinking was to subclass the InstanceProperties to create a more powerful MatchProperties object. However, I do take your point about how a repository connector would manage to tell the difference and behave sensibly if it does not support the sub-class.
A key consideration for the design of the enhanced API is whether, instead of matchCriteria applying to all property-conditions, there should be separate conditions and combinators allowing more complex conditions to be constructed (e.g. (A or B) and !(C and D). Need to be careful though not to make either the class interface (or grammar) too complex for its own good.
Good point — wondering if we do this in stages and do not initially handle nested / sub-expressions (leave that to if / when we eventually develop a search-specific “language” or for a search index)?
Based on our Tuesday call, I'm moving ahead with a structure like the following:
{
"class": "EntityFindRequest",
"typeGUID": "0db3e6ec-f5ef-4d75-ae38-b7ee6fd6ec0a",
"pageSize": 10,
"matchProperties": {
"conditions": [
{
"property": "displayName",
"operator": "like",
"value": {
"instancePropertyCategory": "PRIMITIVE",
"primitiveDefCategory": "OM_PRIMITIVE_TYPE_STRING",
"primitiveValue": "(?i).*\\Qaddress\\E.*"
}
},
{
"property": "updatedTime",
"operator": ">",
"value": {
"instancePropertyCategory": "PRIMITIVE",
"primitiveDefCategory": "OM_PRIMITIVE_TYPE_DATE",
"primitiveValue": "123456789"
}
},
{
"nestedConditions": {
"conditions": [
{
"property": "...",
"operator": "...",
"value": {
"instancePropertyCategory": "...",
"primitiveDefCategory" : "...",
"primitiveValue" : "..."
}
}
],
"matchCriteria": "ANY"
}
}
],
"matchCriteria": "ALL"
},
"limitResultsByStatus": [ "ACTIVE" ],
"matchClassifications": {
"conditions": [
{
"name": "Confidentiality",
"matchProperties": {
"conditions": [
{
"property": "...",
"operator": "...",
"value": {
"instancePropertyCategory": "...",
"primitiveDefCategory" : "...",
"primitiveValue" : "..."
}
}
],
"matchCriteria": "ALL"
}
}
],
"matchCriteria": "ALL"
}
}
Notes:
nestedConditions
).ALL
on statuses, as a given instance will have only a single status at a time. Therefore I reverted this back to just a list of the statuses that should be used to limit the results.I will implement the following validations of such requests:
nestedConditions
or the property
, value
, operator
combination should be mutually-exclusive to avoid any confusion over precedence (checked recursively given the arbitrary nesting possible)IN_LIST
operator will require an ArrayPropertyValue
value<
, <=
, >=
, and >
operators will require a numeric or a date (primitive) valueLIKE
operator will require a string (primitive) valueInvalidParameterException
on the findEntities
/ findRelationships
method as part of its validationAnother idea:
Am starting to look into implementing these operations in the IGC connector, so would like to push this forward to the 1.7 release as tech preview to be able to start building against.
We've identified this will be important to achieve optimal performance in the Open Lineage Services scanning (and initial load)...
@grahamwallis I believe the base code and implementation for the in-memory repository are now done (as well as an initial implementation on the entity find method in the IGC connector). I think we still need to implement this for the graph repository, though, so adding you as an assignee 🙏
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.
PR #4290 adds findEntities and findRelationships methods to the Graph Repository.
The CTS does not yet cover these find methods, but the above PR has been manually tested with both linear and nested SearchProperties. I also ran a full clean CTS to verify that the addition of the new methods has not regressed any existing function.
There is one operator that cannot yet be supported in the graph repository, namely the 'IN' operator. This is due to a bug in JanusGraph in which the type checking of the value supplied to the 'within' predicate incorrectly checks against the type of the property rather than accepting an array of elements of that type. This happens when the has..within step is nested with a logical step such as OR, AND, etc. This is fatal for SearchProperties in which the hierarchically nested structure of the SearchProperties requires that the graph traversal is constructed from the leaves upward, with each each set of branches being combined using the MatchCriteria specified for that 'level' in the SearchProperties. The result is that it is always the case that a has..within set will be nested. The JG bug is covered by (JG) issue 2188.
PR #2857 includes all the code needed to support the 'IN' operator, including the array property handling and the use of the has..within step has been verified by temporarily coercing the traversal to avoid the issue above. This coercion has since been reverted and the 'IN' operator support has been temporarily replaced by code that throw a FunctionNotSupportedException. When JanusGraph 0.6.0 is available ( no date yet) the appropriate lines of code in GraphOMRSMetadataStore should be uncommented and the throw clauses removed. That should be all that is needed to restore support for the 'IN' operator.
I believe the incorporation of findEntities and findRelationships is now complete based on the above mentioned PRs.
In particular, to allow searching using operators (
=
,>
,<
,!=
,between
,in
, etc).(Thought there was already an issue for this, but couldn't find it @grahamwallis -- do you know?)
From a verbal discussion, leaning towards implementing this as a new set of find methods rather than updating the existing ones (eg. by sub-classing the existing InstanceProperties), as the latter approach would probably break existing repositories that have no way of handling such an extended InstanceProperties.