Closed DavidEnnis-CleverLlamas closed 2 years ago
Thanks @17llamas - approach 1 above seems like a simple and logical default thing to do. We'll get this into the next release.
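@17llamas Let me know how this sounds for exposing collections, permissions, and document quality:
- Collections will be added as a "collections" attribute with all collections joined in a comma-delimited string
- For each unique role in the set of permissions, a "permission:(role-name)" attribute will be added with the list of capabilities for that role joined in a comma-delimited string - e.g. "permission:my-role" = "read,update"
- The document quality will be added to a "quality" attribute
- The entire metadata fragment will be added as a "document-metadata" attribute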
We are considering adding an "ml-" prefix to each of these, though we initially won't touch the "meta:" and "property:" prefixes. That would help ensure uniqueness for these FlowFile attributes so that they don't collide with existing attributes.
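As a rough illustration of the proposal, the mapping from document metadata to FlowFile attributes might look like the Python sketch below. It is not the connector's code (the processors are written in Java): `metadata_to_attributes`, its parameters, and the sample values are hypothetical. It covers the collections, per-role permission, and quality attributes, applies the "ml-" prefix only to those new attributes, and leaves out the "document-metadata" fragment.

```python
def metadata_to_attributes(collections, permissions, quality, prefix="ml-"):
    """Build FlowFile-style attributes from document metadata.

    collections: list of collection names
    permissions: dict of role name -> list of capabilities
    quality: integer document quality
    prefix: optional prefix (hypothetical default "ml-") to avoid collisions
    """
    attributes = {}
    # All collections joined into one comma-delimited string.
    attributes[prefix + "collections"] = ",".join(collections)
    # One attribute per role, holding that role's capabilities.
    for role, capabilities in permissions.items():
        attributes[f"{prefix}permission:{role}"] = ",".join(capabilities)
    # Attribute values are strings, so the quality is stringified.
    attributes[prefix + "quality"] = str(quality)
    return attributes


# Hypothetical sample input, loosely based on the examples in this thread.
example = metadata_to_attributes(
    collections=["QueryMarkLogicTest", "test1"],
    permissions={"my-role": ["read", "update"]},
    quality=12,
)
```

Passing `prefix=""` gives the unprefixed attribute names, which makes it easy to compare the two naming options side by side.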
Some logging (via the LogAttribute processor) showing all the metadata for some test documents:
-------------------QUERY RESULT-------------------
FlowFile Attribute Map Content
Key: 'filename'
Value: '/PutMarkLogicTest/20.xml'
Key: 'marklogic-collections'
Value: 'QueryMarkLogicTest-2,QueryMarkLogicTest,test1'
Key: 'marklogic-permissions'
Value: 'rest-writer,update,rest-reader,read,rest-reader,execute'
Key: 'marklogic-quality'
Value: '12'
Key: 'meta:meta1'
Value: 'hello1'
Key: 'meta:meta2'
Value: 'hello2'
Key: 'meta:my-uri'
Value: '/PutMarkLogicTest/20.xml'
Key: 'path'
Value: './'
Key: 'property:{org:example}hello'
Value: 'world'
Key: 'uuid'
Value: '35eb577d-f996-4773-a16a-9c25c67666ac'
-------------------QUERY RESULT-------------------
<?xml version="1.0" encoding="UTF-8"?>
<root><sample>xmlcontent</sample><dateTime xmlns="namespace-test">2000-01-01T00:00:00.000000</dateTime></root>
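Going by the log output above, the 'marklogic-permissions' value appears to be a flat comma-delimited list of alternating role and capability entries. A downstream consumer could regroup it by role along these lines. This is only a sketch that assumes that alternating layout, and `parse_permissions` is a hypothetical helper, not part of the connector.

```python
def parse_permissions(value):
    """Group a flat 'role,capability,role,capability,...' string by role."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating role,capability pairs")
    by_role = {}
    # Walk the list two entries at a time: (role, capability).
    for role, capability in zip(parts[0::2], parts[1::2]):
        by_role.setdefault(role, []).append(capability)
    return by_role


permissions = parse_permissions(
    "rest-writer,update,rest-reader,read,rest-reader,execute"
)
# permissions == {"rest-writer": ["update"], "rest-reader": ["read", "execute"]}
```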
Hi Rob.
Sorry for the late reply. This looks great.
Also, one separate question: is there a reason for the design choice not to pass along the attributes that came in on the incoming FlowFile? It appears to stem from creating a new FlowFile in the session rather than cloning the incoming one.
Regards, David Ennis
On Tue, 23 Aug 2022, 21:34 Rob Rudin, @.***> wrote:
@17llamas https://github.com/17llamas Let me know how this sounds for exposing collections, permissions, and document quality:
- Collections will be added as a "collections" attribute with all collections joined in a comma-delimited string
- For each unique role in the set of permissions, a "permission:(role-name)" attribute will be added with the list of capabilities for that role joined in a comma-delimited string - e.g. "permission:my-role" = "read,update"
- The document quality will be added to a "quality" attribute
- The entire metadata fragment will be added as a "document-metadata" attribute
We are considering adding an "ml-" prefix to each of these, though we initially won't touch the "meta:" and "property:" prefixes. That would help ensure uniqueness for these FlowFile attributes so that they don't collide with existing attributes.
@17llamas That's a good question - there are some areas between processors where behavior differs when it seems like it should be the same. For example, I would think that any processor that retrieves one to many items from ML would follow the same original/results pattern, where each FlowFile sent to "results" is a clone of the original FlowFile sent to "original".
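To make the original/results pattern concrete, here is a small sketch that models FlowFiles as plain attribute dictionaries. It is an illustration in Python only, not NiFi API code, and `route_results`, the attribute names, and the sample values are hypothetical. Each result starts as a copy of the incoming attributes, so upstream attributes carry through, and per-document attributes are layered on top.

```python
def route_results(original_attributes, documents):
    """Return (original, results); every result inherits the original's attributes.

    documents: list of dicts holding per-document attributes (e.g. filename).
    """
    results = []
    for doc_attributes in documents:
        # Clone the original first so upstream attributes are preserved...
        result = dict(original_attributes)
        # ...then add or override attributes specific to this document.
        result.update(doc_attributes)
        results.append(result)
    return original_attributes, results


# Hypothetical incoming FlowFile attributes and two retrieved documents.
original, results = route_results(
    {"batch-id": "42", "filename": "trigger.txt"},
    [{"filename": "/PutMarkLogicTest/20.xml"}, {"filename": "/PutMarkLogicTest/21.xml"}],
)
```

In this model the "original" relationship receives the untouched incoming attributes, while each entry in "results" keeps `batch-id` from upstream and gets its own `filename`.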
I am going to look into this further for 1.16.3.2 to firm up consistency between the processors. Going to get 1.16.3.1 out on Monday to address an SSL bug in RunFlowMarkLogic and then will get a plan together for 1.16.3.2.
Hi Rob
A few notes:
Good that you will look at standardizing the Controllers a bit. I have gone through each line by line, and it looks like they were created at different times by different people, and in some cases for specific use cases. This is clear when you look at the rows endpoint, where very few of the API's options are available to configure (so in my case, I use the eval endpoint and run the Optic query from there).
Regarding not passing upstream FlowFile attributes, as is the case with QueryML, I have opened a separate item for that since it has its own defined problem statement.
Will be addressing the properties issue in the next release.
In the use of QueryMarkLogic, you can set the option to return metadata with or without the content.
Under the hood, MarkLogic returns the entire payload (metadata-values, collections, permissions, quality, properties)
However, the implementation seems to toss out some of the metadata and only sets
I need additional information from the rapi:metadata payload. For the first use case, I need collections. It would be a shame to have to make a second call for information already provided.
I was thinking of one of the following:
Willing to work on this if there is value in it going back into the main project. -David Ennis