marklogic / nifi

Mirror of Apache NiFi to support ongoing MarkLogic integration efforts
https://marklogic.github.io/nifi/
Apache License 2.0

Document usage of optionsJson in PutMarkLogic #194

rjrudin closed this issue 1 year ago

rjrudin commented 1 year ago

optionsJson is, at this point, a deprecated technique for capturing the URIs written in a PutMarkLogic (PutML) batch so that they can be used with the Data Hub Framework (DHF). However, its JSON structure is intended for an internal DHF endpoint, whose use we don't want to encourage. The attribute is also undocumented, although users can see it when they inspect the FlowFile's attributes with, e.g., LogAttribute.

I think we need two enhancements here:

  1. Add a WritesAttribute annotation to PutML that documents this attribute and its purpose.
  2. Consider an enhancement that constructs the options JSON a user would actually want with DHF: one that sets sourceQuery to, e.g., a cts.documentQuery constraining the flow to the written URIs. This should likely be opt-in, since it has no value unless the user wants to run a flow.
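Enhancement 2 boils down to turning the list of written URIs into a DHF options object. A minimal sketch in plain JavaScript, assuming the options only need a sourceQuery property; buildDhfOptions and the sample URIs are hypothetical and not part of PutMarkLogic:

```javascript
// Hypothetical helper: builds DHF options whose sourceQuery limits a flow to
// the given URIs. Assumes DHF only needs a sourceQuery string in its options.
function buildDhfOptions(uris) {
  // JSON.stringify quotes and escapes each URI, producing a valid
  // JavaScript array literal for the cts.documentQuery call.
  const sourceQuery = "cts.documentQuery(" + JSON.stringify(uris) + ")";
  // Serializing the whole object escapes the double quotes inside sourceQuery.
  return JSON.stringify({ sourceQuery: sourceQuery });
}

const options = buildDhfOptions(["/customer/1.json", "/customer/2.json"]);
console.log(options);
// {"sourceQuery":"cts.documentQuery([\"/customer/1.json\",\"/customer/2.json\"])"}
```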
rjrudin commented 1 year ago

Until then, a user can adapt the optionsJson value into a proper set of DHF options that overrides the sourceQuery by inserting an ExecuteScript processor between PutML and RunFlowML with the following ECMAScript body:

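var flowFile = session.get();
if (flowFile != null) {
  // The "uris" property of optionsJson holds the URIs written by PutMarkLogic
  var uris = JSON.parse(flowFile.getAttribute("optionsJson")).uris;
  // JSON.stringify quotes and escapes each URI, e.g. cts.documentQuery(["/a.json","/b.json"])
  var sourceQuery = "cts.documentQuery(" + JSON.stringify(uris) + ")";
  var dhfOptions = JSON.stringify({ sourceQuery: sourceQuery });
  // putAttribute returns the updated FlowFile; transfer that reference, not the stale one
  flowFile = session.putAttribute(flowFile, "dhfOptions", dhfOptions);
  session.transfer(flowFile, REL_SUCCESS);
}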
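Building sourceQuery by wrapping each URI in single quotes produces a malformed query as soon as a URI contains an apostrophe (and building the outer JSON by hand breaks on a double quote). The sketch below, assuming DHF evaluates sourceQuery as a JavaScript expression, compares that approach with one that lets JSON.stringify do the quoting; the helper names and sample URIs are hypothetical:

```javascript
// Hand-quoted version: wraps each URI in single quotes without escaping.
function naiveSourceQuery(uris) {
  return "cts.documentQuery(['" + uris.join("','") + "'])";
}

// JSON.stringify version: each URI becomes a properly escaped string literal.
function escapedSourceQuery(uris) {
  return "cts.documentQuery(" + JSON.stringify(uris) + ")";
}

// Returns true if the string parses as a JavaScript expression. new Function
// only compiles the body here, so cts does not need to be defined.
function isValidExpression(source) {
  try {
    new Function("return " + source + ";");
    return true;
  } catch (e) {
    return false;
  }
}

const uris = ["/docs/plain.json", "/docs/o'brien.json"];
console.log(isValidExpression(naiveSourceQuery(uris)));   // false
console.log(isValidExpression(escapedSourceQuery(uris))); // true
```

The naive form ends the string literal at the apostrophe in "o'brien", leaving a stray identifier that fails to parse.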
rjrudin commented 1 year ago

Below is a script equivalent to the one above, but reading the URIs attribute instead of optionsJson:

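var flowFile = session.get();
if (flowFile != null) {
  // The URIs attribute is a comma-delimited list of the URIs written by PutMarkLogic
  var uris = String(flowFile.getAttribute("URIs")).split(",");
  var sourceQuery = "cts.documentQuery(" + JSON.stringify(uris) + ")";
  var dhfOptions = JSON.stringify({ sourceQuery: sourceQuery });
  // putAttribute returns the updated FlowFile; transfer that reference, not the stale one
  flowFile = session.putAttribute(flowFile, "dhfOptions", dhfOptions);
  session.transfer(flowFile, REL_SUCCESS);
}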
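Neither script runs outside NiFi as-is, since session and REL_SUCCESS are supplied by ExecuteScript. One way to exercise the logic is to run a copy of the URIs-based body against small mock objects. The mocks below are hypothetical stand-ins, not NiFi's API; they mimic only what the script calls, including the fact that putAttribute returns a new FlowFile reference, and they reject a transfer of the outdated reference, as NiFi does:

```javascript
// Hypothetical mock FlowFile: an id plus a map of attributes.
function makeFlowFile(id, attributes) {
  return {
    id: id,
    attributes: attributes,
    getAttribute: function (name) {
      return Object.prototype.hasOwnProperty.call(this.attributes, name) ? this.attributes[name] : null;
    }
  };
}

// Hypothetical mock session that tracks the latest version of each FlowFile.
function makeSession(incoming) {
  const queue = [incoming];
  const latest = {};
  latest[incoming.id] = incoming;
  const transferred = [];
  return {
    transferred: transferred,
    get: function () {
      return queue.length > 0 ? queue.shift() : null;
    },
    // Returns a new FlowFile object and marks it as the latest version.
    putAttribute: function (flowFile, name, value) {
      const attributes = Object.assign({}, flowFile.attributes);
      attributes[name] = value;
      const updated = makeFlowFile(flowFile.id, attributes);
      latest[flowFile.id] = updated;
      return updated;
    },
    // Rejects a reference that is no longer the latest version.
    transfer: function (flowFile, relationship) {
      if (latest[flowFile.id] !== flowFile) {
        throw new Error("FlowFile " + flowFile.id + " is not the most recent version");
      }
      transferred.push({ flowFile: flowFile, relationship: relationship });
    }
  };
}

const REL_SUCCESS = "success";

// Copy of the URIs-based script body, wrapped in a function for testing.
function runScript(session) {
  var flowFile = session.get();
  if (flowFile != null) {
    var uris = String(flowFile.getAttribute("URIs")).split(",");
    var sourceQuery = "cts.documentQuery(" + JSON.stringify(uris) + ")";
    var dhfOptions = JSON.stringify({ sourceQuery: sourceQuery });
    flowFile = session.putAttribute(flowFile, "dhfOptions", dhfOptions);
    session.transfer(flowFile, REL_SUCCESS);
  }
}

const session = makeSession(makeFlowFile(1, { URIs: "/a.json,/b.json" }));
runScript(session);
const out = session.transferred[0].flowFile.getAttribute("dhfOptions");
console.log(out);
// {"sourceQuery":"cts.documentQuery([\"/a.json\",\"/b.json\"])"}

// Transferring the pre-putAttribute reference fails.
const stale = makeSession(makeFlowFile(2, { URIs: "/c.json" }));
const original = stale.get();
stale.putAttribute(original, "dhfOptions", "{}");
try {
  stale.transfer(original, REL_SUCCESS);
} catch (e) {
  console.log(e.message); // FlowFile 2 is not the most recent version
}
```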

The 1.16.3.3 release will note that optionsJson is deprecated and will refer users to this ticket for the recommended approach above.

rjrudin commented 1 year ago

Resolved via #199