Closed: grtjn closed this 7 months ago.
As a workaround, we base64-encode the flowfile contents before passing them to the ExecuteScriptMarkLogic processor, and decode them inside the XQuery code that the processor executes.
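A minimal Python sketch of why this workaround helps. The thread's actual NiFi flow and XQuery code are not shown, and the helper names below are hypothetical. Base64 turns the UTF-8 bytes into pure-ASCII text, which a single-byte charset conversion along the way cannot alter, and the other end decodes it back to the original bytes and reads them as UTF-8. On the MarkLogic side, the equivalent decoding step would presumably use `xdmp:base64-decode`.

```python
import base64


def encode_for_transport(content: bytes) -> str:
    """Hypothetical helper: base64-encode raw flowfile bytes into a pure-ASCII string."""
    return base64.b64encode(content).decode("ascii")


def decode_after_transport(encoded: str) -> str:
    """Hypothetical helper: base64-decode back to the original bytes, then read them as UTF-8."""
    return base64.b64decode(encoded).decode("utf-8")


original = "Łódź, Straße, Hélène"
payload = encode_for_transport(original.encode("utf-8"))

# The payload is plain ASCII, so a Latin-1 round trip (a common mis-decoding) leaves it unchanged.
survived = payload.encode("latin-1").decode("latin-1")

print(decode_after_transport(survived))  # Łódź, Straße, Hélène
```

The cost is roughly a one-third size increase and an extra decode step in XQuery, in exchange for not depending on how the processor converts content to a string.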
Can you check a couple of things:

- Does the LogAttribute processor log the contents of the flowfile correctly, i.e. do the correct diacritics appear?

We had a similar report recently, and it turned out that the content was already mangled before the MarkLogic processor received it.
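One way to tell these cases apart, sketched here as a heuristic in Python (not something proposed in this thread), is to inspect the raw bytes, for example as dumped to a file before they reach the MarkLogic processor. The function name and labels are hypothetical. Bytes that are not valid UTF-8 suggest a single-byte charset upstream, while valid UTF-8 that "repairs" into different text when read back through Latin-1 suggests it was already double-encoded (e.g. `Ã©` instead of `é`).

```python
def diagnose(content: bytes) -> str:
    """Hypothetical heuristic classifying flowfile bytes as 'ok', 'invalid-utf8' or 'double-encoded'."""
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError:
        # e.g. Latin-1 bytes such as b"\xe9" for "é"
        return "invalid-utf8"
    try:
        # If the text only uses Latin-1 code points, reinterpret them as UTF-8 bytes.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside Latin-1, or no hidden UTF-8 sequences: looks genuine.
        return "ok"
    # Repair changing the text means UTF-8 was decoded as Latin-1 and re-encoded.
    return "double-encoded" if repaired != text else "ok"


print(diagnose("Hélène".encode("utf-8")))                                    # ok
print(diagnose("Hélène".encode("latin-1")))                                  # invalid-utf8
print(diagnose("Hélène".encode("utf-8").decode("latin-1").encode("utf-8")))  # double-encoded
```

Being a heuristic, it can misjudge unusual but legitimate text, so it is best used alongside what LogAttribute actually prints.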
@grtjn, any word on the previous questions? We'd love to ensure this has been properly addressed.
Missed the earlier comment, sorry. Let me get back to you about this.
@grtjn Going to close this for bookkeeping purposes, but please continue the conversation here if you have results from the questions above. I'm specifically wondering what LogAttribute printed out.
We ingest data using ExecuteScriptMarkLogic, as we like to perform custom checks and add ingest metadata while inserting into MarkLogic. We pass the flowfile contents through to the XQuery code using the Content Variable property. We noticed, however, that if our flowfiles contain diacritics, as French, German, and Polish names and addresses often do, they end up garbled in MarkLogic. We checked thoroughly and concluded that ExecuteScriptMarkLogic does not ensure the content is sent as UTF-8, which MarkLogic presumably expects.
We are using the MarkLogic NiFi processors v1.16.3.2.
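The suspected failure can be illustrated with a small Python sketch. This is not the ExecuteScriptMarkLogic source; the helper name is hypothetical, and it only models what happens if flowfile bytes are turned into the Content Variable string with a charset other than UTF-8 (Latin-1 is used here as an example).

```python
# Hypothetical illustration only: NOT the ExecuteScriptMarkLogic implementation.
# It models flowfile bytes being converted into the string bound to the
# Content Variable, with the charset as an explicit parameter.


def bind_content_variable(content: bytes, charset: str) -> str:
    """Decode raw flowfile bytes into text using the given charset."""
    return content.decode(charset)


raw = "Hélène Müller".encode("utf-8")  # flowfile content stored as UTF-8

print(bind_content_variable(raw, "utf-8"))    # Hélène Müller
print(bind_content_variable(raw, "latin-1"))  # HÃ©lÃ¨ne MÃ¼ller (garbled)
```

If LogAttribute shows the first form but MarkLogic stores something like the second, the conversion most likely goes wrong in or after the processor; if LogAttribute already shows the second form, the content was mangled upstream.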