marklogic / marklogic-contentpump

MarkLogic Contentpump (mlcp)
http://developer.marklogic.com/products/mlcp
Apache License 2.0

Errors are not reported when transform throws an error and thread count equals or exceeds number of documents #180

Open rjrudin opened 3 years ago

rjrudin commented 3 years ago

To reproduce:

  1. Make an MLCP transform that will throw an error (see below)
  2. Set up an import job that will ingest e.g. 5 documents
  3. Set the thread count higher than 5 (the MLCP default is 8 or 16, so you don't really need to change this)
  4. Run the import

You'll get the following feedback from MLCP:

2021-02-03 10:09:44 INFO  LocalJobRunner:291 - com.marklogic.mapreduce.MarkLogicCounter: 
2021-02-03 10:09:44 INFO  LocalJobRunner:295 - INPUT_RECORDS: 6
2021-02-03 10:09:44 INFO  LocalJobRunner:295 - OUTPUT_RECORDS: 6
2021-02-03 10:09:44 INFO  LocalJobRunner:295 - OUTPUT_RECORDS_COMMITTED: 6
2021-02-03 10:09:44 INFO  LocalJobRunner:295 - OUTPUT_RECORDS_FAILED: 0

But no documents will be ingested. If you check the error log, you'll see these messages:

2021-02-03 15:13:10.763 Info: Exception caught while transformingundefined: TypeError: Cannot read property 'not' of undefined

These messages are good in that they indicate an error happened, but bad in that the URI isn't shown (it is logged as "undefined"), so a user doesn't know which document to check.

Expectation: the thread count shouldn't matter here. I should get the same feedback from MLCP as when the thread count is less than the number of documents being ingested, e.g.:

2021-02-03 10:16:10 ERROR TransformWriter:575 - Batch 142909594.0: Document failed permanently: /Users/rrudin/workspace/marklogic-data-hub/examples/reference-entity-model/input/json/Cust1.json
2021-02-03 10:16:10 WARN  TransformWriter:581 - TypeError: Cannot read property 'not' of undefined

Note that there might be other factors at play here - I don't know if the number of forests matters.

rjrudin commented 3 years ago

I forgot to include an example of a transform that throws an error - something this simple will do the trick:

function transform(content, context) {
  console.log("this will throw an error", content.does.not.exist);
  return content;
}
module.exports = {
  transform
}

plackowk commented 3 years ago

I wanted to note that this bug apparently happens only if the transform module is written in SJS. All errors are caught properly if the module is written in XQuery, e.g.:

xquery version "1.0-ml";

module namespace mlcpFlow = "http://marklogic.com/data-hub/mlcp-flow-transform";

declare option xdmp:mapping "false";

declare function mlcpFlow:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $_ := fn:error(xs:QName("Error"), "Mock error")
  return ()
};