project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

Enable point-in-time queries with a rolling window of time (from 643) #14

Status: Open. gigamorph opened this issue 4 months ago.

gigamorph commented 4 months ago

As a contingency for a bad incremental data update, we wish to:

  1. Configure MarkLogic's database merge policy to retain a week's worth of deleted documents.
    • A negative merge timestamp may be specified for a rolling window of time; see https://docs.marklogic.com/guide/admin/merges#id_71724.
    • This can be set in the database's ml-gradle configuration file.
    • Delaying when MarkLogic can merge logically deleted documents out of the database makes point-in-time queries possible, but increases the storage footprint. The increase depends on the size of the logically deleted documents, and thus on how much data the incremental updates change.
    • TBD: whether the larger database also affects performance.
  2. Enable all LUX backend code to use a specified timestamp. <-- Question this. Per the rolling-window documentation above (negative merge timestamp), one would use xdmp:forest-rollback instead. If the merge timestamp is positive, the following may apply.
    • We elected not to let backend endpoint consumers specify the timestamp: in the midst of a data recovery or repair, they may not know the correct timestamp to use.
    • A new Gradle build property could set a new constant in appConstants.mjs.
    • Per https://docs.marklogic.com/guide/app-dev/point_in_time#id_74234, the only functions that can specify the timestamp are xdmp.eval, xdmp.invoke, and xdmp.spawn.
    • xdmp.spawn is not a concern because we do not use the Task Server. That could change in the future; scheduled tasks, for example, run on the Task Server.
    • We should be able to use xdmp.invokeFunction as its documentation states it accepts the same options as xdmp.invoke.
    • If so, we should consider starting with the data services, all of which pass a function into handleRequest. Perhaps we can change them to pass the function name as a string, plus all parameters, and have handleRequest call xdmp.invokeFunction. Be careful wherever other code runs to determine a parameter's value, as that code may execute outside the invoked function and thus not at the specified timestamp.
    • The code base includes other calls to xdmp.eval and xdmp.invoke, in both the JavaScript and build.gradle. We will need to ensure those also use the correct timestamp. Some are called by functions passed into handleRequest; those may or may not inherit the timestamp specified there. We could test this and/or be explicit everywhere, which is possibly safer for new developers and for alternative code paths (e.g., Query Console).
    • Testing should at least include:
      • Search
      • Facets
      • Document retrieval
      • Triples
    • This should be well documented and understood by current and future developers, as we need to avoid any part of the code base executing at a different timestamp than the rest.
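For item 1, the merge-timestamp value for "a week's worth" can be derived rather than hand-typed. The sketch below assumes, and this should be verified against the merges guide linked above, that a negative merge timestamp is measured in 100-nanosecond ticks (10,000,000 per second). It also assumes `merge-timestamp` is the Manage API property name that an ml-gradle database file would carry; the database name `lux-content` and the helper names are hypothetical.

```javascript
// Sketch only: derive a negative merge timestamp for a rolling retention window.
// ASSUMPTION: a negative merge timestamp is read as a window measured in
// 100-nanosecond ticks (10,000,000 per second). Verify against the merges guide.
const TICKS_PER_SECOND = 10000000;
const SECONDS_PER_DAY = 24 * 60 * 60;

function mergeTimestampForRetentionDays(days) {
  if (!Number.isInteger(days) || days <= 0) {
    throw new RangeError("days must be a positive integer");
  }
  // A negative value means "this far back from now", so the window rolls forward.
  return -(days * SECONDS_PER_DAY * TICKS_PER_SECOND);
}

// Builds the body of a hypothetical ml-gradle database file, e.g.
// src/main/ml-config/databases/content-database.json. The database name is
// a placeholder; "merge-timestamp" is assumed to be the Manage API property.
function databaseConfigWithRetention(databaseName, days) {
  return {
    "database-name": databaseName,
    "merge-timestamp": mergeTimestampForRetentionDays(days),
  };
}

console.log(JSON.stringify(databaseConfigWithRetention("lux-content", 7), null, 2));
```

Under the tick assumption, a 7-day window yields a `merge-timestamp` of -6048000000000. Computing it in one place keeps the retention period a single number that documentation and any later timestamp checks can share.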
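For item 2, the idea of having handleRequest call xdmp.invokeFunction can be sketched as follows. Because xdmp.invokeFunction takes a zero-argument function, a closure over the data-service function and its already-computed arguments can stand in for the function-name string. `handleRequestAtTimestamp` and `POINT_IN_TIME_TIMESTAMP` are hypothetical names (the latter standing in for the proposed appConstants.mjs constant), LUX's real handleRequest differs, and the invoker is injectable only so the routing logic can be exercised outside MarkLogic.

```javascript
// Hypothetical sketch: run a data-service function at a configured timestamp.
// ASSUMPTIONS: POINT_IN_TIME_TIMESTAMP stands in for a new appConstants.mjs
// constant set by a Gradle property; LUX's real handleRequest differs.
// The invoker defaults to MarkLogic's xdmp.invokeFunction and is injectable
// so the routing logic can be exercised outside MarkLogic.
const POINT_IN_TIME_TIMESTAMP = null; // null: run at the latest timestamp

function handleRequestAtTimestamp(fn, args = [], {
  timestamp = POINT_IN_TIME_TIMESTAMP,
  invoker = (task, options) => xdmp.invokeFunction(task, options),
} = {}) {
  // Arguments should already be plain values. Any query run beforehand to
  // compute them executed outside the invoked function.
  const task = () => fn(...args);
  if (timestamp === null || timestamp === undefined) {
    return task();
  }
  // Kept as a string because system timestamps may exceed
  // Number.MAX_SAFE_INTEGER (assumption: confirm the form the option accepts).
  return invoker(task, { timestamp: String(timestamp) });
}
```

With no timestamp configured, the function runs directly, so normal operation is unchanged; only when the constant is set does every data service funnel through a single `timestamp` option, which is one way to avoid parts of the code base running at different timestamps.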

Open question: how do we determine the timestamp for a specified date and time? We need to know this to choose the timestamp to use when it is not the latest.
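One possible direction for the open question, offered as an unverified sketch: MarkLogic provides xdmp:wallclock-to-timestamp (xdmp.wallclockToTimestamp in Server-Side JavaScript) to convert an xs:dateTime into a system timestamp, which could then feed either a point-in-time query or xdmp:forest-rollback. The sketch also rejects dates outside the retention window, since older deleted documents may already have been merged out. `timestampForDateTime` and `RETENTION_DAYS` are hypothetical; the converter is injectable so the checks can run outside MarkLogic.

```javascript
// Hypothetical sketch: map a wall-clock date/time to a MarkLogic timestamp,
// refusing dates the merge policy may no longer cover.
// ASSUMPTIONS: RETENTION_DAYS mirrors the merge policy; the default converter
// uses MarkLogic's global xdmp and xs objects and only runs inside MarkLogic.
const RETENTION_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function timestampForDateTime(iso, {
  now = new Date(),
  retentionDays = RETENTION_DAYS,
  toTimestamp = (dt) => String(xdmp.wallclockToTimestamp(xs.dateTime(dt))),
} = {}) {
  const when = new Date(iso);
  if (Number.isNaN(when.getTime())) {
    throw new RangeError(`not a valid date/time: ${iso}`);
  }
  if (when > now) {
    throw new RangeError(`${iso} is in the future`);
  }
  const oldest = new Date(now.getTime() - retentionDays * MS_PER_DAY);
  if (when < oldest) {
    // Deleted documents older than the window may already be merged away.
    throw new RangeError(`${iso} is outside the ${retentionDays}-day retention window`);
  }
  // Normalize to an ISO-8601 UTC string before handing it to xs.dateTime.
  return toTimestamp(when.toISOString());
}
```

Inside MarkLogic, a call such as `timestampForDateTime("2024-05-05T00:00:00Z")` would return the timestamp as a string; whether xdmp.wallclockToTimestamp is precise enough for recovery purposes is worth testing against a known update.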