Open d-w-moore opened 3 years ago
One way around this, with the upcoming release (4.2.10.1) of the indexing plugin, will be to use atomic metadata operations where feasible (python-irodsclient will now allow this). But ultimately, we may want to follow the approach of limiting the number of delayed tasks submitted per object, particularly when they are seen at time of submission to be redundant with later tasks already entered into the delay queue.
for clarity... "later tasks already entered"... i am reading as "future tasks already entered"
Correct, that was imprecisely worded... we're dealing with tasks that are known to be pending with an execution time comfortably far in the future.
This is relevant to the new (Metalnx) metadata schema, as all metadata AVU are now kept within a list element of the index record describing the data object (or collection) they annotate.
Because of this, any scenario (but in particular automated ingest with importation/translation of many e.g. header key-value elements then being attached as AVU's by the ingest hook) in which N metadata AVU's are attached in rapid-fire fashion will spawn N delayed re-indexing tasks. If the object starts out with no AVU's (best case) we will have an average of N/2 AVU's already present per task, so the total number of iterations over AVU's tends toward N*(N/2), which is order N-squared.