Open paul-butcher opened 10 months ago
This is a non-trivial refactor, so I'll put in some of my thinking now so I don't have to relearn it later.
Perhaps this can be better expressed without being a separate stage, and instead being run as part of the merger?
It needs to be a (sub)stage after merger, as it relies on the right data being in the merged database.
There is a minor conflict between efficiency and purity here. The current incarnation knows that there is a collectionPath in each record it modifies, so it can notify the relationEmbedder directly via the pathSender.
However, I want us to be able to describe each full stage (either individual standalone apps, or a subsystem like relation_embedder or matcher_merger) as accepting and sending work ids.
As a result, the concatenator, as the final stage within the matcher_merger subsystem, should notify downstream with work ids. Those ids will then be used by the merger to retrieve the work and notify the next stage in its own subsystem with a path.
This inefficiency is pretty minor, and it's better to be clear about boundaries.
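To make the boundary concrete, here is a rough sketch of the id-only contract. It is written in Python purely for illustration and is not the pipeline's code. `Work`, `Queue`, `concatenator_notify` and `receiving_stage` are hypothetical names, and the merged database is stood in for by a plain dict.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Work:
    # Hypothetical minimal record: just an id and its collection path.
    work_id: str
    collection_path: str


@dataclass
class Queue:
    # Stand-in for a real message sender; it only records what was sent.
    messages: List[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        self.messages.append(message)


def concatenator_notify(changed: Iterable[Work], downstream: Queue) -> None:
    # The concatenator has the path in hand, but it only sends the work id,
    # so the subsystem boundary stays "work ids in, work ids out".
    for work in changed:
        downstream.send(work.work_id)


def receiving_stage(work_id: str, merged: Dict[str, Work], paths: Queue) -> None:
    # The receiving stage pays for one extra lookup to recover the path,
    # then notifies the next stage inside its own subsystem with that path.
    paths.send(merged[work_id].collection_path)
```

The extra lookup in `receiving_stage` is the minor inefficiency mentioned above: the path is fetched again rather than passed across the boundary.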
I think it also needs to accept a work id. That way it knows what to notify downstream about if nothing changes.
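A sketch of what accepting a work id could look like, again in Python for illustration only. `handle_work_id` is a hypothetical name, and `update_paths` is an assumed callable that performs the concatenation and returns the ids of any works it modified.

```python
from typing import Callable, Iterable


def handle_work_id(
    work_id: str,
    update_paths: Callable[[str], Iterable[str]],
    send: Callable[[str], None],
) -> None:
    # update_paths is assumed to return the ids of works whose paths changed;
    # it may return nothing at all.
    changed = list(update_paths(work_id))
    # Always notify about the incoming id first, then any other changed works.
    # dict.fromkeys removes duplicates while preserving order.
    for wid in dict.fromkeys([work_id, *changed]):
        send(wid)
```

Because the incoming id is always sent, the no-change case needs no special handling: the stage simply passes the id along.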
Path concatenator is currently in the relation_embedder subsystem, but that is the wrong place for it: it writes to works-merged, so it belongs in the matcher_merger subsystem.
It could be triggered as part of the sendWorkOrImage function in the merger.