I just added an extremely dirty hack to the noria integration. It has to be removed in the future, but it does seem to work for now. I am documenting it here so we can clean this up at some point.
Background
In the noria integration we insert a join before any node with more than one parent. Each node carries some context columns on which we can join. For example, we might have a node A with the output columns c1, c2 and the context (key) column k1. The full output of the node is then k1, c1, c2. When we join outputs, we join on the key. So suppose we have another node B with, say, k1, c3.
Then the join works on k1 and should output k1, c1, c2, c3. The k1 needs to be in the output in case we wish to join more nodes onto it.
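To pin down what "should output" means here, a minimal sketch of the expected column layout. The function name expected_join_columns is made up for illustration and is not part of noria or of the integration; it only models the rule described above (all columns of the left parent, then the non-key columns of the right parent).

```rust
/// Hypothetical helper, not a noria API: the columns we expect from
/// joining `left` and `right` on `keys`. Every left column is kept
/// (including the keys), then the right columns that are not keys.
fn expected_join_columns(left: &[&str], right: &[&str], keys: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = left.iter().map(|c| c.to_string()).collect();
    out.extend(
        right
            .iter()
            .filter(|c| !keys.contains(*c)) // drop the repeated key columns
            .map(|c| c.to_string()),
    );
    out
}
```

With A = k1, c1, c2 and B = k1, c3 joined on k1, this gives k1, c1, c2, c3, so the key appears exactly once.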
However, for some reason, when I do A join B the noria join node outputs k1, c1, c2, None, c3. I think the None is actually the join trying to output k1 again.
So, because I suspect that the None has something to do with k1, I record the output columns as k1, c1, c2, k1, c3. This seems to run correctly, because subsequent nodes just ignore the second k1 column, and in theory it should also do the expected thing for multi-column keys.
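A rough sketch of the bookkeeping the hack amounts to, under two assumptions that are mine and not confirmed by the actual code: that the recorded layout is simply the left parent's columns followed by all of the right parent's columns, and that later nodes resolve a column name to its first occurrence. Both function names are hypothetical.

```rust
/// Hypothetical helper: record the join output as the plain
/// concatenation of both parents' columns, so the key shows up twice.
/// (Assumption: this matches the layout noria actually emits.)
fn recorded_join_columns(left: &[&str], right: &[&str]) -> Vec<String> {
    left.iter().chain(right.iter()).map(|c| c.to_string()).collect()
}

/// Hypothetical lookup: resolve a name to its first occurrence.
/// The duplicated key further right is therefore never selected.
fn column_index(columns: &[String], name: &str) -> Option<usize> {
    columns.iter().position(|c| c == name)
}
```

For A = k1, c1, c2 and B = k1, c3 the recorded layout is k1, c1, c2, k1, c3; looking up k1 yields index 0 and c3 yields index 4, so the indices line up with the observed k1, c1, c2, None, c3. With a two-column key the same scheme would record k1, k2, c1, k1, k2, c3, which is why it should carry over to multi-column keys if the assumption about the layout holds.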
Fix
We need to find out exactly how join works with respect to which columns it produces. We may even have to wait until it can join on the same table, because currently that can only be achieved by cheating.
We may also find out that this is actually how join works, in which case we can just make the hack more robust.