We currently implement query_through only on simple stateless operators like filters and projections. However, in theory, it should also be possible to query through multi-ancestor stateless operators such as unions and joins. This would significantly decrease the systems' memory footprint, as we no longer need to materalize intermediate join outputs for nested joins. We do need to be a little careful when implementing it however, as there may be strange interactions with concurrent backfills, but that shouldn't be too complicated. The majority of the work would be to implement "backwards" join processing (i.e., query_through) for unions and joins (they currently only implement "forward" processing through on_input).
We currently implement
query_through
only on simple stateless operators like filters and projections. However, in theory, it should also be possible to query through multi-ancestor stateless operators such as unions and joins. This would significantly decrease the systems' memory footprint, as we no longer need to materalize intermediate join outputs for nested joins. We do need to be a little careful when implementing it however, as there may be strange interactions with concurrent backfills, but that shouldn't be too complicated. The majority of the work would be to implement "backwards" join processing (i.e.,query_through
) for unions and joins (they currently only implement "forward" processing throughon_input
).