rheem-ecosystem / rheem

Rheem - a cross-platform data processing system
https://rheem-ecosystem.github.io

Retain input channel instances for lazily executed operators #51

Closed sekruse closed 7 years ago

sekruse commented 7 years ago

In some rather unusual situations, we might run into problems with the inputs of lazily executed operators:

  1. Some channel instance, e.g., a file F, is produced.
  2. A lazily executed operator O consumes that channel instance.
  3. However, O is not actually executed within its stage for some reason.
  4. In some subsequent stage, the execution of O is triggered, but F has already been deleted, so the execution fails.

An obvious fix is to pin F in the lazy execution lineage, so that F is retained until the execution of O has actually taken place.
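To make the proposed fix concrete, here is a minimal sketch of how retaining F could work via reference counting. It is not taken from the Rheem code base: the class names `RefCountedChannel` and `LazyLineageNode` and all of their methods are hypothetical and chosen only for illustration. The assumption is that deletion of a channel instance can be deferred while any lazily executed consumer still holds a pin on it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a channel instance such as the file F. */
class RefCountedChannel {
    private final String name;
    private int pins = 0;
    private boolean disposalRequested = false;
    private boolean deleted = false;

    RefCountedChannel(String name) {
        this.name = name;
    }

    /** A pending lazy consumer registers its interest in this channel. */
    void pin() {
        pins++;
    }

    /** A consumer that has actually run gives up its interest. */
    void unpin() {
        if (pins == 0) {
            throw new IllegalStateException(name + " is not pinned");
        }
        pins--;
        disposeIfPossible();
    }

    /** Called when a stage finishes; the deletion is deferred while pins remain. */
    void requestDisposal() {
        disposalRequested = true;
        disposeIfPossible();
    }

    private void disposeIfPossible() {
        if (disposalRequested && pins == 0) {
            deleted = true; // a real system would delete the file here
        }
    }

    boolean isDeleted() {
        return deleted;
    }

    String getName() {
        return name;
    }
}

/** Hypothetical lineage node of a lazily executed operator O. */
class LazyLineageNode {
    private final List<RefCountedChannel> inputs = new ArrayList<>();
    private boolean executed = false;

    /** Step 2: O consumes the channel lazily, so the input gets pinned. */
    void addInput(RefCountedChannel input) {
        input.pin();
        inputs.add(input);
    }

    /** Step 4: O's execution has actually taken place, so the inputs are released. */
    void markExecuted() {
        if (executed) {
            return;
        }
        for (RefCountedChannel input : inputs) {
            if (input.isDeleted()) {
                throw new IllegalStateException(input.getName() + " was deleted too early");
            }
        }
        executed = true;
        for (RefCountedChannel input : inputs) {
            input.unpin();
        }
        inputs.clear();
    }
}
```

In this sketch, a stage that finishes without having executed O would call `requestDisposal()` on F, which has no effect until O's lineage node is marked as executed in a later stage. Only then does the pin count drop to zero and F get released.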