Open vdumoulin opened 9 years ago
The datastreams in Fuel are indeed stateful, but this is so by design. Perhaps we should raise a warning when get_epoch_iterator
overrides a non-empty child_epoch_iterator
?
How would you define non-empty? If next(child_epoch_iterator)
does not return StopIteration
?
This seems to be the only possible solution, although it looks sort of ugly.
In fact a data stream is something like a file, so maybe we should emulate the file behaviour at this point:
In [3]: s = StringIO('a\nb\nc\n\d')
In [4]: s
Out[4]: <StringIO.StringIO instance at 0x7efdeab5ac68>
In [5]: it = iter(s)
In [6]: next(it)
Out[6]: 'a\n'
In [7]: it2 = iter(s)
In [8]: next(it2)
Out[8]: 'b\n'
In [9]: next(it)
Out[9]: 'c\n'
that is not to reset the child_epoch_iterator
. What do you think? This still makes the user take care of not reusing a data stream, though.
Sounds good. I'd document that thoroughly, though. How would you go about implementing that in practice?
After sleeping over it, it sounds not so good anymore. The main loop relies on the fact the get_epoch_iterator
does reset the data stream. More specifically, that it switches the data stream to the next epoch, which so far for all available datasets is equivalent to calling reset
(there is some capacity for different epochs in the core).
That being said, I think that this is a documentation issue, and no changes in the code should be done.
Here's a minimal example reproducing the bug:
The problem lies in the fact that
Transformer
sets itschild_epoch_iterator
attribute whenget_epoch_iterator
is called. Even though the two iterators are different, the data stream they point to has the same child epoch iterator.This is an issue that can arise e.g. in Blocks when someone re-uses the same data stream for monitoring and for training. The obvious solution is "don't ever use a data stream for more than one purpose", but it took me quite some time to figure out what the problem was when I was made aware of the crash. It would be good if we could find a better long-term solution to that.