Open daniel-falk opened 1 year ago
Current workaround:
Note the comment "if all outputs are iterators, then the node is a generator node": the issue only happens when all outputs from the node are Iterable. We can therefore add a dummy output to the node (e.g. a MemoryDataset) and save some garbage value (e.g. an empty string) to it.
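    from kedro.pipeline import node

    def my_node(input_video):
        def my_generator():
            ...
        # The second return value is the garbage value for the dummy output.
        return GeneratorVideo(my_generator(), ...), ""

    # Bound to a new name so the imported node() function is not shadowed.
    video_node = node(
        my_node,
        inputs="my_video",
        outputs=[
            "my_output_ds",
            "dummy",
        ],
        name="Create video dataset",
    )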
I haven't had time to check this yet. Quick question: is this covered by the unit tests?
@noklam Probably not, and the question is where it should be covered. The video dataset works as expected with the framework design as it was when I implemented the dataset. The commit linked above changed the behaviour of Kedro, so sure, there could have been a unit test of the runner that verified that Iterable datasets are possible to save, but it is a very niche thing to test.
The change is, however, already in and there have been multiple releases since then. Since the change was user-facing, I suspect it will not be reverted. Therefore I guess the only solution is to update the dataset to work with the new behaviour of the runner.
I also think that the new changes to the runner look very interesting, so it might be something we can make use of to make the video dataset even better.
If anyone comes across this and would like to open a PR, please go ahead 🙂
Description
I wasn't sure if I should put this issue here or in the Kedro core repo, since the VideoDataSet has been broken after a change in Kedro core.
In short, the video dataset has multiple backends that can be saved. One of them is the GeneratorVideo, which is an Iterable.
In commit fcf3ab4a9 "Enable the usage of generator functions in nodes (#2161)" by @idanov, the runner functionality was changed to handle generator datasets.
Snippet from _run_node_sequential in kedro/runner/runner.py:
Context
I have not had time yet to dive into the code and see what is actually happening, or how we should treat it in these situations. The effect now is that the runner will yield the first frame from the video and try to save that frame to the Video Dataset which is not possible.
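To make the effect concrete, here is a minimal, self-contained sketch of the behaviour as I understand it. It is not Kedro's actual code: FakeGeneratorVideo and plan_saves are hypothetical names made up for illustration, and the round-robin handling of several streams is a simplification. The only rule taken from the runner is the quoted comment "if all outputs are iterators, then the node is a generator node", which I assume is checked with isinstance(..., Iterator).

```python
from collections.abc import Iterator
from itertools import cycle


class FakeGeneratorVideo:
    """Hypothetical stand-in for a generator-backed video object."""

    def __init__(self, frames):
        self._frames = iter(frames)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._frames)


def plan_saves(outputs):
    """Return the (dataset_name, data) pairs that would be saved.

    Simplified illustration only, not the real runner implementation.
    """
    # Assumed rule: if every output is an iterator, treat the node as a
    # generator node and save each yielded chunk separately.
    if all(isinstance(value, Iterator) for value in outputs.values()):
        keys = list(outputs)
        streams = list(outputs.values())
        # Take one chunk from each stream in turn (stops at the shortest).
        chunks = (chunk for group in zip(*streams) for chunk in group)
        return list(zip(cycle(keys), chunks))
    # Otherwise every output object is saved whole.
    return list(outputs.items())


# Only output is an iterator: individual frames get saved, not the video.
print(plan_saves({"my_output_ds": FakeGeneratorVideo(["frame0", "frame1"])}))

# With a non-iterator dummy output the video object is saved as a whole.
print(plan_saves({"my_output_ds": FakeGeneratorVideo(["frame0"]), "dummy": ""}))
```

Under these assumptions the first call produces one save per frame, which matches the failure described above. The second call shows why a dummy output helps: an empty string is iterable but not an iterator, so the "all outputs are iterators" check fails and the video is passed to the dataset in one piece.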
The quick and dirty way to fix it would be to remove the __iter__ method from the GeneratorVideo class and implement some special logic in the VideoDataSet class to iterate it correctly. This would, however, not be very nice from a user perspective, since iter(video) would then result in an indexed iteration of the generator.
Steps to Reproduce
GeneratorVideo
VideoDataSet