Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache License 2.0
worker: Log which outputs are missing when task is unexpectedly incomplete #3258
Motivation and Context
When a task runs but does not write all of its outputs, Luigi fails with:
RuntimeError: Unfulfilled dependency at run time: NodeProperties_graph__tmp_pytest_of_d_cnt_dir_rev_rel__02a93cea21
which can be hard to debug for tasks with many outputs. After this PR, it lists which outputs are missing. For example:
RuntimeError: Unfulfilled dependency at run time: NodeProperties_graph__tmp_pytest_of_d_cnt_dir_rev_rel__02a93cea21 (/tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.committer_timestamp.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.message.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.tag_name.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.content.is_skipped.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.tag_name.offset.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.committer_id.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.content.length.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.author_id.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.message.offset.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.committer_timestamp_offset.bin)
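As a rough illustration of the idea (not the actual worker code), the new message can be thought of as the old message plus the string form of every output that does not exist. `FakeTarget`, `describe_incomplete`, the task id, and the paths below are hypothetical names made up for this sketch; only the `exists()` convention and the message prefix come from Luigi.

```python
class FakeTarget:
    """Hypothetical stand-in for a Luigi target; not part of Luigi."""

    def __init__(self, path, present):
        self.path = path
        self._present = present

    def exists(self):
        # Luigi targets report completion through exists().
        return self._present

    def __str__(self):
        # Assumption: a filesystem-like target is described by its path.
        return self.path


def describe_incomplete(task_id, outputs):
    """Build an error message naming every output that does not exist."""
    missing = [str(target) for target in outputs if not target.exists()]
    return "Unfulfilled dependency at run time: {} ({})".format(
        task_id, ", ".join(missing)
    )


outputs = [
    FakeTarget("/tmp/out/a.bin", present=True),
    FakeTarget("/tmp/out/b.bin", present=False),
    FakeTarget("/tmp/out/c.bin", present=False),
]
message = describe_incomplete("MyTask__abc123", outputs)
print(message)
```

Only the two missing paths end up in the parenthesised list; the output that exists is left out, which is what makes the message useful for tasks with many outputs.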
Description
I implemented a basic __str__ for all target types. It returns the path for filesystem-like targets and the table name for database-like targets (to be consistent with BigQueryTarget).
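A minimal sketch of what such a __str__ might look like for the two kinds of targets. `PathLikeTarget` and `TableLikeTarget` are hypothetical classes written for illustration, and the attribute names `path` and `table` are assumptions; the real Luigi target classes may differ.

```python
class PathLikeTarget:
    """Hypothetical filesystem-like target, identified by its path."""

    def __init__(self, path):
        self.path = path

    def __str__(self):
        # Filesystem-like: the path is the most useful identifier.
        return self.path


class TableLikeTarget:
    """Hypothetical database-like target, identified by its table name."""

    def __init__(self, table):
        self.table = table

    def __str__(self):
        # Database-like: fall back to the table name.
        return self.table


targets = [PathLikeTarget("/data/graph.bin"), TableLikeTarget("node_properties")]
described = ", ".join(str(target) for target in targets)
print(described)
```

Keeping __str__ this small means any code that formats a target (logging, error messages) gets a readable identifier without knowing the target's type.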
Have you tested this? If so, how?
The existing unit tests still pass, and this works for me (only tested with the local filesystem).