spotify / luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache License 2.0

worker: Log which outputs are missing when task is unexpectedly incomplete #3258

Closed progval closed 11 months ago

progval commented 11 months ago

Motivation and Context

When a task runs but does not write all of its outputs, Luigi fails with:

RuntimeError: Unfulfilled dependency at run time: NodeProperties_graph__tmp_pytest_of_d_cnt_dir_rev_rel__02a93cea21

which can be hard to debug for tasks with many outputs. With this PR, the error message also lists the missing outputs. For example:

RuntimeError: Unfulfilled dependency at run time: NodeProperties_graph__tmp_pytest_of_d_cnt_dir_rev_rel__02a93cea21 (/tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.committer_timestamp.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.message.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.tag_name.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.content.is_skipped.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.tag_name.offset.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.committer_id.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.content.length.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.author_id.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.message.offset.bin, /tmp/pytest-of-dev/pytest-512/test_compressgraph_None_0/compressed_graph/graph.property.committer_timestamp_offset.bin)
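To illustrate the idea, here is a minimal sketch of how such a message could be assembled. It is not the code from this PR and does not import Luigi: `FakeTarget`, `unfulfilled_message`, the task ID and the paths are all hypothetical names made up for the example. The only assumption about targets is that they expose `exists()` and a readable `__str__`.

```python
class FakeTarget:
    """Hypothetical stand-in for a target; not Luigi's real class."""

    def __init__(self, path, present):
        self.path = path
        self._present = present

    def exists(self):
        # A real target would check the filesystem or a database here.
        return self._present

    def __str__(self):
        return self.path


def unfulfilled_message(task_id, outputs):
    """Build the error text, appending every output that does not exist."""
    missing = [str(target) for target in outputs if not target.exists()]
    message = "Unfulfilled dependency at run time: {}".format(task_id)
    if missing:
        message += " ({})".format(", ".join(missing))
    return message


# Hypothetical task with three outputs, two of which were never written.
outputs = [
    FakeTarget("/tmp/out/a.bin", present=True),
    FakeTarget("/tmp/out/b.bin", present=False),
    FakeTarget("/tmp/out/c.bin", present=False),
]
print(unfulfilled_message("MyTask__abc123", outputs))
```

Only the outputs whose `exists()` returns false end up in the parentheses, so a task with many outputs points directly at the ones that were not written.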

Description

I implemented a basic __str__ for all target types. It returns the path for filesystem-like targets and the table name for database-like targets (to be consistent with BigQueryTarget).
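As a rough sketch of that convention (again not the PR's actual code), the two kinds of target could look like this. `PathTarget` and `TableTarget` are hypothetical classes invented for the example, and their attribute names are assumptions rather than Luigi's real ones.

```python
class PathTarget:
    """Hypothetical filesystem-like target, identified by its path."""

    def __init__(self, path):
        self.path = path

    def __str__(self):
        # Filesystem-like targets are described by their path.
        return self.path


class TableTarget:
    """Hypothetical database-like target, identified by a table name."""

    def __init__(self, database, table):
        self.database = database
        self.table = table

    def __str__(self):
        # Database-like targets are described by their table name.
        return self.table


targets = [
    PathTarget("/data/graph.property.message.bin"),
    TableTarget("analytics", "daily_counts"),
]
print(", ".join(str(target) for target in targets))
```

Because both classes render through `str()`, code that reports missing outputs does not need to know which kind of target it is dealing with.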

Have you tested this? If so, how?

Unit tests still pass, and this works for me (only tested with the local filesystem).