hatchet-dev / hatchet

A distributed, fault-tolerant task queue
https://hatchet.run
MIT License

feat: add failure information to the onFailure steps #682

Open abelanger5 opened 3 months ago

abelanger5 commented 3 months ago

Currently, onFailure steps are executed with a regular context. The context passed to an onFailure step should include the reason for the failure and the step that failed.

guedesfelipe commented 2 months ago

You can get information about the failed steps like this:

    @hatchet.on_failure_step()
    async def rollback(self, context):
        logger.info(f'Rollback {self.__class__.__name__}')
        workflow_id = context.workflow_run_id()
        workflow = hatchet.client.rest.workflow_run_get(
            workflow_id,
        )
        logger.debug(f'Workflow ID: {workflow_id}, Workflow Status: {workflow.status}')
        for job in workflow.job_runs:
            if job.status == 'FAILED':
                for step in reversed(job.step_runs):
                    extra_info = ' - rollback not executed'
                    if step.status.value in ['SUCCEEDED', 'FAILED']:
                        # Add the rollback logic for each step here
                        extra_info = ' - rollback executed successfully'
                    logger.info(
                        f'Step {step.step.readable_id} executed with status {step.status.value}'
                        f'{extra_info}'
                    )
                    if step.error:
                        logger.warning(f'Error caught in step {step.step.readable_id}: {step.error}')

        return {
            "result": "success"
        }
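
If you only need the failure reason and the step it came from, the same traversal can be pulled into a small helper. This is a minimal sketch, not part of the Hatchet SDK: `collect_failed_steps` and `status_name` are hypothetical names, and the attribute names (`job_runs`, `step_runs`, `step.readable_id`, `status`, `error`) are assumed to match the snippet above. The stand-in objects at the bottom only exist so the helper can be tried without a running Hatchet instance.

```python
from types import SimpleNamespace


def status_name(status):
    """Return the status as a plain string, whether it is an enum or a str."""
    return getattr(status, "value", status)


def collect_failed_steps(workflow_run):
    """Return (readable_id, error) for every failed step run in the workflow run.

    Assumes the object layout used in the snippet above; this helper is
    hypothetical and not provided by the SDK.
    """
    failed = []
    for job in workflow_run.job_runs or []:
        for step_run in job.step_runs or []:
            if status_name(step_run.status) == "FAILED":
                failed.append((step_run.step.readable_id, step_run.error))
    return failed


def _step(readable_id, status, error=None):
    # Stand-in for a step run, used only to exercise the helper locally.
    return SimpleNamespace(
        step=SimpleNamespace(readable_id=readable_id),
        status=status,
        error=error,
    )


# Stand-in for the object returned by the REST client in the snippet above.
example_run = SimpleNamespace(
    job_runs=[
        SimpleNamespace(
            status="FAILED",
            step_runs=[
                _step("reserve", "SUCCEEDED"),
                _step("charge", "FAILED", "card declined"),
                _step("ship", "CANCELLED"),
            ],
        )
    ]
)

print(collect_failed_steps(example_run))
```

Inside the on-failure step you would pass it the result of the `workflow_run_get` call from the snippet above, for example `collect_failed_steps(workflow)`.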