Open PulkitMishra opened 1 month ago
There are a few different types of errors. I guess these will surface when you make the changes.
For the worker and related code (anything that ends up touching the user, i.e. graph, code), does it make sense to capture only the task ID rather than a lot of data such as inputs? With the task ID we might be able to use the server blob store to retrieve inputs/outputs.
@stangirala ah ok, I get the context you were talking about now wrt #900. A few thoughts:
Error Categorization: I think as we implement the new error handling system, we'll create a flexible hierarchy of error types. This should allow us to categorize errors more precisely as they surface during development and testing.
Error Context: For worker-related errors, you're right that including full input data might be excessive. What we can do is: a) Include only essential information in the initial error report (task ID, error type, brief message). b) Provide a method to fetch detailed error context on demand, using the task ID to retrieve data from the server's blob store. This will reduce the amount of potentially sensitive data in error logs, minimize the performance impact of error reporting, and still allow for more detailed debugging when necessary. As we all know, some workflow orchestrators that shall remain unnamed are the bane of folks doing MLE simply because of garbage logging and not providing logs of intermediate stages in a nice, tidy way.
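A minimal sketch of points a) and b), assuming a hypothetical ErrorReport type and a plain mapping standing in for the server's blob store. None of these names exist in the SDK; they are only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class ErrorReport:
    """Lightweight report (hypothetical type): identifiers only, no inputs or outputs."""
    task_id: str
    error_type: str
    message: str


def make_report(task_id: str, exc: BaseException, max_len: int = 200) -> ErrorReport:
    # Truncate the message so logs stay small and are less likely to carry payload data.
    return ErrorReport(
        task_id=task_id,
        error_type=type(exc).__name__,
        message=str(exc)[:max_len],
    )


def fetch_error_context(
    report: ErrorReport, blob_store: Mapping[str, Dict[str, Any]]
) -> Dict[str, Any]:
    # blob_store is an assumption: a mapping of task_id -> {"inputs": ..., "outputs": ...}
    # standing in for whatever retrieval API the server actually exposes.
    return dict(blob_store.get(report.task_id, {}))
```

The report itself never holds inputs or outputs, so the detailed context is only pulled when someone explicitly asks for it by task ID.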
Configurable Error Detail Level: We can obviously add configuration options allowing users to set the level of detail included in error reports. This could range from minimal (just task ID and error type) to comprehensive (including sanitized input data summaries).
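One way the detail level could be expressed, as a rough sketch. DetailLevel, summarize_inputs, and build_report are hypothetical names, and the choice to sanitize inputs by recording only their types is an assumption made for this example.

```python
from enum import Enum
from typing import Any, Dict, Optional


class DetailLevel(Enum):
    MINIMAL = "minimal"              # task ID and error type only
    COMPREHENSIVE = "comprehensive"  # adds the message and a sanitized input summary


def summarize_inputs(inputs: Dict[str, Any]) -> Dict[str, str]:
    # Sanitized summary: record only the type of each input, never its value.
    return {name: type(value).__name__ for name, value in inputs.items()}


def build_report(
    task_id: str,
    exc: BaseException,
    level: DetailLevel,
    inputs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    report: Dict[str, Any] = {"task_id": task_id, "error_type": type(exc).__name__}
    if level is DetailLevel.COMPREHENSIVE:
        report["message"] = str(exc)
        if inputs is not None:
            report["input_summary"] = summarize_inputs(inputs)
    return report
```

Intermediate levels could be added to the enum later without changing callers.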
related #909
Improve Error Handling in Indexify Python SDK
Issue Description
The current implementation of the Indexify Python SDK lacks robust error handling and reporting mechanisms.
Specific Examples
In indexify/remote_client.py, the _request method:
Issues:
- Catches only httpx.ConnectError, ignoring other potential exceptions.
In indexify/executor/function_worker.py, the async_submit method:
Issues:
In indexify/executor/agent.py, the task_completion_reporter method:
Issues:
- Uses a broad Exception catch, which may mask specific errors.
Proposed Solution
Create a custom exception hierarchy:
- A base IndexifyException class.
- Specific exception types (NetworkError, ExecutionError, ConfigurationError).
Implement a centralized error handling and logging mechanism:
- An ErrorHandler class that can be configured with custom logging and reporting options.
- Use the ErrorHandler consistently throughout the SDK.
Enhance error context:
Improve retry mechanisms:
Add error callback support:
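The exception hierarchy, centralized handler, and callback support proposed above could look roughly like the following. This is a sketch under assumptions, not the SDK's implementation: the class names come from the proposal, but the constructor arguments, the register_callback and handle methods, and the choice to wrap unknown exceptions in ExecutionError are all hypothetical.

```python
import logging
from typing import Callable, List, Optional


class IndexifyException(Exception):
    """Base class for SDK errors; carries an optional task ID for context."""

    def __init__(self, message: str, task_id: Optional[str] = None):
        super().__init__(message)
        self.task_id = task_id


class NetworkError(IndexifyException):
    """Connection or transport failures."""


class ExecutionError(IndexifyException):
    """Failures while running user code or tasks."""


class ConfigurationError(IndexifyException):
    """Invalid or missing configuration."""


ErrorCallback = Callable[[IndexifyException], None]


class ErrorHandler:
    def __init__(self, logger: Optional[logging.Logger] = None):
        # The logger is injectable so users can plug in their own logging setup.
        self._logger = logger or logging.getLogger("indexify")
        self._callbacks: List[ErrorCallback] = []

    def register_callback(self, callback: ErrorCallback) -> None:
        self._callbacks.append(callback)

    def handle(self, exc: Exception, task_id: Optional[str] = None) -> IndexifyException:
        # Wrap unknown exceptions (assumption: as ExecutionError) so callers
        # always receive something from the SDK hierarchy.
        if isinstance(exc, IndexifyException):
            error = exc
        else:
            error = ExecutionError(str(exc))
            error.__cause__ = exc
        if error.task_id is None:
            error.task_id = task_id
        self._logger.error("%s (task_id=%s): %s", type(error).__name__, error.task_id, error)
        for callback in self._callbacks:
            callback(error)
        return error
```

A caller would typically write `raise handler.handle(exc, task_id=...)` inside an except block, so logging, callbacks, and wrapping all happen in one place.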
Implementation Plan
- Add the exception hierarchy in indexify/exceptions.py.
- Add the ErrorHandler class in indexify/error_handling.py.
- Update existing modules to use the ErrorHandler:
  - Update remote_client.py to use specific exceptions and the ErrorHandler.
  - Update function_worker.py to provide more context in errors and use the ErrorHandler.
  - Update agent.py with improved error handling and retry logic.
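For the retry logic mentioned in the plan, a minimal sketch of retrying with exponential backoff is shown below. It is illustrative only: call_with_retry is a hypothetical helper, NetworkError is a local stand-in for the proposed exception, and the builtin ConnectionError and TimeoutError stand in for whatever transport errors the HTTP client raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NetworkError(Exception):
    """Local stand-in for the proposed NetworkError (hypothetical)."""


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError) as exc:
            # Only transient, network-style failures are retried;
            # any other exception propagates immediately.
            if attempt == max_attempts:
                raise NetworkError(f"gave up after {attempt} attempts") from exc
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise AssertionError("unreachable")
```

The sleep function is a parameter so tests can run without real delays, and the original exception is preserved as the cause of the final NetworkError.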