Closed tschaffter closed 3 years ago
Tagging @gkowalski
@tschaffter what errors are people getting to warrant this? The errors @gkowalski is getting have nothing to do with the service not being started.
@thomasyu888
Is your proposal related to a problem?
Yes. The current exponential backoff strategy retries only 3 times when a request sent to a tool fails. These 3 attempts happen after waits of 2, 4, and 8 seconds, so the controller gives up after 14 seconds in total. This is not long enough when a tool gets temporarily "stuck".
More importantly, the controller currently starts sending requests to tools without knowing whether a tool is fully initialized. A clean way to fix this would be to add a dedicated endpoint to the tools that reports their initialization status.
In the meantime, a longer exponential backoff can serve as a workaround for this issue.
Describe the solution you'd like
Extend the exponential backoff strategy so that it covers a period of 3 minutes, which should be sufficient for most tools to get fully initialized. Our Spark NLP tools already take about 1.5 minutes to initialize.
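
To make the numbers concrete, here is a rough sketch of the retry schedule. It assumes the wait doubles on every retry, starting at 2 seconds, which matches the 2, 4, 8 second sequence described above. The function names are made up for illustration and are not taken from the controller code.

```python
def backoff_delays(base=2.0, factor=2.0, retries=3):
    """Return the wait (in seconds) before each retry attempt."""
    return [base * factor ** i for i in range(retries)]


def retries_to_cover(target, base=2.0, factor=2.0):
    """Smallest number of retries whose cumulative wait reaches `target` seconds."""
    total, retries = 0.0, 0
    while total < target:
        total += base * factor ** retries
        retries += 1
    return retries


current = backoff_delays(retries=3)   # waits of 2, 4 and 8 s, 14 s in total
needed = retries_to_cover(180)        # retries needed to span 3 minutes
proposed = backoff_delays(retries=needed)
print(current, sum(current))
print(needed, sum(proposed))
```

Under these assumptions, 6 retries only span 126 seconds, so 7 retries (254 seconds in total) would be needed to cover 3 minutes. Capping the individual wait at some maximum would be another option, at the cost of more attempts.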
Describe alternatives you've considered
Add a new endpoint to tools, see above
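
As a sketch of how the controller could use such an endpoint: poll it until the tool reports that it is ready, and only then start sending requests. The endpoint does not exist yet, so `probe` below is a hypothetical callable standing in for "query the tool's initialization status"; the timeout and interval values are placeholders too.

```python
import time


def wait_until_ready(probe, timeout=180.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` until it returns True or `timeout` seconds elapse.

    `probe` is a hypothetical callable that would query the tool's
    initialization-status endpoint and return True once the tool is ready.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Assumption: a tool that is still starting up may refuse
            # connections, so a network error is treated as "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a check like this, the backoff strategy would only have to deal with genuine request failures, not with tools that are still starting up.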
Additional context
Is the request retried if the tool has not responded yet? If the tool takes more than 2 seconds to respond, is a new request sent, or does the controller always wait for an error response before retrying?