akmithal opened this issue 3 years ago
Hi @jeniawhite, @nimrod-becker: this is a new issue I just found.
Instead of failing, shouldn't the server introduce a delay before serving this IO request?
As the logs show, memory and CPU utilization were not high when these errors were triggered.
@akmithal This works as designed: we return the slow-down message to the client and expect it to reduce its request rate. The message is returned because of a bottleneck on memory utilization in the NSFS flows. The next step is to add memory utilization to the HPA (Horizontal Pod Autoscaler) and fine-tune the amount of memory we allocate for IO in the NSFS flows.
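The mechanism described in the reply above (reject an IO when the memory set aside for IO is exhausted, and rely on the client to back off and retry) can be sketched as follows. This is a minimal illustration under stated assumptions, not NooBaa's implementation: `MemoryBudget`, `SlowDown`, and `with_backoff` are hypothetical names, and the budget is modeled as a simple byte counter. Real S3 clients such as the AWS CLI and SDKs have their own configurable retry settings.

```python
import random
import time


class SlowDown(Exception):
    """Hypothetical stand-in for an S3 'SlowDown' error response."""


class MemoryBudget:
    """Hypothetical server-side gate: admit an IO only if its buffer fits in a
    fixed memory budget; otherwise reject it immediately instead of queueing."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.in_use = 0

    def acquire(self, size):
        if self.in_use + size > self.limit:
            raise SlowDown(f"IO budget exhausted: {self.in_use} + {size} > {self.limit}")
        self.in_use += size

    def release(self, size):
        self.in_use -= size


def with_backoff(op, max_attempts=5, base=0.1, cap=2.0, sleep=time.sleep):
    """Call op(), retrying on SlowDown with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return op()
        except SlowDown:
            if attempt == max_attempts - 1:
                raise  # give up and surface the error to the caller
            # Wait a random time in [0, min(cap, base * 2**attempt)] before retrying.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


if __name__ == "__main__":
    MiB = 1024 * 1024
    budget = MemoryBudget(limit_bytes=8 * MiB)
    budget.acquire(8 * MiB)  # an in-flight IO currently holds the whole budget

    def read_object():
        budget.acquire(4 * MiB)
        try:
            return "object data"
        finally:
            budget.release(4 * MiB)

    def fake_sleep(seconds):
        # Simulate the in-flight IO finishing while the client backs off.
        budget.release(budget.in_use)

    print(with_backoff(read_object, sleep=fake_sleep))  # object data
```

This mirrors the trade-off discussed in the thread: the server fails fast instead of holding the request in memory, and the client absorbs the delay through backoff.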
Hi @jeniawhite, are we going to trim down the error logs for this error? If so, you can use this bug to track that work. If the behavior will stay as is, or a PR is already open, I will close this bug.
Hi @nimrod-becker, is there a PR or work item tracking "add memory utilization to HPA"? If not, we can use this defect for it. FYI @romayalon
We can keep this issue open until we have a task tracking this. We originally planned to use an approach that ultimately did not make it into OpenShift, so we need to wait until the new approach is available.
@romayalon, could you link the Jira ticket here for reference?
There is no Jira ticket yet; this needs to be scheduled for a specific release.
This issue had no activity for too long - it will now be labeled stale. Update it to prevent it from getting closed.
Don't close.
Environment info
Actual behavior
Expected behavior
Steps to reproduce
From the app server, I got this error:
download failed: s3://bucket-3/obj_112983 to ./obj_112983 An error occurred (504) when calling the GetObject operation: Gateway Timeout
From the endpoint logs:
More information - Screenshots / Logs / Other output