opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g., k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[FEATURE] Document the use of "client_config" parameters in connector blueprints #2487

Open chishui opened 1 month ago

chishui commented 1 month ago

Is your feature request related to a problem? client_config is a critical connector parameter that controls the concurrency of async requests, timeouts, and similar settings (code). If this parameter is not set in the connector and a user ingests a large number of documents at the same time, they could run into the following exception:

[2024-05-09T17:26:08,748][ERROR][o.o.m.e.a.r.MLSdkAsyncHttpResponseHandler] [integTest-2] Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.
Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.
If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.
java.lang.Throwable: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.
Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.
If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.
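The log's last suggestion, smoothing out requests so that bursts cannot overload the client, can be sketched on the caller side. This is an illustration under stated assumptions, not an ml-commons feature: ThrottledIngestDemo and callRemoteModel are hypothetical names, the remote call is simulated with a short sleep, and a Semaphore caps the number of requests in flight so that excess work waits before it ever reaches the HTTP client's connection pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical caller-side throttling sketch: at most maxInFlight simulated
 * remote calls run at once; extra work blocks on the submitting thread.
 */
public final class ThrottledIngestDemo {

    /** Runs `requests` simulated calls, at most `maxInFlight` at a time; returns the peak concurrency seen. */
    static int runThrottled(int requests, int maxInFlight) throws InterruptedException {
        Semaphore inFlight = new Semaphore(maxInFlight);
        AtomicInteger current = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(16);
        try {
            for (int i = 0; i < requests; i++) {
                inFlight.acquire(); // blocks the submitter until a slot frees up
                executor.execute(() -> {
                    try {
                        int now = current.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max); // track the highest concurrency observed
                        callRemoteModel();
                    } finally {
                        current.decrementAndGet();
                        inFlight.release(); // hand the slot to the next waiting request
                    }
                });
            }
        } finally {
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.MINUTES);
        }
        return peak.get();
    }

    /** Hypothetical stand-in for a remote inference request. */
    private static void callRemoteModel() {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak in-flight: " + runThrottled(100, 4));
    }
}
```

The thread pool is deliberately larger than the permit count, so the Semaphore, not the pool size, is what bounds concurrency; the printed peak never exceeds the configured limit.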

What solution would you like? We should document how to use the client_config parameter in connector blueprints.
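As a possible starting point for that documentation, here is a minimal Java sketch that renders the client_config block a blueprint might carry. It assumes the field names max_connection, connection_timeout, and read_timeout as we understand the ml-commons connector docs; the values are illustrative only, and units and defaults should be verified against the docs. ClientConfigExample and its ClientConfig record are hypothetical helpers, not part of ml-commons.

```java
import java.util.Locale;

/**
 * Hypothetical helper that renders the "client_config" fragment of a
 * connector blueprint body. Field names are assumed from the connector docs.
 */
public final class ClientConfigExample {

    /** Values for client_config; units (seconds vs. milliseconds) are an assumption to verify. */
    record ClientConfig(int maxConnection, int connectionTimeout, int readTimeout) {
        ClientConfig {
            if (maxConnection <= 0) {
                throw new IllegalArgumentException("maxConnection must be positive");
            }
            if (connectionTimeout < 0 || readTimeout < 0) {
                throw new IllegalArgumentException("timeouts must be non-negative");
            }
        }

        /** Renders the fragment that would sit at the top level of the connector JSON body. */
        String toJson() {
            return String.format(Locale.ROOT,
                "\"client_config\": {\"max_connection\": %d, \"connection_timeout\": %d, \"read_timeout\": %d}",
                maxConnection, connectionTimeout, readTimeout);
        }
    }

    public static void main(String[] args) {
        ClientConfig config = new ClientConfig(20, 10, 60);
        // prints "client_config": {"max_connection": 20, "connection_timeout": 10, "read_timeout": 60}
        System.out.println(config.toJson());
    }
}
```

Validating in the record's compact constructor rejects a non-positive connection limit before it can end up in a blueprint.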

What alternatives have you considered? Maybe we should have a better place for connector documentation.


ylwu-amzn commented 1 month ago

@dhrubo-os Can you help with this? I think we should add this to the OpenSearch documentation.

zhichao-aws commented 1 month ago

The connectionAcquisitionTimeout setting controls the timeout behind "Acquire operation took longer than the configured maximum time." We can't currently set this parameter in client_config.

dhrubo-os commented 1 month ago


@ylwu-amzn we have corresponding documentation here

But as @zhichao-aws said, we don't currently have a connectionAcquisitionTimeout setting.

connectionAcquisitionTimeout:

This timeout specifies the maximum time the HTTP client should wait to acquire a connection from the connection pool before giving up.
If the connection pool is exhausted and all connections are in use, the client will wait for a connection to become available until this timeout is reached.
After the timeout is reached, the client will typically throw a ConnectionPoolTimeoutException or a similar exception.
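The acquisition-timeout behavior described above can be illustrated with a toy pool. This is a minimal sketch, not the AWS SDK or ml-commons implementation: a java.util.concurrent.Semaphore stands in for the pool's free connections, ToyConnectionPool and AcquireTimeoutDemo are hypothetical names, and the standard TimeoutException stands in for whatever exception a real client throws. A request waits up to the timeout for a free connection and then gives up, mirroring the "Acquire operation took longer than the configured maximum time" error in this issue.

```java
import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo of connection-acquisition-timeout semantics using a Semaphore as the pool. */
public final class AcquireTimeoutDemo {

    static final class ToyConnectionPool {
        private final Semaphore permits;
        private final Duration acquisitionTimeout;

        ToyConnectionPool(int maxConnections, Duration acquisitionTimeout) {
            this.permits = new Semaphore(maxConnections, true); // fair: waiters are served in FIFO order
            this.acquisitionTimeout = acquisitionTimeout;
        }

        /** Waits up to acquisitionTimeout for a free connection, then fails. */
        void acquire() throws InterruptedException, TimeoutException {
            if (!permits.tryAcquire(acquisitionTimeout.toMillis(), TimeUnit.MILLISECONDS)) {
                throw new TimeoutException(
                    "Acquire operation took longer than " + acquisitionTimeout.toMillis() + " ms");
            }
        }

        /** Returns a connection to the pool so a waiting request can take it. */
        void release() {
            permits.release();
        }
    }

    public static void main(String[] args) throws Exception {
        ToyConnectionPool pool = new ToyConnectionPool(2, Duration.ofMillis(100));
        pool.acquire();
        pool.acquire(); // pool is now exhausted
        try {
            pool.acquire(); // nothing is released, so this waits about 100 ms and fails
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
        pool.release();
        pool.acquire(); // succeeds immediately because a connection was returned
        System.out.println("acquired after release");
    }
}
```

Running main prints "timed out: Acquire operation took longer than 100 ms" followed by "acquired after release". Raising the pool size or the timeout in this toy delays the failure in the same way the log's suggested mitigations would.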

@chishui feel free to raise a PR for this.

dblock commented 3 weeks ago

Catch All Triage - 1 2 3 4 5 6