ml-commons provides a set of common machine learning algorithms, e.g. k-means, or linear regression, to help developers build ML related features within OpenSearch.
Apache License 2.0
99
stars
136
forks
source link
[Feature/multi_tenancy] Implement client.bulk in SDK Client #3192
Added a new DataObjectRequest parent class. I originally started with a tagging interface just to help method signatures, but realized that four classes duplicated the same code for the index, id, and tenantId fields, so refactored those (and the associated builder code) into the superclass. Existing unit tests cover this change. https://github.com/opensearch-project/ml-commons/pull/3192/commits/caeea219cef12a70d344f2639f17f13ac9a62956
Implemented the new interface method in all 3 clients. The local and remote implementations use the OpenSearch bulk request natively. The DynamoDB implementation simply iterates the bulk request and calls existing code for the individual requests; this is not the most efficient but necessary to preserve ordering. If all the bulk requests are put and delete, it's possible to optimize this with DDB BulkWrite, which is a good future optimization; however since our primary use case uses update I'm deferring that complexity. https://github.com/opensearch-project/ml-commons/pull/3192/commits/ee30d210818a84b89027860e5c3a0ac7762a950b
In addition there are several whitespace / import reordering changes due to spotless not having been run on the common module.
Will be handled integ tests in #2818
Check List
[x] New functionality includes testing.
[x] Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Description
Adds an SDKClient equivalent to
client.bulk()
.Summary of changes (helpful for reviewers):
DataObjectRequest
parent class. I originally started with a tagging interface just to help method signatures, but realized that four classes duplicated the same code for the index, id, and tenantId fields, so refactored those (and the associated builder code) into the superclass. Existing unit tests cover this change. https://github.com/opensearch-project/ml-commons/pull/3192/commits/caeea219cef12a70d344f2639f17f13ac9a62956DataObjectResponse
parent class similar to above, refactoring the id and parser fields from the subclass. Existing unit tests cover this change. https://github.com/opensearch-project/ml-commons/pull/3192/commits/445936bcfa23410778cc024a485b69b307a44c62 (part 1)BulkDataObjectRequest
andBulkDataObjectResponse
classes and unit tests. Added afailed
field to the response superclass to track failures/errors. https://github.com/opensearch-project/ml-commons/pull/3192/commits/445936bcfa23410778cc024a485b69b307a44c62 (part 2).bulk()
method, and unit tests https://github.com/opensearch-project/ml-commons/pull/3192/commits/22dccf4af0c41f07625ffe6857fd17df80b0bf79In addition there are several whitespace / import reordering changes due to spotless not having been run on the common module.
Will be handled integ tests in #2818
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.