opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[FEATURE] Reduce ml common zip size #3733

Open Hailong-am opened 1 month ago

Hailong-am commented 1 month ago

Is your feature request related to a problem?

Currently, the ml-commons zip file is 346.1 MB: https://mvnrepository.com/artifact/org.opensearch.plugin/opensearch-ml-plugin/2.19.1.0

After unzipping it, the biggest file is onnxruntime_gpu-1.16.3.jar, at 300 MB. Looking inside this JAR, the native libraries take up most of the space.
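To check where the space goes inside a JAR like onnxruntime_gpu-1.16.3.jar, a quick Python sketch can sum uncompressed entry sizes grouped by directory prefix. The helper name `size_by_prefix` is hypothetical, and the default depth of 4 assumes the native libraries sit in per-platform folders such as `ai/onnxruntime/native/linux-x64/`; adjust the depth if the actual layout differs.

```python
import zipfile
from collections import defaultdict


def size_by_prefix(jar, depth=4):
    """Sum uncompressed entry sizes in a JAR/zip, grouped by the first
    `depth` path components. `jar` may be a path or a binary file object.
    Returns (prefix, total_bytes) pairs, largest first."""
    totals = defaultdict(int)
    with zipfile.ZipFile(jar) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            prefix = "/".join(info.filename.split("/")[:depth])
            totals[prefix] += info.file_size  # uncompressed size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `size_by_prefix("onnxruntime_gpu-1.16.3.jar")[:5]` lists the five largest groups; if the per-platform native folders dominate, that matches the observation above.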


What solution would you like?

We could build a separate zip for each platform (Windows/Linux) and remove the unnecessary native library files, which could reduce the zip file size by roughly 150 MB.


mingshl commented 1 month ago

Woohoo, it will be 50% lighter!

dhrubo-os commented 1 month ago

Currently, this is how we pull the GPU dependencies.

Do you have any other suggestions?

Are you suggesting that if we detect the OS has GPU support, we then download the GPU-related libraries?

Hailong-am commented 1 month ago

The initial idea is to add a repackage task that removes the native libraries that aren't needed for a specific platform.

However, this requires publishing a different zip file for each platform to the Maven repository.
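The repackage idea can be sketched in Python: copy the JAR entry by entry and drop the native-library folders for platforms a given zip will not ship. This is not the code from the draft PR. The `ai/onnxruntime/native/<os>-<arch>/` layout and platform names such as `linux-x64` are assumptions to verify against the actual JAR, and `strip_native_libs` is a hypothetical helper.

```python
import zipfile

# Assumption: onnxruntime JARs keep native libraries under
# ai/onnxruntime/native/<os>-<arch>/ (e.g. linux-x64, win-x64).
NATIVE_ROOT = "ai/onnxruntime/native/"


def strip_native_libs(src, dst, keep_platforms):
    """Copy JAR `src` to `dst`, skipping native-library folders whose
    platform name is not in `keep_platforms`. `src`/`dst` may be paths
    or binary file objects."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            name = info.filename
            if name.startswith(NATIVE_ROOT):
                platform = name[len(NATIVE_ROOT):].split("/", 1)[0]
                if platform and platform not in keep_platforms:
                    continue  # native lib for a platform we are not packaging
            # Reuse the original ZipInfo so names, timestamps and the
            # per-entry compression method are preserved.
            zout.writestr(info, zin.read(name))
```

For a Linux-only zip this would be called as `strip_native_libs("onnxruntime_gpu-1.16.3.jar", "onnxruntime_gpu-1.16.3-linux-x64.jar", {"linux-x64"})`. Keeping the filter as an allow-list means any unexpected platform folder is dropped rather than silently shipped.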

I found a related issue in the onnxruntime repo too: https://github.com/microsoft/onnxruntime/issues/12084

https://github.com/opensearch-project/ml-commons/blob/2da7d7a1e7d5cd80bd9940dd6b7ff1c32c009582/ml-algorithms/build.gradle#L59-L68

This code also doesn't look correct to me. If we build ml-commons on a Linux system and publish the zip to the Maven repository, a customer who downloads that zip and installs it manually into a macOS distribution will end up with a plugin that doesn't work, because macOS doesn't support GPU, as the code comment mentions.
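One way to avoid this is to choose the ONNX Runtime artifact from the target platform of the zip rather than from the OS of the machine running the build. A minimal sketch, assuming one zip per target OS and that the GPU build is only wanted on Linux and Windows; `onnxruntime_artifact` is a hypothetical helper and does not reproduce the linked build.gradle lines. The Maven coordinates `com.microsoft.onnxruntime:onnxruntime` and `com.microsoft.onnxruntime:onnxruntime_gpu` are the published ONNX Runtime Java artifacts, and 1.16.3 is the version mentioned above.

```python
ORT_VERSION = "1.16.3"


def onnxruntime_artifact(target_os):
    """Return Maven coordinates for the ONNX Runtime artifact to bundle
    into a zip built for `target_os` (hypothetical helper). The decision
    depends only on the target, never on the build host's OS."""
    os_name = target_os.lower()
    if os_name in ("linux", "windows"):
        artifact = "onnxruntime_gpu"  # GPU build assumed for Linux/Windows
    elif os_name in ("mac", "macos", "darwin"):
        artifact = "onnxruntime"  # macOS has no GPU build
    else:
        raise ValueError(f"unsupported target OS: {target_os}")
    return f"com.microsoft.onnxruntime:{artifact}:{ORT_VERSION}"
```

With this shape, a zip built on a Linux CI machine for a macOS target would still pull the CPU-only `onnxruntime` artifact.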

Hailong-am commented 1 month ago

BTW, I have a draft PR for this. With this change, the zip file size drops to 187 MB, saving 346 - 187 = 159 MB of disk space on the Linux platform.

187M Apr 16 06:40 opensearch-ml-3.0.0.0-beta1-SNAPSHOT.zip