milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Enhancement]: IBM Power (ppc64le) support #29566

Open sumitd2 opened 6 months ago

sumitd2 commented 6 months ago

Is there an existing issue for this?

  • [x] I have searched the existing issues

What would you like to be added?

I belong to the IBM Power porting team. We have recently developed patches to build Milvus on ppc64le and would like to open a PR to upstream the changes to this repo. The major changes were adjusting the versions of some Conan-installed dependencies and creating a local CMake repo for Conan, because the Conan binary package for CMake currently has no Power binaries. So we need to find a way to ensure that CMake supports Power and that any future dependency version changes work on all architectures. Do you have any suggestions?

Why is this needed?

There are customers who wish to use Milvus on Power.

Anything else?

https://github.com/ppc64le/build-scripts/pull/3467

sumitd2 commented 6 months ago

cc: @seth-priya

stale[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

xiaofan-luan commented 5 months ago

Great work. The only concern is how we can set up CI/CD to make sure Power support is not broken by new PRs. We don't have any IBM Power machines, so that might be a real challenge.

sumitd2 commented 5 months ago

Hi @xiaofan-luan You can request a ppc64le node for your CI from this link: https://osuosl.org/services/powerdev/request_hosting/ (Please put "Gerrit Huizenga" in the IBM advocate field.)

Also, I have successfully built and tested Milvus 2.3.3 using our patches on Ubuntu 22.04 and RHEL 9.3, on both x86_64 and ppc64le. You will need to create a repo for "cmake/3.28.3@milvus/dev" to host cmake binaries for both Intel and Power. I hope that's fine.

cc: @seth-priya

stale[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

xiaofan-luan commented 3 months ago

keep this

xiaofan-luan commented 3 months ago

@sumitd2 sorry for the late reply.

I'm not very familiar with the compilation part. We can keep it as long as there is a simple script that builds Milvus on the Power series.

This seems to be a large PR, so let's try to do this:

  1. Change Knowhere to support POWER, especially to provide the same functionality as the existing SIMD code
  2. Make a script that can compile Milvus
  3. Provide a Dockerfile

sumitd2 commented 3 months ago

Hi @xiaofan-luan, we are working on the SIMD optimizations and the PR. Will get back to you on this.

xiaofan-luan commented 3 months ago

@alexanderguzhva can help with this if necessary.

alexanderguzhva commented 3 months ago

@sumitd2 I've noticed a wave of AIX-related PRs to the Faiss baseline. I will propagate these changes to our Faiss fork in Knowhere. Please feel free to let me know if you need any assistance with Faiss/Knowhere as well.

stale[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

sumitd2 commented 2 months ago

Hi @xiaofan-luan, a couple of quick points here:

  1. Our team that was working on SIMD has abandoned the idea. They say the compiler-generated code is better than hand-written code.
  2. A script will not work here, as it could be broken by future PRs. This has to be a proper port, with the changes going into master and a CI node (which we can provide on osuosl.org) that runs Power CI as a job. The Conan binary package for CMake currently has no Power binaries, so we will have to find a solution for this. Many of our customers are interested in Milvus on Power.

Let me know your thoughts. cc: @seth-priya
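The point about compiler-generated code can be illustrated with a minimal sketch (not code from the IBM patches, Faiss, or Knowhere): a plain scalar loop that GCC or Clang can auto-vectorize for the target CPU, so no architecture-specific intrinsics are needed. The function name inner_product and the build flags mentioned in the comments are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: a plain scalar inner product left for the compiler to
// vectorize (e.g. into VSX instructions on ppc64le) instead of hand-written intrinsics.
// Illustrative build flags (an assumption, not taken from the Milvus build):
//   g++ -O3 -mcpu=power9 -fopenmp-simd ...
// Floating-point addition is not associative, so the compiler only splits this
// sum across vector lanes when it is allowed to: here via the OpenMP simd
// reduction below, or alternatively via -ffast-math.
float inner_product(const float* __restrict x, const float* __restrict y,
                    std::size_t d) {
    assert((x != nullptr && y != nullptr) || d == 0);
    float sum = 0.0f;
#pragma omp simd reduction(+ : sum)
    for (std::size_t i = 0; i < d; ++i) {
        sum += x[i] * y[i];  // one multiply-add per element
    }
    return sum;
}
```

The same source compiles unchanged on x86_64, aarch64, and ppc64le; only the flags decide which vector instructions are emitted. GCC's -fopt-info-vec option reports whether the loop was actually vectorized.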

xiaofan-luan commented 2 months ago

@alexanderguzhva could you help the IBM team with the Milvus-side changes?

xiaofan-luan commented 2 months ago

@locustbaby could you try to request a PowerPC machine for basic CI/CD?

xiaofan-luan commented 2 months ago

We also need your help to finish the PowerPC build. Is there a PR for this issue?

sumitd2 commented 2 months ago

@xiaofan-luan We built 2.3.3, and it's rather old now. I have been building 2.4.1 on RHEL 9.3 since this afternoon, and on the face of it, it looks like it requires minimal changes.

xiaofan-luan commented 2 months ago

Sure, @locustbaby can help you with the setup.

locustbaby commented 2 months ago

@sumitd2 I've submitted ticket #33429 for a PowerPC machine.

sumitd2 commented 1 month ago

@locustbaby We have received your request, and it has been approved. Now it's a matter of the administrator creating the VM; it shouldn't be long.

sumitd2 commented 1 month ago

@locustbaby OSL has created the VM but they are waiting on a GPG key from you to provide access:

Your Openstack account is set up as well. Please send us your public GPG key, in order to allow us to securely provide you the login details. Checking back to see if you were able to log in and send us a gpg key.

locustbaby commented 1 month ago

@sumitd2 I'm not very familiar with OSL. Where should I submit the GPG key? I didn't receive any email about this. And should I add the GPG key to our project? It would be great if you could describe the steps to follow!

janani66 commented 1 month ago

@locustbaby, the OSU admins need a public GPG key from you so that we can send the VPN certificates for the VM in an encrypted message. Please respond in the ticket you created with your GPG key.

locustbaby commented 1 month ago

@janani66 How do I log in to the support channel, and where can I register? I don't have an account.

janani66 commented 1 month ago

How did you create the ticket without login creds?

Anyway, you can also paste your public GPG key here in this issue, and I can pass it on to the OSU folks.

From: Sheldon; Sent: Friday, May 17, 2024 1:32:58 AM; To: milvus-io/milvus; Cc: Janani Janakiraman; Subject: [EXTERNAL] Re: [milvus-io/milvus] [Enhancement]: IBM Power (ppc64le) support (Issue #29566)

locustbaby commented 1 month ago

@janani66 No login was required for the ticket; I just submitted it directly. Here is my GPG key (gpg.txt). Please guide me on how to continue, thanks!

alexanderguzhva commented 1 month ago

Please make sure that the corresponding PowerPC patches for knowhere are present in your codebase during your experiments:

@locustbaby @sumitd2

zwall-bp commented 1 month ago

Hello @locustbaby, I'm Zach, with the OSL. Just wanted to reach out to let you know that I have sent your credentials to your email. Be sure to check for any messages from powerdev-request@osuosl.org in the inbox of the account you provided us when you sent the request. It may have gone into your spam folder.

locustbaby commented 1 month ago

@zwall-bp Thanks for all the messages in the email, Zach. I can connect to the VM via SSH now. I did receive an email telling me the ticket was created, but as I said, I submitted the request without logging in, so I hadn't received any other messages before.

locustbaby commented 1 month ago

@sumitd2 What's the next step? Should I add this VM to this project? @zwall-bp It would be very nice if you could guide me through registration. I tried https://osuosl.org/services/powerdev/request_hosting/ again and found that even an empty form can be submitted.

zwall-bp commented 1 month ago

I have already set up a VM for you to use. I'm assuming that by registration you mean access to our OpenStack web UI, which allows you to delete your currently running VM and create a fresh one. More information about using OpenStack can be found on our wiki. My latest email contains the encrypted message for logging in to OpenStack. More information about decrypting the file can be found here.

If the registration is about our support site, we only use that internally, and communicate with you through email.

sumitd2 commented 1 month ago

@locustbaby I have a RHEL script with a patch that builds Milvus (2.4.1) and runs the Go, C++, and e2e tests. There are a few failing tests that we will have to figure out: 10 Azure-related C++ tests, 3 Go tests, and 3 e2e tests (xpassing). Let me know when you are ready. cc: @seth-priya @alexanderguzhva

xiaofan-luan commented 1 month ago

We can ignore the Azure tests for now, because I guess no one is actually running on Azure + ppc? If you can share all the failing cases, we can help check them as well.

sumitd2 commented 1 month ago

@xiaofan-luan 3 e2e tests are xpassing:

milvus_client/test_milvus_client_collection.py::TestMilvusClientCollectionValid.test_milvus_client_collection_fast_creation_default[32768] X
milvus_client/test_milvus_client_collection.py::TestMilvusClientCollectionValid.test_milvus_client_collection_fast_creation_default[128] X
milvus_client/test_milvus_client_collection.py::TestMilvusClientCollectionValid.test_milvus_client_collection_fast_creation_default[2] X

I also built it on an Intel VM, and the result was the same.

sumitd2 commented 1 month ago

@xiaofan-luan The 10 failing Azure-related tests (same result on x86; some other Azure tests pass):

[  FAILED  ] AzureChunkManagerTest.WrongConfig
[  FAILED  ] AzureChunkManagerTest.AzureLogger
[  FAILED  ] AzureChunkManagerTest.BasicFunctions
[  FAILED  ] AzureChunkManagerTest.BucketPositive
[  FAILED  ] AzureChunkManagerTest.BucketNegtive
[  FAILED  ] AzureChunkManagerTest.ObjectExist
[  FAILED  ] AzureChunkManagerTest.WritePositive
[  FAILED  ] AzureChunkManagerTest.ReadPositive
[  FAILED  ] AzureChunkManagerTest.RemovePositive
[  FAILED  ] AzureChunkManagerTest.ListWithPrefixPositive

xiaofan-luan commented 1 month ago

I think we can skip the Azure-related tests for now.

sumitd2 commented 1 month ago

@xiaofan-luan Failing Go tests (I started only the MinIO and etcd containers; the other images aren't available):

FAIL    github.com/milvus-io/milvus/internal/datanode   16.504s
FAIL    github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/kafka    58.504s
FAIL    github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/pulsar   54.918s
FAIL    github.com/milvus-io/milvus/pkg/util/etcd       0.184s
FAIL    github.com/milvus-io/milvus/cmd/tools/config    0.615s

sumitd2 commented 1 month ago

Hi @xiaofan-luan @alexanderguzhva @locustbaby How would you like to proceed? I hope you now have access to the Power VM.

locustbaby commented 1 month ago

@sumitd2 Yeah, I can access the VM via both SSH and OpenStack. What's the next step I can help with?

sumitd2 commented 1 month ago

@xiaofan-luan I have observed that the Go tests are flaky. What can we do about the failing ones?

xiaofan-luan commented 4 weeks ago

@sumitd2

xiaofan-luan commented 4 weeks ago

Can you share the CI/CD logs so we can take a look?

Some of those are dependency-related tests, and you need to bring up Kafka/Pulsar or etcd.

xiaofan-luan commented 4 weeks ago

Anyway, please share the logs and we can take a look.

sumitd2 commented 4 weeks ago

@xiaofan-luan Here is the output of make test-go:

milvus-v2.4.1-test-go.log docker-compose.yam.txt

sumitd2 commented 3 weeks ago

@xiaofan-luan Failing Golang tests (I started only minio and etcd containers, rest images aren't available):

FAIL    github.com/milvus-io/milvus/internal/datanode   16.504s
FAIL    github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/kafka    58.504s
FAIL    github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/pulsar   54.918s
FAIL    github.com/milvus-io/milvus/pkg/util/etcd       0.184s
FAIL    github.com/milvus-io/milvus/cmd/tools/config    0.615s
FAIL    github.com/milvus-io/milvus/pkg/mq/msgstream    600.287s
FAIL    github.com/milvus-io/milvus/pkg/mq/msgdispatcher        600.235s
FAIL    github.com/milvus-io/milvus/internal/storage    232.991s

@xiaofan-luan I have analyzed some of the failed ones:

  1. github.com/milvus-io/milvus/internal/datanode (16.504s): This is flaky. I have seen it pass at least once.

  2. github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/kafka (58.504s)

  3. github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/pulsar (54.918s): This one and the Kafka test above fail because their images are not available.

  4. github.com/milvus-io/milvus/pkg/util/etcd (0.184s): If I kill the etcd container, this one passes.

  5. github.com/milvus-io/milvus/cmd/tools/config (0.615s): This one seems to fail because configs/milvus.yaml is not updated according to paramtable.

  6. github.com/milvus-io/milvus/internal/storage (232.991s): This one is Azure-related.

  7. github.com/milvus-io/milvus/pkg/mq/msgstream (600.287s): This one also seems to need the Kafka and Pulsar images.

sumitd2 commented 2 weeks ago

@xiaofan-luan FYI, master no longer builds on Power because the pinned version of Knowhere no longer builds on Power.

xiaofan-luan commented 2 weeks ago

@alexanderguzhva could you help with it?

xiaofan-luan commented 2 weeks ago

@locustbaby could you help verify all the unit test issues?

alexanderguzhva commented 2 weeks ago

@sumitd2 Would you be able to provide the build logs for the master branch, if possible?

sumitd2 commented 1 week ago

Hi @alexanderguzhva, the logs are attached.

[make.log](https://github.com/user-attachments/files/15967563/make.log)
In file included from /sumit/milvus/cmake_build/thirdparty/knowhere/knowhere-src/thirdparty/faiss/faiss/impl/ScalarQuantizerDC_neon.cpp:9:
/sumit/milvus/cmake_build/thirdparty/knowhere/knowhere-src/thirdparty/faiss/faiss/impl/ScalarQuantizerCodec_neon.h:10:10: fatal error: arm_neon.h: No such file or directory
   10 | #include <arm_neon.h>
      |          ^~~~~~~~~~~~
compilation terminated.
make[3]: *** [thirdparty/knowhere/knowhere-build/CMakeFiles/faiss.dir/build.make:972: thirdparty/knowhere/knowhere-build/CMakeFiles/faiss.dir/thirdparty/faiss/faiss/impl/ScalarQuantizerDC_neon.cpp.o] Error 1
make[3]: *** Waiting for unfinished jobs....

alexanderguzhva commented 1 week ago

@sumitd2 my bad. I know what the problem is.
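The log shows a NEON-specific translation unit (ScalarQuantizerDC_neon.cpp) being compiled on ppc64le, where arm_neon.h does not exist. A minimal sketch of one common way to keep such code from breaking non-ARM builds is to guard both the header and the NEON path behind an architecture check and fall back to scalar code elsewhere. This is a hypothetical example: the function l2_sqr and the macro SKETCH_HAVE_NEON are not from Faiss or Knowhere, and the actual fix may instead exclude the *_neon.cpp files from the build on non-ARM targets in CMake.

```cpp
#include <cassert>
#include <cstddef>

// Only AArch64 toolchains ship arm_neon.h with vaddvq_f32, so include it and
// enable the NEON path only there. On ppc64le or x86_64 the scalar loop is used.
#if defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

// Hypothetical helper (not the Faiss/Knowhere implementation):
// squared L2 distance between two float vectors of length d.
float l2_sqr(const float* x, const float* y, std::size_t d) {
    assert((x != nullptr && y != nullptr) || d == 0);
    std::size_t i = 0;
    float sum = 0.0f;
#if SKETCH_HAVE_NEON
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= d; i += 4) {
        float32x4_t diff = vsubq_f32(vld1q_f32(x + i), vld1q_f32(y + i));
        acc = vmlaq_f32(acc, diff, diff);  // acc += diff * diff, 4 lanes at a time
    }
    sum = vaddvq_f32(acc);  // horizontal add of the 4 lanes
#endif
    // Scalar tail on AArch64; the whole computation on other architectures.
    for (; i < d; ++i) {
        const float diff = x[i] - y[i];
        sum += diff * diff;
    }
    return sum;
}
```

Guarding in the source keeps the file compilable on every architecture, while excluding it in CMake avoids compiling it at all; either way, the build no longer depends on a header that only exists on ARM.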