treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

lakeFS configuration for Aliyun S3 #3614

Closed icecraft closed 2 years ago

icecraft commented 2 years ago

Hi @icecraft, Sorry to hear you've been having difficulties. Thanks for taking Aliyun S3 for a spin with lakeFS! There may be a few adjustments, I will try to help.

Could you try setting blockstore.s3.force_path_style to true in your config file? The configuration reference lists several variables that can help, all under "blockstore.s3". Good luck, and please let us know how you get along!

Originally posted by @arielshaqed in https://github.com/treeverse/lakeFS/issues/3490#issuecomment-1152913865

icecraft commented 2 years ago

Sorry for the late reply. I have set blockstore.s3.force_path_style to true, but it does not have the effect I want. Here is my config:


    blockstore:
        type: s3
        s3:
            region: "cn-shanghai"
            endpoint: "https://oss-cn-shanghai.aliyuncs.com"
            credentials:
                access_key_id: "xxxxx"
                secret_access_key: "xxx"
            force_path_style: true
            discover_bucket_region: false

    committed:
        block_storage_prefix: lakefs-test/meta

When I try to create a new repo, the following errors occur:

time="2022-07-05T06:10:55Z" level=warning msg="Could not access storage namespace" func="pkg/api.(*Controller).CreateRepository" file="build/pkg/api/controller.go:1221" error="s3 error: <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error>\n  <Code>SecondLevelDomainForbidden</Code>\n  <Message>Please use virtual hosted style to access.</Message>\n  <RequestId>62C3D5EFE179793636E40AC6</RequestId>\n  <HostId>oss-cn-shanghai.aliyuncs.com</HostId>\n</Error>\n" service=api_gateway storage_namespace="s3://my-data"

time="2022-07-05T06:10:55Z" level=error msg="bad S3 PutObject response" func="pkg/block/s3.(*Adapter).streamToS3" file="build/pkg/block/s3/adapter.go:250" error="s3 error: <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error>\n  <Code>SecondLevelDomainForbidden</Code>\n  <Message>Please use virtual hosted style to access.</Message>\n  <RequestId>62C3D5EFE1797936360C0EC6</RequestId>\n  <HostId>oss-cn-shanghai.aliyuncs.com</HostId>\n</Error>\n" host="127.0.0.1:2002" method=POST operation=PutObject path=/api/v1/repositories request_id=298c66aa-f7a4-4685-9033-36ef8213bd47 service_name=rest_api status_code=403 url="https://oss-cn-shanghai.aliyuncs.com/my-data/dummy"
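For reference, the two S3 addressing styles differ only in where the bucket name goes. The sketch below is an illustration, not lakeFS code, and `object_url` is a hypothetical helper. It shows that the URL in the PutObject log above has the bucket in the path (path style, which is what `force_path_style: true` requests), while the `SecondLevelDomainForbidden` error asks for the virtual-hosted form.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build an object URL under either addressing style (hypothetical helper)."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path style: the bucket is the first path segment.
        host, path = parts.netloc, f"/{bucket}/{key}"
    else:
        # Virtual-hosted style: the bucket is a subdomain of the endpoint host.
        host, path = f"{bucket}.{parts.netloc}", f"/{key}"
    return urlunsplit((parts.scheme, host, path, "", ""))


# Endpoint, bucket and key are taken from the config and log above.
endpoint = "https://oss-cn-shanghai.aliyuncs.com"
print(object_url(endpoint, "my-data", "dummy", path_style=True))
print(object_url(endpoint, "my-data", "dummy", path_style=False))
```

The first printed URL matches the `url=` field in the failing request; the second is the form the endpoint's error message asks for.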

arielshaqed commented 2 years ago

Bad news, I'm afraid.

I tried essentially the same but in a different Aliyun region (Frankfurt, eu-central-1). I get a different failure:

ERROR  [2022-07-05T10:37:57+03:00]pkg/block/s3/adapter.go:250 pkg/block/s3.(*Adapter).streamToS3 bad S3 PutObject response                     error="s3 error: <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error>\n  <Code>NotImplemented</Code>\n  <Message>Aws MultiChunkedEncoding is not supported.</Message>\n  <RequestId>62C3EA5594D6E93131C82545</RequestId>\n  <HostId>oss-eu-central-1.aliyuncs.com</HostId>\n</Error>\n" host="localhost:8000" log_audit=API method=POST operation=PutObject path=/api/v1/repositories request_id=fc0edb3c-1917-42b4-a648-b0b71797d458 service_name=rest_api status_code=400 url="https://oss-eu-central-1.aliyuncs.com/treeverse-ariels-test/repos/example/dummy"
WARNING[2022-07-05T10:37:57+03:00]pkg/api/controller.go:1297 pkg/api.(*Controller).CreateRepository Could not access storage namespace            error="s3 error: <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error>\n  <Code>NotImplemented</Code>\n  <Message>Aws MultiChunkedEncoding is not supported.</Message>\n  <RequestId>62C3EA5594D6E93131C82545</RequestId>\n  <HostId>oss-eu-central-1.aliyuncs.com</HostId>\n</Error>\n" reason=unknown service=api_gateway storage_namespace="s3://treeverse-ariels-test/repos/example"

Aliyun's lack of support for chunked encoding is currently a blocker.
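When comparing failures across regions, it can help to pull just the error code and message out of the XML body in these log lines. A minimal sketch, assuming the `error=` field has already been unescaped into plain XML; `s3_error_summary` is a hypothetical helper, not part of lakeFS:

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def s3_error_summary(body: str) -> Tuple[str, str]:
    """Return (Code, Message) from an S3-style XML error body."""
    # Parse as bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(body.encode("utf-8"))
    return root.findtext("Code", default=""), root.findtext("Message", default="")


# Error body copied from the Frankfurt log line above, with escapes removed.
body = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Error>\n"
    "  <Code>NotImplemented</Code>\n"
    "  <Message>Aws MultiChunkedEncoding is not supported.</Message>\n"
    "  <RequestId>62C3EA5594D6E93131C82545</RequestId>\n"
    "  <HostId>oss-eu-central-1.aliyuncs.com</HostId>\n"
    "</Error>\n"
)
print(s3_error_summary(body))
```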

I shall try to see if I can do better in cn-shanghai. Cloud providers often provide different features in their different regions.

arielshaqed commented 2 years ago

Sorry, I cannot create buckets inside the PRC:

Error Prompt: In compliance with the Peoples Republic of China (PRC) laws, purchasers of Internet related products offered in a region inside Mainland China are required to provide real-name registration information.
Error Code: CREATE_BUCKET_LIMIT
Request ID: 45bb765a-e557-4e13-85e3-a37556219706

arielshaqed commented 2 years ago

Options

  1. Keep going with debugging-at-a-distance. If you agree, I will ask you to remove force_path_style: true and see if that works (thanks, @itaiad200 !).

    Following that we would have to switch to using a profile (so set blockstore.s3.profile instead of credentials). Then we will configure the profile to use sigV4, by adding to $HOME/.aws/config:

    [profile aliyun]
    s3 =
     signature_version = s3v4

    (with "aliyun" replaced by the name of your profile) and try that.

  2. Obtain access to a bucket inside the PRC. Frankly I am not sure how I would do that.
  3. Is there anywhere that I could receive technical information about details of Aliyun OSS? Do you have technical support contacts inside Aliyun who could contact me?

arielshaqed commented 2 years ago

Hi @icecraft ,

I opened an account and corresponded with Aliyun (with a service consultant manager). They helped me communicate with technical support. Not sure whether the following is good news for you or not:

"I have checked the S3 instruction of installing lakeFS from your docs, it mentioned S3 Virtual-Host addressing. Aliyun OSS also supports Virtual-Host address. However, only Japan node of Aliyun OSS supports that by default, other regions require application to enable the function."

I checked, and indeed I can work with a bucket in Japan (endpoint oss-ap-northeast-1.aliyuncs.com).

Given that it still does not work in other regions even if we force path-style (the opposite of "virtual-host addressing"!), I am not sure that this answer is complete. Unfortunately my ability to investigate is limited by these two factors:

  1. Aliyun will not let me open a bucket inside CN.
  2. Language issues make it hard for me to communicate directly with Aliyun tech support.

I shall attempt to continue. Any community members who are experienced with Aliyun - I would be very grateful for any hints or help!

arielshaqed commented 2 years ago

Hi @icecraft ,

Closing this issue as I do not believe we will be able to proceed without your involvement. Please re-open when you have time to proceed!

For further reference: The Aliyun Japan endpoint works. I was unable to determine whether the difference was due to sigV2/sigV4 or path-based vs. host-based addressing.