zilliztech / milvus-backup

Backup and restore tool for Milvus

[Bug]: milvus-backup using metadata instead of aws credentials or role #295

Open mcandio opened 4 months ago

mcandio commented 4 months ago

Current Behavior

My current Milvus infrastructure is a standard EKS Helm deployment with an IRSA role, and I use S3 as the storage endpoint. The tool does not seem able to use exported AWS credentials or the AWS config file, even when useIAM is specified. I am running `kubectl -n stride-tutoring port-forward --address 0.0.0.0 service/milvus-default 19530:19530` and I have also configured an ingress. When I use a jumphost with a role attached, the tool runs smoothly, but when I try it from my local computer with the env vars exported (secret key, key ID, token), it does not work.

My config file:

```yaml
log:
  console: true
  file:
    rootPath: logs/backup.log
  level: info
milvus:
  address: 0.0.0.0
  authorizationEnabled: false
  password: Milvus
  port: 19530
  tlsMode: 0
  user: root
minio:
  accessKeyID: ""
  address: s3.us-east-1.amazonaws.com
  backupBucketName: redacted
  backupRootPath: backup
  bucketName: redacted
  iamEndpoint: ""
  port: 443
  rootPath: file
  secretAccessKey: null
  storageType: s3
  useIAM: true
  useSSL: true
```
And the output of the check command:

```
~/Downloads/milvus-collections on ☁️  (us-east-1) took 10s
❯ ./milvus-backup --config stride-aitutor-data-ci/config.yaml check
0.4.6 (Built on 2024-01-22T02:30:59Z from Git SHA 7845b38f2b2e613fd85ea3da8fa614045047ac2b)
config:stride-aitutor-data-ci/config.yaml
[2024/02/09 13:37:30.079 -06:00] [INFO] [logutil/logutil.go:165] ["Log directory"] [configDir=]
[2024/02/09 13:37:30.081 -06:00] [INFO] [logutil/logutil.go:166] ["Set log file to "] [path=logs/backup.log]
[2024/02/09 13:37:30.446 -06:00] [WARN] [storage/minio_chunk_manager.go:104] ["failed to check blob bucket exist"] [bucket=redacted] [error="Get \"http://169.254.169.254/latest/meta-data/iam/security-credentials/\": dial tcp 169.254.169.254:80: connect: host is down"]
[2024/02/09 13:37:30.653 -06:00] [WARN] [storage/minio_chunk_manager.go:104] ["failed to check blob bucket exist"] [bucket=redacted] [error="Get \"http://169.254.169.254/latest/meta-data/iam/security-credentials/\": dial tcp 169.254.169.254:80: connect: host is down"]
```
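To illustrate what I think is happening (this is my own sketch with minio-go, not the tool's actual code, and the bucket name is a placeholder): an IAM-only credential provider only ever queries the instance metadata endpoint at 169.254.169.254, which only exists inside AWS, whereas a chained provider would fall back to the exported AWS_* env vars and ~/.aws/credentials first.

```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// useIAM: true appears to translate into an IAM-only provider like this,
	// which queries only the instance metadata endpoint, which is why it
	// fails with "host is down" on a laptop outside AWS.
	iamOnly := credentials.NewIAM("")

	// A chained provider would fall back through the AWS_* env vars and the
	// shared credentials file (~/.aws/credentials) before trying IAM.
	chained := credentials.NewChainCredentials([]credentials.Provider{
		&credentials.EnvAWS{},
		&credentials.FileAWSCredentials{},
		&credentials.IAM{Client: &http.Client{Transport: http.DefaultTransport}},
	})

	for name, creds := range map[string]*credentials.Credentials{
		"iam-only": iamOnly,
		"chained":  chained,
	} {
		client, err := minio.New("s3.us-east-1.amazonaws.com", &minio.Options{
			Creds:  creds,
			Secure: true,
		})
		if err != nil {
			log.Fatalf("%s: %v", name, err)
		}
		// "my-backup-bucket" is a placeholder for the redacted bucket name.
		exists, err := client.BucketExists(context.Background(), "my-backup-bucket")
		log.Printf("%s: exists=%v err=%v", name, exists, err)
	}
}
```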

It also does not work if I use the following config file and export my AWS env vars:


```yaml
# Configures the system log output.
log:
  level: debug # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
  console: true # whether print log to console
  file:
    rootPath: "logs/backup.log"

http:
  simpleResponse: true

# milvus proxy address, compatible to milvus.yaml
milvus:
  address: 0.0.0.0
  port: 19530
  authorizationEnabled: false
  # tls mode values [0, 1, 2]
  # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
  tlsMode: 0
  user: "root"
  password: "Milvus"

# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  # cloudProvider: "minio" # deprecated, use storageType instead
  storageType: "s3" # supported storage types: local, minio, s3, aws, gcp, ali(aliyun), azure

  address: s3.amazonaws.com # Address of MinIO/S3
  port: 443   # Port of MinIO/S3
  accessKeyID:   # accessKeyID of MinIO/S3
  secretAccessKey:  # MinIO/S3 encryption string
  useSSL: true # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""

  bucketName: "redacted" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "file" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance

  # only for azure
  backupAccessKeyID: minioadmin  # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string

  backupBucketName: "redacted" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "backup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath

backup:
  maxSegmentGroupSize: 2G

  parallelism:
    # collection level parallelism to backup
    backupCollection: 4
    # thread pool to copy data. reduce it if it blocks your storage's network bandwidth
    copydata: 128
    # collection level parallelism to restore
    restoreCollection: 2

  # keep temporary files during restore, only used for debugging
  keepTempFiles: false
```

The error is:


```
git:(feature/milvus-ci-tool) ✗ ./milvus-backup --config config.yaml check
0.4.6 (Built on 2024-01-22T02:30:59Z from Git SHA 7845b38f2b2e613fd85ea3da8fa614045047ac2b)
config:config.yaml
[2024/02/09 20:37:56.435 +00:00] [INFO] [logutil/logutil.go:165] ["Log directory"] [configDir=]
[2024/02/09 20:37:56.436 +00:00] [INFO] [logutil/logutil.go:166] ["Set log file to "] [path=logs/backup.log]
[2024/02/09 20:37:56.436 +00:00] [DEBUG] [core/backup_context.go:63] ["Start Milvus client"] [endpoint=0.0.0.0:19530]
[2024/02/09 20:37:57.146 +00:00] [DEBUG] [core/backup_context.go:87] ["Start minio client"] [address=s3.amazonaws.com:443] [bucket=redacted] [backupBucket=redacted]
[2024/02/09 20:37:57.931 +00:00] [WARN] [storage/minio_chunk_manager.go:104] ["failed to check blob bucket exist"] [bucket=redacted] [error="Access Denied."]
[2024/02/09 20:37:57.932 +00:00] [DEBUG] [retry/retry.go:39] ["retry func failed"] ["retry time"=0] [error="Access Denied."]
```
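If I read this second run correctly, with useIAM: false and blank accessKeyID/secretAccessKey the client presumably ends up with empty static credentials, so the exported AWS_* variables are never consulted and S3 simply answers Access Denied. A minimal sketch of that path (again my own illustration with minio-go, placeholder bucket name, not the tool's code):

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Blank accessKeyID / secretAccessKey in the config presumably become
	// empty static credentials; the request is effectively unauthenticated,
	// so a private bucket returns "Access Denied" no matter what AWS_* env
	// vars are exported in the shell.
	client, err := minio.New("s3.amazonaws.com", &minio.Options{
		Creds:  credentials.NewStaticV4("", "", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// "my-backup-bucket" is a placeholder for the redacted bucket name.
	exists, err := client.BucketExists(context.Background(), "my-backup-bucket")
	log.Printf("exists=%v err=%v", exists, err) // expect an access-denied style error
}
```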

Can someone help me understand the default behaviour when no instance role is attached, i.e. when assuming a role or using local AWS credentials? This approach also does not work when using the .aws/credentials file.

thanks!

wayblink commented 4 months ago

@mcandio Hi, I guess you are running into some trouble with IAM. To be frank, IAM support has not been fully tested before; we need some time to verify. I am curious about this message: [error="Get \"http://169.254.169.254/latest/meta-data/iam/security-credentials/\": dial tcp 169.254.169.254:80: connect: host is down"]. Have you checked your IAM server?

mcandio commented 3 months ago

> @mcandio Hi, I guess you are running into some trouble with IAM. To be frank, IAM support has not been fully tested before; we need some time to verify. I am curious about this message: [error="Get \"http://169.254.169.254/latest/meta-data/iam/security-credentials/\": dial tcp 169.254.169.254:80: connect: host is down"]. Have you checked your IAM server?

Hi @wayblink, sorry for the late response. "connect: host is down" shows up because my localhost is not an AWS server. The server is running, and in fact the tool works if I run it from an AWS instance that has the instance metadata available at 169.254.169.254. The error appears when I run the backup tool locally with exported AWS env vars like:

```
export AWS_ACCESS_KEY_ID=<redacted>
export AWS_SECRET_ACCESS_KEY=<redacted>
export AWS_SESSION_TOKEN=<redacted>
```

This behaviour is not limited to the exported env vars: the tool does not recognise the .aws/config parameters either. It only tries to fetch the instance metadata and use that, which of course is not there on my local machine.
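For reference, the standard AWS default chain does pick these credentials up locally. A sanity check like the sketch below (my own code using the official aws-sdk-go-v2, which as far as I can tell the backup tool does not use; it is only here to verify the local environment) resolves them from the env vars or ~/.aws files without ever touching the metadata endpoint:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
)

func main() {
	// LoadDefaultConfig walks the standard AWS chain: env vars, shared
	// config/credentials files, and finally IMDS / container credentials.
	cfg, err := config.LoadDefaultConfig(context.Background())
	if err != nil {
		log.Fatal(err)
	}

	creds, err := cfg.Credentials.Retrieve(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("credentials resolved from %q", creds.Source)
}
```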

If you have any insight into this, it would be great. We ended up building a fairly large Python API to handle backups ourselves, but we need to understand what the expected behaviour is. Thanks!