risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

[bug] Creating an Iceberg sink on MinIO now requires `region` #19440

Open ClSlaid opened 3 days ago

ClSlaid commented 3 days ago

Describe the bug

According to the Iceberg sink documentation, creating an Iceberg sink requires only one of `s3.endpoint` or `s3.region`. However, even when `s3.endpoint` is specified, `s3.region` is still required.

Error message/log

On #7ba6650, creating the Iceberg sink fails with:

ERROR:  Failed to run the query

Caused by these errors (recent errors listed first):
  1: Sink error
  2: Iceberg error
  3: Unexpected => IO operation failed, source
  4: ConfigInvalid (permanent) at Builder::build, context: { service: s3 } => region is missing. Please find it by S3::detect_region() or set them in env.

To Reproduce

Deploy Iceberg on S3 with MinIO and Spark

Create a docker-compose file:

# docker-compose.yaml
version: "3"

services:
  spark-iceberg:
    image: tabulario/spark-iceberg
    container_name: spark-iceberg
    build: spark/
    networks:
      iceberg_net:
    depends_on:
      - rest
      - minio
    volumes:
      - ./warehouse:/home/iceberg/warehouse
      - ./notebooks:/home/iceberg/notebooks/notebooks
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    ports:
      - 8888:8888
      - 8080:8080
      - 10000:10000
      - 10001:10001
  rest:
    image: tabulario/iceberg-rest
    container_name: iceberg-rest
    networks:
      iceberg_net:
    ports:
      - 8181:8181
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
      - CATALOG_WAREHOUSE=s3://warehouse/
      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      - CATALOG_S3_ENDPOINT=http://minio:9000
  minio:
    image: minio/minio
    container_name: minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=minio
    networks:
      iceberg_net:
        aliases:
          - warehouse.minio
    ports:
      - 9001:9001
      - 9000:9000
    command: ["server", "/data", "--console-address", ":9001"]
  mc:
    depends_on:
      - minio
    image: minio/mc
    container_name: mc
    networks:
      iceberg_net:
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    entrypoint: >
      /bin/sh -c "
      until (/usr/bin/mc config host add minio http://minio:9000 admin password) do echo '...waiting...' && sleep 1; done;
      /usr/bin/mc rm -r --force minio/warehouse;
      /usr/bin/mc mb minio/warehouse;
      /usr/bin/mc policy set public minio/warehouse;
      tail -f /dev/null
      "
networks:
  iceberg_net:

Start the compose stack, then enter the Spark SQL shell and create the target table:

docker-compose up
docker-compose exec -it spark-iceberg spark-sql
    CREATE TABLE IF NOT EXISTS demo.dev.sbtest1_sink (
        id BIGINT,
        k INT,
        c STRING,
        pad STRING
    ) TBLPROPERTIES (
        'format-version' = '2'
    );

Create Iceberg Sink

  1. Start RisingWave:
    ./risedev dev
  2. Connect to RisingWave with psql.
  3. Run the following SQL:
    SET STREAMING_PARALLELISM=1;
    CREATE SINK IF NOT EXISTS sbtest1
    FROM sbtest1_sink 
    WITH (
        connector = 'iceberg',
        type = 'upsert',
        primary_key = 'id',
        warehouse.path ='s3a://warehouse',
        s3.endpoint = 'http://localhost:9000',
        s3.access.key = 'admin',
        s3.secret.key = 'password',
        catalog.name = 'demo',
        database.name='dev',
        table.name='sbtest1_sink',
    );

    Get error message:

    
    ERROR:  Failed to run the query

    Caused by these errors (recent errors listed first):
      1: Sink error
      2: Iceberg error
      3: Unexpected => IO operation failed, source
      4: ConfigInvalid (permanent) at Builder::build, context: { service: s3 } => region is missing. Please find it by S3::detect_region() or set them in env.
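A possible workaround, not verified in this report: since the error complains only about the missing region, setting `s3.region` explicitly alongside `s3.endpoint` may get past the builder check. The sketch below repeats the statement above with that one option added. The value `us-east-1` is an assumption, borrowed from `AWS_REGION` in the compose file. Whether MinIO accepts other values depends on its configuration.

```sql
-- Same sink definition as in the reproduction, plus s3.region.
-- ASSUMPTION: 'us-east-1' mirrors AWS_REGION from the compose file above;
-- this workaround has not been confirmed against commit 7ba6650.
CREATE SINK IF NOT EXISTS sbtest1
FROM sbtest1_sink
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    warehouse.path = 's3a://warehouse',
    s3.endpoint = 'http://localhost:9000',
    s3.region = 'us-east-1',
    s3.access.key = 'admin',
    s3.secret.key = 'password',
    catalog.name = 'demo',
    database.name = 'dev',
    table.name = 'sbtest1_sink'
);
```

Even if this works, it does not resolve the bug: the documented behavior is that `s3.endpoint` alone is sufficient.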


Expected behavior

Sink created successfully.

How did you deploy RisingWave?

Using risedev:
```bash
./risedev dev
```

The version of RisingWave

dev=> select version();
                       version
-----------------------------------------------------
 PostgreSQL 13.14.0-RisingWave-2.2.0-alpha (unknown)
(1 row)

Additional context

No response