tabular-io / iceberg-kafka-connect

Apache License 2.0

Out of memory error #159

Closed: 13535048320 closed this issue 8 months ago

13535048320 commented 8 months ago

An OOM error occurs when sinking a table with only 5000 pieces of data. The JVM -Xmx is set to 10G.

Num of table columns: 290
Catalog: Hive metastore (standalone)
Storage: MinIO

key.converter: "org.apache.kafka.connect.storage.StringConverter"
value.converter: "org.apache.kafka.connect.json.JsonConverter"
topics: "MARA"
iceberg.tables: "MARA"
iceberg.tables.default-id-columns: "MATNR"
iceberg.tables.cdc-field: "ODQ_CHANGEMODE"
iceberg.catalog: ""
iceberg.catalog.hive.catalog-impl: "org.apache.iceberg.hive.HiveCatalog"
iceberg.hadoop.security.authorization: "false"
iceberg.catalog.s3.secret-access-key: ""
iceberg.catalog.s3.endpoint: "http://minio:9000"
iceberg.catalog.io-impl: "org.apache.iceberg.aws.s3.S3FileIO"
iceberg.catalog.client.region: "us-east-1"
iceberg.catalog.uri: "thrift://hive.default.svc.cluster.local:9083"
iceberg.hadoop.security.authentication: "simple"
iceberg.catalog.warehouse: "s3a://warehouse"
iceberg.catalog.s3.access-key-id: ""

bryanck commented 8 months ago

I can't say for sure what is going on. When you say "5000 pieces of data," do you mean records or files? You can try enabling metrics only for the columns that need them, via the write.metadata.metrics.* table properties, or you can try increasing the number of tasks.
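For reference, a minimal sketch of setting those metrics properties through the Iceberg Java API. The catalog handle, the `default` namespace, and the choice to keep full metrics only on MATNR are assumptions for illustration, not taken from this thread:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class MetricsConfig {
    // Restrict column-level metrics so per-file metadata stays small.
    // "default.MARA" is a guess at the table's namespace; adjust as needed.
    static void limitMetrics(Catalog catalog) {
        Table table = catalog.loadTable(TableIdentifier.of("default", "MARA"));
        table.updateProperties()
            // Collect no metrics by default across all 290 columns...
            .set("write.metadata.metrics.default", "none")
            // ...but keep full metrics for the ID column used for CDC upserts.
            .set("write.metadata.metrics.column.MATNR", "full")
            .commit();
    }
}
```

Increasing the number of tasks is done with the standard Kafka Connect tasks.max connector property.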

13535048320 commented 8 months ago

@bryanck Thank you for your reply. I meant records. I think it was likely caused by too many partitions in Iceberg; I reduced the memory usage by reducing the number of partitions. Thanks!
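That outcome fits the writer model: the writer typically keeps an open data file, with its own buffer, for each partition a task touches, so a very fine-grained partition spec multiplies memory use. As a purely hypothetical sketch of coarsening a spec with the Iceberg Java API (the `event_ts` column and the daily-to-monthly change are invented; the thread doesn't say how MARA was partitioned):

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expressions;

public class CoarsenSpec {
    // Hypothetical: swap a day() partition field for month() to shrink
    // the number of distinct partitions (and concurrent open writers).
    // "event_ts" is an invented column name for illustration only.
    static void coarsenPartitioning(Table table) {
        table.updateSpec()
            .removeField(Expressions.day("event_ts"))
            .addField(Expressions.month("event_ts"))
            .commit();
    }
}
```

Note that Iceberg partition spec evolution only applies to newly written data; existing files keep their old layout.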