tabular-io / iceberg-kafka-connect

Apache License 2.0

Out of memory error #159

Closed: 13535048320 closed this issue 8 months ago

13535048320 commented 8 months ago

An OOM error occurs when sinking a table with only 5000 pieces of data. The JVM -Xmx is set to 10G.

Num of table columns: 290
Catalog: Hive metastore (standalone)
Storage: MinIO

key.converter: "org.apache.kafka.connect.storage.StringConverter"
value.converter: "org.apache.kafka.connect.json.JsonConverter"
topics: "MARA"
iceberg.tables: "MARA"
iceberg.tables.default-id-columns: "MATNR"
iceberg.tables.cdc-field: "ODQ_CHANGEMODE"
iceberg.catalog: ""
iceberg.catalog.hive.catalog-impl: "org.apache.iceberg.hive.HiveCatalog"
iceberg.hadoop.security.authorization: "false"
iceberg.catalog.s3.secret-access-key: ""
iceberg.catalog.s3.endpoint: "http://minio:9000"
iceberg.catalog.io-impl: "org.apache.iceberg.aws.s3.S3FileIO"
iceberg.catalog.client.region: "us-east-1"
iceberg.catalog.uri: "thrift://hive.default.svc.cluster.local:9083"
iceberg.hadoop.security.authentication: "simple"
iceberg.catalog.warehouse: "s3a://warehouse"
iceberg.catalog.s3.access-key-id: ""

bryanck commented 8 months ago

I can't say for sure what is going on. When you say "5000 pieces of data," do you mean records or files? You can try enabling metrics only for the columns that need them, via the write.metadata.metrics.* table properties, or you can try increasing the number of tasks.
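For reference, a minimal sketch of setting those metrics properties through the Iceberg Java API. The catalog handle, the `default` namespace, and the choice to keep full metrics only on MATNR are assumptions for illustration, not taken from this thread:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class MetricsConfig {
    // Restrict column-level metrics so per-file metadata stays small.
    // "default.MARA" is a guess at the table's namespace; adjust as needed.
    static void limitMetrics(Catalog catalog) {
        Table table = catalog.loadTable(TableIdentifier.of("default", "MARA"));
        table.updateProperties()
            // Collect no metrics by default across all 290 columns...
            .set("write.metadata.metrics.default", "none")
            // ...but keep full metrics for the ID column used for CDC upserts.
            .set("write.metadata.metrics.column.MATNR", "full")
            .commit();
    }
}
```

Increasing the number of tasks is done with the standard Kafka Connect tasks.max connector property.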

13535048320 commented 8 months ago

@bryanck Thank you for your reply. I meant records. I think it was likely caused by too many partitions in Iceberg; I reduced the memory usage by reducing the number of partitions. Thanks!
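That outcome fits the writer model: the writer typically keeps an open data file, with its own buffer, for each partition a task touches, so a very fine-grained partition spec multiplies memory use. As a purely hypothetical sketch of coarsening a spec with the Iceberg Java API (the `event_ts` column and the daily-to-monthly change are invented; the thread doesn't say how MARA was partitioned):

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expressions;

public class CoarsenSpec {
    // Hypothetical: swap a day() partition field for month() to shrink
    // the number of distinct partitions (and concurrent open writers).
    // "event_ts" is an invented column name for illustration only.
    static void coarsenPartitioning(Table table) {
        table.updateSpec()
            .removeField(Expressions.day("event_ts"))
            .addField(Expressions.month("event_ts"))
            .commit();
    }
}
```

Note that Iceberg partition spec evolution only applies to newly written data; existing files keep their old layout.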