numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

2023-12-13 00:27:07,476 INFO [Executor task launch worker for task 577 #210

Open torvalds-dev-testbot[bot] opened 10 months ago

torvalds-dev-testbot[bot] commented 10 months ago

2023-12-13 00:27:07,477 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.common.util.collection.ExternalSpillableMap:Estimated Payload size => 2504
2023-12-13 00:27:07,478 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.common.util.collection.ExternalSpillableMap:New Estimated Payload size => 2845
2023-12-13 00:27:09,814 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:starting to buffer records
2023-12-13 00:27:09,821 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:starting consumer thread
2023-12-13 00:27:09,855 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:starting to buffer records
2023-12-13 00:27:09,918 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:starting consumer thread
2023-12-13 00:27:19,120 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:Number of entries in MemoryBasedMap => 920285, Total size in bytes of MemoryBasedMap => 2618210917, Number of entries in BitCaskDiskMap => 0, Size of file spilled to disk => 0
2023-12-13 00:27:19,120 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:partitionPath:tenant=aaaaaa/date=20231213, fileId to be merged:3d4538da-9810-445e-84ef-63b03719092b-0
2023-12-13 00:27:19,134 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:Merging new data into oldPath <s3://some-s3-bucket/hudi/visibility=private/schema=scwx.process/tenant=aaaaaa/date=20231213/3d4538da-9810-445e-84ef-63b03719092b-0_616-45168-16228077_20231213002302278.parquet>, as newPath <s3://some-s3-bucket/hudi/visibility=private/schema=scwx.process/tenant=aaaaaa/date=20231213/3d4538da-9810-445e-84ef-63b03719092b-0_577-45181-16233208_20231213002634231.parquet>
2023-12-13 00:27:19,326 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:finished buffering records
2023-12-13 00:27:19,330 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:Queue Consumption is done; notifying producer threads
2023-12-13 00:27:19,457 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.table.marker.DirectWriteMarkers:Creating Marker Path=<s3://some-s3-bucket/hudi/visibility=private/schema=scwx.process/.hoodie/.temp/20231213002634231/tenant=aaaaaa/date=20231213/3d4538da-9810-445e-84ef-63b03719092b-0_577-45181-16233208_20231213002634231.parquet.marker.MERGE>
2023-12-13 00:27:19,524 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.table.marker.DirectWriteMarkers:[direct] Created marker file <s3://some-s3-bucket/hudi/visibility=private/schema=scwx.process/.hoodie/.temp/20231213002634231/tenant=aaaaaa/date=20231213/3d4538da-9810-445e-84ef-63b03719092b-0_577-45181-16233208_20231213002634231.parquet.marker.MERGE> in 389 ms
2023-12-13 00:27:19,615 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:starting to buffer records
2023-12-13 00:27:19,616 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:starting consumer thread
2023-12-13 00:27:25,880 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:finished buffering records
2023-12-13 00:27:25,880 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:Queue Consumption is done; notifying producer threads
2023-12-13 00:27:32,378 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:finished buffering records
2023-12-13 00:27:32,381 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:Queue Consumption is done; notifying producer threads
2023-12-13 00:29:13,455 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:MergeHandle for partitionPath tenant=aaaaaa/date=20231213 fileID 3d4538da-9810-445e-84ef-63b03719092b-0, took 125979 ms.
torvalds-dev[bot] commented 10 months ago

torvalds.dev is analyzing the ticket

torvalds-dev[bot] commented 10 months ago

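The issue appears to be in Apache Hudi's merge path (and, by extension, compaction). The logs come from a Hudi job that uses HoodieMergeHandle to merge new records into an existing base file.

Here's a brief summary of the logs:

  1. ExternalSpillableMap first estimates the payload size per record at 2504 bytes, then revises it to 2845 bytes.
  2. HoodieMergeHandle reports 920285 entries (2618210917 bytes, about 2.6 GB) in MemoryBasedMap and 0 entries in BitCaskDiskMap, so nothing was spilled to disk.
  3. The handle merges file ID 3d4538da-9810-445e-84ef-63b03719092b-0 in partition tenant=aaaaaa/date=20231213: the base file written at instant 20231213002302278 is rewritten as a new base file for instant 20231213002634231, and a MERGE marker is created for it.
  4. The MergeHandle for that file takes 125979 ms (about 2 minutes). About 101 s of that passes with no log output, between the last "Queue Consumption is done" message (00:27:32) and the final line (00:29:13).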
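The HoodieMergeHandle figures can be cross-checked against the ExternalSpillableMap estimate with simple arithmetic: the reported MemoryBasedMap size is very close to the entry count multiplied by the revised payload estimate of 2845 bytes. The sketch below copies the logged values verbatim and does that multiplication plus a few unit conversions; the class name MergeLogArithmetic is hypothetical and not part of Hudi.

```java
/** Hypothetical helper: arithmetic on values copied from the HoodieMergeHandle log excerpt. */
final class MergeLogArithmetic {
    // Values copied verbatim from the log lines in this issue.
    static final long ENTRIES_IN_MEMORY_MAP = 920_285L;          // "Number of entries in MemoryBasedMap"
    static final long REPORTED_MEMORY_MAP_BYTES = 2_618_210_917L; // "Total size in bytes of MemoryBasedMap"
    static final long NEW_ESTIMATED_PAYLOAD_BYTES = 2_845L;       // "New Estimated Payload size"
    static final long MERGE_HANDLE_MILLIS = 125_979L;             // "MergeHandle ... took"

    /** Entry count multiplied by a per-record payload estimate (throws on overflow). */
    static long approxMapBytes(long entries, long estimatedPayloadBytes) {
        return Math.multiplyExact(entries, estimatedPayloadBytes);
    }

    /** Converts bytes to GiB (2^30 bytes). */
    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        long approx = approxMapBytes(ENTRIES_IN_MEMORY_MAP, NEW_ESTIMATED_PAYLOAD_BYTES);
        System.out.printf("entries x estimate = %d bytes (reported %d, difference %d)%n",
                approx, REPORTED_MEMORY_MAP_BYTES, REPORTED_MEMORY_MAP_BYTES - approx);
        System.out.printf("in-memory map: %.2f GiB; merge time: %.1f s%n",
                toGiB(REPORTED_MEMORY_MAP_BYTES), MERGE_HANDLE_MILLIS / 1000.0);
    }
}
```

The product, 2618210825 bytes, differs from the reported total by only 92 bytes, which suggests the reported size is derived from the per-record estimate rather than measured record by record.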

The following Hudi source files are related to the issue:

  1. IOUtils.java: This file contains utility methods for handling I/O operations in Hudi. The methods getMaxMemoryPerPartitionMerge and getMaxMemoryPerCompaction are used to calculate the maximum memory that can be used for merge and compaction operations respectively.

  2. TestSparkIOUtils.java: This file contains tests for the methods in IOUtils.java. The tests ensure that the methods correctly calculate the maximum memory for merge and compaction operations.
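To make the calculation concrete, here is a simplified sketch of the kind of budget getMaxMemoryPerPartitionMerge derives. It assumes, based on a reading of Hudi's IOUtils that should be verified against the Hudi version in use, that an explicit hoodie.memory.merge.max.size takes precedence; otherwise the budget is executor memory * (1 - spark.memory.fraction) * hoodie.memory.merge.fraction, floored at a minimum. The class MergeMemoryBudgetSketch, its method, and the 100 MiB floor and 1 GiB fallback constants are hypothetical stand-ins, not Hudi's API.

```java
import java.util.OptionalLong;

/**
 * Hypothetical, simplified model of a per-partition merge memory budget.
 * This is NOT Hudi's IOUtils: the formula, precedence and constants are assumptions.
 */
final class MergeMemoryBudgetSketch {
    // Assumed floor for the spillable-map budget (100 MiB); verify against your Hudi version.
    static final long ASSUMED_MIN_BYTES = 100L * 1024 * 1024;
    // Assumed fallback when executor memory is unknown (1 GiB); verify against your Hudi version.
    static final long ASSUMED_FALLBACK_BYTES = 1024L * 1024 * 1024;

    /**
     * @param explicitMaxBytes    explicit cap (assumed: hoodie.memory.merge.max.size), if configured
     * @param executorMemBytes    total executor memory in bytes, if known
     * @param sparkMemoryFraction spark.memory.fraction (heap share used for Spark execution and storage)
     * @param mergeFraction       assumed hoodie.memory.merge.fraction
     */
    static long maxMemoryForMerge(OptionalLong explicitMaxBytes, OptionalLong executorMemBytes,
                                  double sparkMemoryFraction, double mergeFraction) {
        if (explicitMaxBytes.isPresent()) {
            return explicitMaxBytes.getAsLong();   // an explicit cap wins outright
        }
        if (!executorMemBytes.isPresent()) {
            return ASSUMED_FALLBACK_BYTES;         // no engine information: fixed default
        }
        // Memory left for user code once Spark's unified memory region is set aside.
        double userAvailable = executorMemBytes.getAsLong() * (1.0 - sparkMemoryFraction);
        long budget = (long) Math.floor(userAvailable * mergeFraction);
        return Math.max(ASSUMED_MIN_BYTES, budget); // never go below the floor
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        // 16 GiB executor with both fractions at 0.6: budget is 24% of executor memory.
        long budget = maxMemoryForMerge(OptionalLong.empty(), OptionalLong.of(16 * gib), 0.6, 0.6);
        System.out.printf("derived merge budget: %d bytes (%.2f GiB)%n", budget, budget / (double) gib);
    }
}
```

Under this model, with both fractions at 0.6 the budget is 24% of executor memory, so keeping the logged ~2.6 GB in memory would need roughly 10.9 GB of executor memory unless an explicit cap was configured.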

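Based on the logs and these files, the issue may be related to memory management during merge and compaction. The merge held 2618210917 bytes (about 2.6 GB) in MemoryBasedMap and spilled nothing to BitCaskDiskMap, which suggests the memory budget for this merge was at least that large. If there is a problem, it may lie in how the maximum memory for these operations is calculated (getMaxMemoryPerPartitionMerge, getMaxMemoryPerCompaction) or in how much executor memory is actually available to back that budget.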
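The "no spill implies a large budget" reasoning can be checked with a simplified model of a spillable map: each record is assumed to cost the current payload estimate, and a record stays in memory only while the running in-memory size is still below the budget; otherwise it goes to disk. This is a hypothetical sketch (class and method names are illustrative), not Hudi's ExternalSpillableMap, which re-estimates payload size as records arrive.

```java
/** Hypothetical simplified model of spill-to-disk accounting; not Hudi's ExternalSpillableMap. */
final class SpillModelSketch {
    /**
     * Records kept in memory when each costs estimatedPayloadBytes and a record is admitted
     * only while the running total is below budgetBytes: ceil(budget / payload), capped at total.
     */
    static long recordsKeptInMemory(long totalRecords, long estimatedPayloadBytes, long budgetBytes) {
        if (estimatedPayloadBytes <= 0) {
            throw new IllegalArgumentException("payload estimate must be positive");
        }
        if (budgetBytes <= 0 || totalRecords <= 0) {
            return 0;
        }
        long admitted = (budgetBytes - 1) / estimatedPayloadBytes + 1; // overflow-safe ceiling
        return Math.min(totalRecords, admitted);
    }

    /** Smallest budget under this model for which no record spills: (n - 1) * payload + 1. */
    static long minBudgetWithoutSpill(long totalRecords, long estimatedPayloadBytes) {
        if (totalRecords <= 0) {
            return 0;
        }
        return Math.addExact(Math.multiplyExact(totalRecords - 1, estimatedPayloadBytes), 1);
    }

    public static void main(String[] args) {
        long entries = 920_285L; // "Number of entries in MemoryBasedMap" from the log
        long payload = 2_845L;   // "New Estimated Payload size" from the log
        long[] budgets = {1L << 30, 2L << 30, 3L << 30};
        for (long budget : budgets) {
            long kept = recordsKeptInMemory(entries, payload, budget);
            System.out.printf("budget %d bytes: %d in memory, %d spilled%n", budget, kept, entries - kept);
        }
        System.out.println("smallest budget with no spill: "
                + minBudgetWithoutSpill(entries, payload) + " bytes");
    }
}
```

Under this model, any budget below 2618207981 bytes would have spilled some of the 920285 records (a 1 GiB budget keeps only 377414 in memory), whereas the log reports 0 entries in BitCaskDiskMap. Treat it as a rough bound rather than Hudi's exact accounting.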

Here are some actions you could take: