numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

2023-12-13 00:27:07,476 INFO [Executor task launch worker for task 577 #198

Open torvalds-dev-testbot[bot] opened 10 months ago

torvalds-dev-testbot[bot] commented 10 months ago

2023-12-13 00:27:07,477 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.common.util.collection.ExternalSpillableMap:Estimated Payload size => 2504
2023-12-13 00:27:07,478 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.common.util.collection.ExternalSpillableMap:New Estimated Payload size => 2845
2023-12-13 00:27:09,814 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:starting to buffer records
2023-12-13 00:27:09,821 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:starting consumer thread
2023-12-13 00:27:09,855 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:starting to buffer records
2023-12-13 00:27:09,918 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:starting consumer thread
2023-12-13 00:27:19,120 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:Number of entries in MemoryBasedMap => 920285, Total size in bytes of MemoryBasedMap => 2618210917, Number of entries in BitCaskDiskMap => 0, Size of file spilled to disk => 0
2023-12-13 00:27:19,120 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:partitionPath:tenant=aaaaaa/date=20231213, fileId to be merged:3d4538da-9810-445e-84ef-63b03719092b-0
2023-12-13 00:27:19,134 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:Merging new data into oldPath <s3://some-s3-bucket/hudi/visibility=private/schema=scwx.process/tenant=aaaaaa/date=20231213/3d4538da-9810-445e-84ef-63b03719092b-0_616-45168-16228077_20231213002302278.parquet>, as newPath <s3://some-s3-bucket/hudi/visibility=private/schema=scwx.process/tenant=aaaaaa/date=20231213/3d4538da-9810-445e-84ef-63b03719092b-0_577-45181-16233208_20231213002634231.parquet>
2023-12-13 00:27:19,326 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:finished buffering records
2023-12-13 00:27:19,330 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:Queue Consumption is done; notifying producer threads
2023-12-13 00:27:19,457 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.table.marker.DirectWriteMarkers:Creating Marker Path=<s3://some-s3-bucket/hudi/visibility=private/schema=scwx.process/.hoodie/.temp/20231213002634231/tenant=aaaaaa/date=20231213/3d4538da-9810-445e-84ef-63b03719092b-0_577-45181-16233208_20231213002634231.parquet.marker.MERGE>
2023-12-13 00:27:19,524 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.table.marker.DirectWriteMarkers:[direct] Created marker file <s3://some-s3-bucket/hudi/visibility=private/schema=scwx.process/.hoodie/.temp/20231213002634231/tenant=aaaaaa/date=20231213/3d4538da-9810-445e-84ef-63b03719092b-0_577-45181-16233208_20231213002634231.parquet.marker.MERGE> in 389 ms
2023-12-13 00:27:19,615 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:starting to buffer records
2023-12-13 00:27:19,616 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:starting consumer thread
2023-12-13 00:27:25,880 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:finished buffering records
2023-12-13 00:27:25,880 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:Queue Consumption is done; notifying producer threads
2023-12-13 00:27:32,378 INFO [producer-thread-1] org.apache.hudi.common.util.queue.IteratorBasedQueueProducer:finished buffering records
2023-12-13 00:27:32,381 INFO [consumer-thread-1] org.apache.hudi.common.util.queue.BoundedInMemoryExecutor:Queue Consumption is done; notifying producer threads
2023-12-13 00:29:13,455 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:MergeHandle for partitionPath tenant=aaaaaa/date=20231213 fileID 3d4538da-9810-445e-84ef-63b03719092b-0, took 125979 ms.

torvalds-dev[bot] commented 10 months ago

torvalds.dev is analyzing the ticket

torvalds-dev[bot] commented 10 months ago

The issue appears to concern the merge path in Apache Hudi, and possibly the memory settings shared by merge and compaction. The logs come from a Spark executor task in which HoodieMergeHandle merges new records into an existing Parquet file. They contain only INFO messages and no error.

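Here's a brief summary of the logs:

  1. The estimated payload size rose from 2504 to 2845 bytes.

  2. The in-memory map held 920285 entries totalling 2618210917 bytes (about 2.6 GB). Nothing was spilled to disk: the BitCaskDiskMap had 0 entries.

  3. The merge rewrote file ID 3d4538da-9810-445e-84ef-63b03719092b-0 in partition tenant=aaaaaa/date=20231213, and the marker file was created in 389 ms.

  4. The MergeHandle took 125979 ms (about 126 seconds) in total. Roughly 101 of those seconds, from 00:27:32 to 00:29:13, passed without any log output.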
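The figures in the log can also be extracted mechanically, which helps when comparing many tasks. The following is a minimal sketch in Python, assuming the two HoodieMergeHandle message formats shown in this ticket. The regular expressions and the `summarize` helper are illustrative and not part of Hudi, and the message wording may differ in other Hudi versions.

```python
import re

# Two lines copied verbatim from the log in this ticket.
LOG_LINES = [
    "2023-12-13 00:27:19,120 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:Number of entries in MemoryBasedMap => 920285, Total size in bytes of MemoryBasedMap => 2618210917, Number of entries in BitCaskDiskMap => 0, Size of file spilled to disk => 0",
    "2023-12-13 00:29:13,455 INFO [Executor task launch worker for task 577.0 in stage 45181.0 (TID 16233208)] org.apache.hudi.io.HoodieMergeHandle:MergeHandle for partitionPath tenant=aaaaaa/date=20231213 fileID 3d4538da-9810-445e-84ef-63b03719092b-0, took 125979 ms.",
]

# Assumed message formats, derived only from the lines above.
MAP_RE = re.compile(
    r"Number of entries in MemoryBasedMap => (\d+), "
    r"Total size in bytes of MemoryBasedMap => (\d+), "
    r"Number of entries in BitCaskDiskMap => (\d+), "
    r"Size of file spilled to disk => (\d+)"
)
TOOK_RE = re.compile(
    r"MergeHandle for partitionPath (\S+) fileID ([^,\s]+), took (\d+) ms"
)


def summarize(lines):
    """Collect map statistics and merge duration from HoodieMergeHandle lines."""
    summary = {}
    for line in lines:
        match = MAP_RE.search(line)
        if match:
            entries, mem_bytes, disk_entries, spilled = map(int, match.groups())
            summary.update(
                entries=entries,
                mem_bytes=mem_bytes,
                disk_entries=disk_entries,
                spilled_bytes=spilled,
                # Average in-memory footprint per entry; 0.0 for an empty map.
                avg_entry_bytes=mem_bytes / entries if entries else 0.0,
            )
        match = TOOK_RE.search(line)
        if match:
            summary.update(
                partition=match.group(1),
                file_id=match.group(2),
                merge_ms=int(match.group(3)),
            )
    return summary


print(summarize(LOG_LINES))
```

For these two lines the average footprint per entry comes out at roughly 2845 bytes, which matches the "New Estimated Payload size" reported earlier in the log.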

The files relevant to this issue are:

  1. IOUtils.java: This file contains methods for calculating the maximum memory allowed for merge and compaction operations. It also contains methods for getting the maximum memory per partition merge and per compaction.

  2. TestSparkIOUtils.java: This file contains tests for the methods in IOUtils.java. It tests the calculation of maximum memory for merge and compaction operations with both default and defined maximum sizes.
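To make the kind of calculation described for IOUtils.java concrete, here is a simplified model in Python. It is a sketch under stated assumptions, not a copy of the Hudi source: the function name, the parameter names, the 100 MiB floor and the 1 GiB fallback are all assumptions for illustration, and the actual formula should be confirmed in IOUtils.java for the Hudi version in use.

```python
import math

# Assumed values for this sketch only; confirm against IOUtils.java.
ASSUMED_MIN_BYTES = 100 * 1024 * 1024        # floor for the spillable map budget
ASSUMED_FALLBACK_BYTES = 1024 * 1024 * 1024  # used when engine memory is unknown


def max_memory_for_merge(executor_memory_bytes, engine_memory_fraction, merge_fraction):
    """Estimate the in-memory budget for one partition merge (hypothetical model).

    executor_memory_bytes:  total executor memory, or None if unknown
    engine_memory_fraction: share of executor memory reserved by the engine, or None
    merge_fraction:         share of the remaining memory granted to the merge
    """
    if executor_memory_bytes is None or engine_memory_fraction is None:
        return ASSUMED_FALLBACK_BYTES
    # Memory left for user code after the engine takes its share.
    user_memory = executor_memory_bytes * (1 - engine_memory_fraction)
    budget = math.floor(user_memory * merge_fraction)
    # Never go below the assumed floor.
    return max(ASSUMED_MIN_BYTES, budget)


# Illustrative inputs: a 4 GiB executor, half reserved by the engine,
# half of the remainder granted to the merge.
print(max_memory_for_merge(4 * 1024**3, 0.5, 0.5))
```

Under a model like this, the roughly 2.6 GB map in the log could only stay fully in memory if the effective budget was larger than that, either through a large executor or an explicitly defined maximum size.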

Suggested actions:

  1. Check the memory configuration for your Hudi operations. Ensure that the maximum memory per partition merge and per compaction are set appropriately for your use case.

  2. Monitor the memory usage during Hudi operations. If the operations are running out of memory, you may need to increase the maximum memory settings.

  3. If the merge operation is taking too long (here about 126 seconds for a single file), consider optimizing your data partitioning or increasing the resources available for the operation.

  4. Review the tests in TestSparkIOUtils.java to ensure they cover all possible scenarios and accurately test the functionality of IOUtils.java.

  5. If you're experiencing specific errors or issues, please provide more details or error messages.
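As a starting point for the first two actions, the memory settings can be passed as writer options. The sketch below builds such an option map in Python. The helper name and all values are illustrative assumptions; the keys are intended to be Hudi's memory configuration keys, but check them and their defaults against the configuration reference for your Hudi version.

```python
def merge_memory_options(max_merge_bytes, merge_fraction=0.6, spill_dir="/tmp/"):
    """Build Hudi writer options for merge memory (hypothetical helper).

    max_merge_bytes: explicit cap on the in-memory map used during a merge
    merge_fraction:  fraction of available memory used when no cap is set
    spill_dir:       local directory for records that do not fit in memory
    """
    if max_merge_bytes <= 0:
        raise ValueError("max_merge_bytes must be positive")
    if not 0 < merge_fraction <= 1:
        raise ValueError("merge_fraction must be in (0, 1]")
    # Spark writer options are passed as strings.
    return {
        "hoodie.memory.merge.max.size": str(max_merge_bytes),
        "hoodie.memory.merge.fraction": str(merge_fraction),
        "hoodie.memory.spillable.map.path": spill_dir,
    }


# Illustrative value: cap the merge map at 3 GiB.
opts = merge_memory_options(3 * 1024**3)
for key, value in sorted(opts.items()):
    print(f"{key}={value}")

# With a PySpark DataFrame `df` (not created here), the options could be
# applied along these lines:
#   df.write.format("hudi").options(**opts).mode("append").save(table_path)
```

A lower cap makes the map spill to disk sooner, trading memory pressure for extra I/O, so any change is worth checking against the merge duration and spill figures in the logs.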