xitongsys / parquet-go

pure golang library for reading/writing parquet files
Apache License 2.0

Marshal nodes directly onto stack #480

Closed: zolstein closed this pull request 2 years ago

zolstein commented 2 years ago

`marshal.Marshal` walks the object structure of each element node by node: whenever it reaches a non-terminal node, it pushes that node's children onto a stack. However, the `Marshaler` interface for generating child nodes requires every call to allocate a new slice of nodes to return, which `Marshal` then appends to the stack. Because this happens for every non-terminal node of every element written to the parquet file, it adds up to a large number of allocations.
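A minimal sketch of the old pattern, under a simplified model: `Node`, `childMarshaler`, `structMarshaler` and `walkOld` are hypothetical names, not parquet-go's actual types (the real nodes also carry values and repetition/definition levels). The point it shows is that every non-terminal node costs a freshly built slice that is only copied onto the stack.

```go
package main

import "fmt"

// Node is a hypothetical, simplified tree node; it is not parquet-go's Node.
type Node struct {
	Name     string
	Children []*Node
}

// childMarshaler is a hypothetical stand-in for the old-style interface:
// it returns a newly built slice of child nodes.
type childMarshaler interface {
	Children(n *Node) []*Node
}

type structMarshaler struct{}

// Children builds a new slice on every call (typically a heap allocation,
// since the slice is returned to the caller), even though the caller only
// copies its contents onto the stack.
func (structMarshaler) Children(n *Node) []*Node {
	out := make([]*Node, len(n.Children))
	copy(out, n.Children)
	return out
}

// walkOld does a depth-first walk with an explicit stack and returns the
// names of the terminal (leaf) nodes in visit order.
func walkOld(m childMarshaler, root *Node) []string {
	var leaves []string
	stack := []*Node{root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if len(n.Children) == 0 {
			leaves = append(leaves, n.Name)
			continue
		}
		// One new slice per non-terminal node, then a copy onto the stack.
		stack = append(stack, m.Children(n)...)
	}
	return leaves
}

func main() {
	root := &Node{Name: "row", Children: []*Node{{Name: "id"}, {Name: "name"}}}
	fmt.Println(walkOld(structMarshaler{}, root)) // prints [name id]
}
```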

This change makes the `Marshaler` interface accept the stack as an argument and return the whole updated stack rather than only the new nodes. Each marshaler can then push its child nodes directly onto the stack, so no intermediate slice has to be allocated in each call.
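The same walk with the stack threaded through the marshaler might look like the sketch below. Again the names (`stackMarshaler`, `PushChildren`, `walkNew`) are hypothetical and the model is simplified; it is not the signature this PR introduces, only an illustration of the "pass the stack in, return the stack" idea.

```go
package main

import "fmt"

// Node is a hypothetical, simplified tree node; it is not parquet-go's Node.
type Node struct {
	Name     string
	Children []*Node
}

// stackMarshaler is a hypothetical stand-in for the new-style interface:
// it receives the traversal stack, pushes children onto it, and returns
// the updated stack.
type stackMarshaler interface {
	PushChildren(n *Node, stack []*Node) []*Node
}

type structMarshaler struct{}

// PushChildren appends directly onto the caller's stack; no intermediate
// slice is built, and append only allocates if the capacity is exceeded.
func (structMarshaler) PushChildren(n *Node, stack []*Node) []*Node {
	return append(stack, n.Children...)
}

// walkNew performs the same depth-first walk, starting from a caller-supplied
// buffer, and returns the leaf names plus the (possibly grown) empty stack.
func walkNew(m stackMarshaler, root *Node, buf []*Node) (leaves []string, stack []*Node) {
	stack = append(buf[:0], root)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if len(n.Children) == 0 {
			leaves = append(leaves, n.Name)
			continue
		}
		stack = m.PushChildren(n, stack)
	}
	return leaves, stack
}

func main() {
	root := &Node{Name: "row", Children: []*Node{{Name: "id"}, {Name: "name"}}}
	leaves, _ := walkNew(structMarshaler{}, root, make([]*Node, 0, 16))
	fmt.Println(leaves) // prints [name id]
}
```

Returning the stack, rather than mutating it in place, matters because `append` may move the backing array when it grows; handing the result back lets the caller keep using the larger buffer for the next element.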


In my simple test, writing a large parquet file from a struct with a couple of members, this change cuts the number of individual allocations nearly in half (not counting allocations made by the test code itself to generate the objects).
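One way to observe the effect on a toy model is `testing.AllocsPerRun` from Go's standard library, which reports the average number of heap allocations per call of a function. The sketch below is not the benchmark behind the figures above and says nothing about parquet-go's real numbers; the tree, `oldChildren`, `newPushChildren` and `countLeaves` are hypothetical. `//go:noinline` keeps the old-style helper's returned slice on the heap, standing in for an interface call the compiler cannot see through, and the stack buffer is assumed to be reused across elements.

```go
package main

import (
	"fmt"
	"testing"
)

// Node is a hypothetical, simplified tree node; it is not parquet-go's Node.
type Node struct{ Children []*Node }

// oldChildren models the old interface: a new slice per non-terminal node.
//
//go:noinline
func oldChildren(n *Node) []*Node {
	out := make([]*Node, len(n.Children))
	copy(out, n.Children)
	return out
}

// newPushChildren models the new interface: push onto the caller's stack.
func newPushChildren(n *Node, stack []*Node) []*Node {
	return append(stack, n.Children...)
}

// countLeaves walks the tree depth-first, using either the stack-passing
// push (useStack) or the allocating old-style call, and returns the leaf
// count together with the stack so the caller can reuse it.
func countLeaves(root *Node, stack []*Node, useStack bool) (int, []*Node) {
	leaves := 0
	stack = append(stack[:0], root)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		switch {
		case len(n.Children) == 0:
			leaves++
		case useStack:
			stack = newPushChildren(n, stack)
		default:
			stack = append(stack, oldChildren(n)...)
		}
	}
	return leaves, stack
}

func main() {
	// A small record: three fields, one of which is a nested struct.
	nested := &Node{Children: []*Node{{}, {}}}
	root := &Node{Children: []*Node{{}, nested, {}}}
	buf := make([]*Node, 0, 32) // reused across runs, as if across elements

	oldAllocs := testing.AllocsPerRun(1000, func() { _, buf = countLeaves(root, buf, false) })
	newAllocs := testing.AllocsPerRun(1000, func() { _, buf = countLeaves(root, buf, true) })
	fmt.Printf("allocs per walk: old=%.0f new=%.0f\n", oldAllocs, newAllocs)
}
```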

Object allocation profile for old code ![allocs_ln_objs](https://user-images.githubusercontent.com/7101542/177424696-7b069358-f0b0-49b3-8c5b-23f332d32972.svg)
Object allocation profile for new code ![allocs_ln_objs_stack](https://user-images.githubusercontent.com/7101542/177424808-fc37d47f-3130-4c77-a345-77d4946d6807.svg)