xitongsys / parquet-go

Pure Go library for reading/writing Parquet files
Apache License 2.0

Buffered Reading Memory Leak #506

Open: lizzy-nammour opened this issue 1 year ago

lizzy-nammour commented 1 year ago

Hi @xitongsys! I've been running into memory issues when reading only a few rows at a time from Parquet files. I've tried both ParquetReader and ParquetColumnarReader on a file with 500k rows while printing heap usage, and on the first call to ReadColumnByIndex to read 10 rows, memory spikes to 1 GB. Any help would be greatly appreciated.
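When reporting a spike like this it helps to say which heap figure was printed, because Go's runtime exposes several. The sketch below prints three of them around a single call. `heapSnapshot` is a hypothetical helper (not part of parquet-go), and the 64 MiB `make` only stands in for the first `ReadColumnByIndex` call so the example runs on its own.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSnapshot is a hypothetical helper (not part of parquet-go). It reads the
// runtime memory statistics, prints the heap figures relevant to a spike, and
// returns the snapshot so callers can compute deltas.
//   HeapAlloc:  bytes of allocated heap objects (live, plus unreachable ones not yet swept)
//   HeapSys:    heap memory obtained from the OS
//   TotalAlloc: cumulative bytes ever allocated; it never decreases
func heapSnapshot(label string) runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%-20s HeapAlloc=%4d MiB  HeapSys=%4d MiB  TotalAlloc=%4d MiB\n",
		label, m.HeapAlloc>>20, m.HeapSys>>20, m.TotalAlloc>>20)
	return m
}

var sink []byte // keeps the stand-in allocation reachable

func main() {
	before := heapSnapshot("before first read:")
	// Stand-in for the first parquetReader.ReadColumnByIndex(columnIndex, 10) call.
	sink = make([]byte, 64<<20)
	after := heapSnapshot("after first read:")
	fmt.Printf("allocated during the call: %d MiB\n", (after.TotalAlloc-before.TotalAlloc)>>20)
}
```

The `TotalAlloc` delta captures everything the call allocated, even buffers that were freed again, while `HeapAlloc` shows what is still held afterwards.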

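    // import "github.com/xitongsys/parquet-go-source/local"
    // import "github.com/xitongsys/parquet-go/reader"

    // Attempt 1: read 10 values from each leaf column (error checks omitted)
    file, err := local.NewLocalFileReader("test.parquet")
    parquetReader, err := reader.NewParquetReader(file, nil, 1)
    for columnIndex := int64(0); columnIndex < parquetReader.SchemaHandler.GetColumnNum(); columnIndex++ {
        values, _, _, err := parquetReader.ReadColumnByIndex(columnIndex, 10)
    }
    parquetReader.ReadStop()
    file.Close()

    // Attempt 2: read 10 whole rows with a fresh reader
    file, err = local.NewLocalFileReader("test.parquet")
    parquetReader, err = reader.NewParquetReader(file, nil, 1)
    values, err := parquetReader.ReadByNumber(10)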
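To narrow down which column's first read accounts for the spike, the allocation of each `ReadColumnByIndex` call can be measured separately. This is a sketch, not parquet-go code: `columnReader`, `allocPerColumn`, and `fakeReader` are hypothetical, and it assumes `ReadColumnByIndex` takes `(int64, int64)` and returns `([]interface{}, []int32, []int32, error)` as in the loop above, so a `*reader.ParquetReader` could be passed instead of the fake.

```go
package main

import (
	"fmt"
	"runtime"
)

// columnReader is a hypothetical interface covering the one method the loop
// above uses; *reader.ParquetReader is assumed to satisfy it.
type columnReader interface {
	ReadColumnByIndex(index int64, num int64) ([]interface{}, []int32, []int32, error)
}

// allocPerColumn reads n values from each of numCols columns and returns the
// bytes allocated (runtime.MemStats.TotalAlloc delta) by each individual call.
func allocPerColumn(r columnReader, numCols, n int64) ([]uint64, error) {
	deltas := make([]uint64, 0, numCols)
	var before, after runtime.MemStats
	for i := int64(0); i < numCols; i++ {
		runtime.ReadMemStats(&before)
		if _, _, _, err := r.ReadColumnByIndex(i, n); err != nil {
			return deltas, fmt.Errorf("column %d: %w", i, err)
		}
		runtime.ReadMemStats(&after)
		deltas = append(deltas, after.TotalAlloc-before.TotalAlloc)
	}
	return deltas, nil
}

// fakeReader stands in for a real parquet reader so the sketch runs on its own:
// column 0 allocates 32 MiB per read, the other columns allocate 1 KiB.
type fakeReader struct{ buf [][]byte }

func (f *fakeReader) ReadColumnByIndex(index, num int64) ([]interface{}, []int32, []int32, error) {
	size := 1024
	if index == 0 {
		size = 32 << 20
	}
	f.buf = append(f.buf, make([]byte, size)) // retained, like a cached page buffer
	return make([]interface{}, num), nil, nil, nil
}

func main() {
	deltas, err := allocPerColumn(&fakeReader{}, 3, 10)
	if err != nil {
		panic(err)
	}
	for i, d := range deltas {
		fmt.Printf("column %d: %d KiB allocated\n", i, d>>10)
	}
}
```

With a real reader, `parquetReader` and `parquetReader.SchemaHandler.GetColumnNum()` would replace the fake and the literal 3. Because `TotalAlloc` only grows, a large delta on one column points at that column's read even if the memory is released before the next heap print.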

Thank you so much for your help!