Open shivakumarss opened 1 year ago
This bug is related to https://github.com/xitongsys/parquet-go/issues/241. I have multiple writers open, and Flush is called internally in writer.go whenever ObjsSize reaches or exceeds criSize:
	if pw.CheckSizeCritical <= ln {
		pw.ObjSize = (pw.ObjSize+common.SizeOf(val))/2 + 1
	}
	pw.ObjsSize += pw.ObjSize
	pw.Objs = append(pw.Objs, src)
	criSize := pw.NP * pw.PageSize * pw.SchemaHandler.GetColumnNum()
	if pw.ObjsSize >= criSize {
		err = pw.Flush(false)
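To make the flush threshold concrete, here is a small sketch of the arithmetic in the excerpt above. The helper names (flushThreshold, rowsUntilFlush) and the numbers passed in main are hypothetical and are not taken from the writer configuration in this issue; only the criSize expression comes from the excerpt.

```go
package main

import "fmt"

// flushThreshold mirrors the criSize expression from the excerpt:
// NP * PageSize * number of columns.
func flushThreshold(np, pageSize, columns int64) int64 {
	return np * pageSize * columns
}

// rowsUntilFlush estimates how many rows of avgRowSize bytes must be
// written before the accumulated size reaches the threshold.
// It is a hypothetical helper, not part of parquet-go.
func rowsUntilFlush(threshold, avgRowSize int64) int64 {
	if avgRowSize <= 0 {
		return 0
	}
	// Ceiling division: the flush fires once size >= threshold.
	return (threshold + avgRowSize - 1) / avgRowSize
}

func main() {
	// Assumed values for illustration only.
	criSize := flushThreshold(4, 8*1024, 10)
	fmt.Println("criSize bytes:", criSize)
	fmt.Println("rows before flush:", rowsUntilFlush(criSize, 200))
}
```

With these assumed values, a smaller NP or PageSize lowers the threshold, so each writer flushes more often under the same load.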
I am facing a performance issue. My requirement is that different types of requests arrive concurrently via REST, and each type has a different schema.
At the implementation level, I maintain an internal map with source.ParquetFile as the key and *writer.ParquetWriter as the value. Based on the request type, I first look up the source.ParquetFile; if it is not present, I create one and then create a *writer.ParquetWriter, write the data, and put it back in the map.
On average, we receive around 30-40k requests per second per machine across both request types, and more request types will be added going forward.
Under light load it works fine, but as the load increases the exception below is thrown. I believe I am doing something fundamentally wrong with multiple writers and need assistance here.
Writer configuration
EDIT 1: Go version