kestra-io / plugin-serdes

https://kestra.io/plugins/plugin-serdes/
Apache License 2.0

Feat/improve perf #105

Closed loicmathieu closed 1 month ago

loicmathieu commented 5 months ago

This PR uses the new methods on the FileSerde to improve the performance of reading and writing files.

Fixes #102

Here are some results from running locally with the Google Cloud Storage internal storage implementation. The test reads an ION file and writes a file in the new format (scenario "write"), then reads the new format and writes an ION file (scenario "read"). This is done for two kinds of files, small and big.

Excel and XML use fewer rows because they load the data in memory (small 100K and big 10K rows).

| Format | Test | Before (s) | After (s) |
|---|---|---|---|
| Avro | write - small | 11.7 | 5.6 |
| Avro | read - small | 6.6 | 6.7 |
| CSV | write - small | 12.5 | 5.2 |
| CSV | read - small | 6.8 | 4.9 |
| CSV | write - big | 14.8 | 14.7 |
| CSV | read - big | 15.2 | 14.9 |
| Excel | write - small | 2.4 | 1.9 |
| Excel | write - big | 5 | 5 |
| JSON | write - small | 13.2 | 16.6 |
| JSON | read - small | 7.1 | 6.1 |
| JSON | write - big | 23.1 | 21.7 |
| JSON | read - big | 20.5 | 19.3 |
| Parquet | write - small | 14.6 | 7.3 |
| Parquet | read - small | 8.4 | 5.8 |
| XML | write - small | 2.2 | 2.3 |
| XML | read - small | 1.6 | 1.6 |
| XML | write - big | 5 | 4.9 |
| XML | read - big | 8.1 | 8 |
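
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how the "write" and "read" scenarios could be timed. It is not the harness used for the numbers above: the class `SerdeBenchmark`, the helpers `sampleFile`, `tempFile`, `convert` and `timeSeconds`, and the sample records are all hypothetical, and `convert` is only a line-by-line copy standing in for a real format conversion. It uses local temporary files rather than the Google Cloud Storage internal storage.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SerdeBenchmark {

    // Runs one scenario and returns the elapsed wall-clock time in seconds.
    static double timeSeconds(Runnable scenario) {
        long start = System.nanoTime();
        scenario.run();
        return (System.nanoTime() - start) / 1_000_000_000.0;
    }

    // Placeholder conversion (assumption): copies records line by line.
    // A real run would call the plugin's reader/writer for the format under test.
    static long convert(Path source, Path target) {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(source);
             BufferedWriter writer = Files.newBufferedWriter(target)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Creates an empty temporary file with the given suffix.
    static Path tempFile(String suffix) {
        try {
            return Files.createTempFile("serde-bench", suffix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary input file with the given number of made-up records.
    static Path sampleFile(int rows) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            lines.add("{id:" + i + "}");
        }
        try {
            return Files.write(tempFile(".ion"), lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path ion = sampleFile(100_000);
        Path converted = tempFile(".out");
        Path back = tempFile(".ion");

        // Scenario "write": ION -> new format. Scenario "read": new format -> ION.
        double write = timeSeconds(() -> convert(ion, converted));
        double read = timeSeconds(() -> convert(converted, back));
        System.out.printf("write: %.3fs, read: %.3fs%n", write, read);
    }
}
```

A single timed run like this is noisy, so repeating each scenario a few times and comparing the before/after builds on the same input files would give more reliable numbers.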