inquidia / PentahoSnowflakePlugin

Apache License 2.0

PUT speed #7

Open sbecainfo opened 7 years ago

sbecainfo commented 7 years ago

I've been testing this plugin and overall it works very well; thank you for developing it. My concern is the PUT step that uploads files to Snowflake's internal stage (or table stage). It appears to be the slowest part of the whole bulk load, and I'm wondering whether there is a way to parallelize it.

I tried running multiple copies of the Snowflake Bulk Loader, but this did not improve the overall loading speed. In a simple copy test from an RDS MySQL data source, I can read at ~500-750k rows/sec, whereas the write speed to Snowflake drops to ~60k rows/sec.

Perhaps you have other suggestions on how to improve bulk load speed to Snowflake?

sfc-gh-space commented 5 years ago

At the moment, Snowflake does not split a large file prior to loading. Rather than loading a single 1 GB file, I believe you should try splitting it into smaller files: ten 100 MB files, for example, could take better advantage of parallelism and of the size of the warehouse you are loading with.
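
As a rough illustration of the splitting suggestion, here is a minimal Python sketch that breaks a large delimited file into smaller chunk files before they are uploaded. It is not part of the plugin: the function name `split_file_by_lines`, the `.partNNNN` naming scheme, and the byte threshold are all assumptions made for this example. It also assumes one record per line (no quoted fields containing newlines) and no header row that would need repeating in each chunk.

```python
import os


def split_file_by_lines(src_path, out_dir, max_bytes):
    """Split src_path into chunk files of at most roughly max_bytes each.

    Chunks are cut only at line boundaries, so a single line longer than
    max_bytes still ends up whole in its own chunk.
    Returns the list of chunk file paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(src_path)  # hypothetical naming: <name>.part0000, ...
    paths = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:
            # Start a new chunk for the first line, or when adding this line
            # would push a non-empty chunk past the size limit.
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"{base}.part{len(paths):04d}")
                out = open(path, "wb")
                paths.append(path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths
```

For example, `split_file_by_lines("export.csv", "chunks", 100 * 1024 * 1024)` would write chunks of about 100 MB each into a `chunks` directory (`export.csv` being a placeholder file name). Whether this actually speeds up the load depends on the warehouse size and on how the files are uploaded, so it is worth measuring.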