inquidia / PentahoSnowflakePlugin

Apache License 2.0

Snowflake Bulk Loader crashes when run in parallel #16

Open KoenVerbeeck opened 6 years ago

KoenVerbeeck commented 6 years ago

When I load a table in parallel with the bulk loader by setting "Change number of copies to start" on the step in Pentaho, the transformation crashes. The bulk loader successfully writes all the data to the .gz files in the temp directory and then starts pushing those temp files to Snowflake. Typically one or more copies manage to upload their files, but then the log reports "Unable to delete temp file". All the other copies then hang, doing nothing, until the transformation crashes a few minutes later. The error message:

Error putting file to Snowflake stage AWS operation failed: Operation=upload, Error message=More data read than expected: dataLength=11442688; expectedLength=11442272;
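The "Unable to delete temp file" and "More data read than expected" errors are consistent with two step copies touching the same .gz file, one uploading it while another is still writing to or deleting it. As a minimal sketch of how per-copy temp file naming would avoid that collision (this is an illustration, not the plugin's actual code; the method and class names are hypothetical, though the copy-number idea mirrors Kettle's step-copy convention):

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch: build a temp-file name that is unique per step copy,
// so parallel copies of the bulk loader never collide on the same .gz file.
public class TempFileNaming {
    public static File uniqueTempFile(String tempDir, String tableName,
                                      int stepCopyNr, int fileNr) throws IOException {
        // Embed the copy number and file sequence in the prefix, and let
        // createTempFile add a unique suffix, so copy 0 and copy 1 can
        // never write to (or delete) the same file.
        String prefix = String.format("%s_%d_%d_", tableName, stepCopyNr, fileNr);
        return File.createTempFile(prefix, ".gz", new File(tempDir));
    }

    public static void main(String[] args) throws IOException {
        // Two parallel copies get distinct files even for the same table:
        File f0 = uniqueTempFile(System.getProperty("java.io.tmpdir"), "SALES", 0, 1);
        File f1 = uniqueTempFile(System.getProperty("java.io.tmpdir"), "SALES", 1, 1);
        System.out.println(f0 + " vs " + f1);
        f0.delete();
        f1.delete();
    }
}
```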

Brochm commented 5 years ago

Just had the same problem. I resolved the issue by first appending the relevant data inside Pentaho (using a Select Values step) and then doing a single bulk load to the table. I suspect that multiple bulk loads pointing to the same table within the same transformation can cause file conflicts (just a hypothesis).
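If the single-writer workaround above is correct, the same effect could in principle be achieved in code by serializing the upload-and-delete phase across copies. A minimal sketch of that idea, assuming the root cause is concurrent access to the shared temp files (hypothetical class and method names; the actual PUT to the stage is elided):

```java
import java.io.File;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical mitigation sketch: if parallel copies must share one table/stage,
// serialize the put-and-delete phase with a JVM-wide lock so no copy uploads
// or deletes a temp file while another copy is still working on the same set.
public class SerializedStageUpload {
    private static final ReentrantLock STAGE_LOCK = new ReentrantLock();

    public static void putAndCleanup(File gzFile) {
        STAGE_LOCK.lock();
        try {
            // ... run PUT file://<gzFile> @stage via the Snowflake JDBC connection ...
            if (!gzFile.delete()) {
                System.err.println("Unable to delete temp file " + gzFile);
            }
        } finally {
            STAGE_LOCK.unlock();
        }
    }
}
```

This trades the parallelism of the upload phase for correctness, which matches the observation that a single bulk load works where several concurrent ones fail.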