slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0

Add random string to temp directory name #409

Closed TimPossiblee closed 1 month ago

TimPossiblee commented 1 month ago

Feature Description

Right now, when integrating sling into our orchestration, there is a risk of a race condition and data corruption, because the directory name for the temporary files contains only the process time. Multiple streams/invocations of sling that start at the same time can end up with the same directory name and all write into the same file.

Maybe adding a random string to each directory would be an option to make sure everything stays separate.
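To illustrate the idea, here is a minimal Python sketch of a timestamp-plus-random-suffix directory name. It is not sling's actual implementation; the function name `make_unique_temp_dir` and the base path passed to it are hypothetical and only show the general technique.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_unique_temp_dir(base: str) -> Path:
    """Create a fresh directory under `base` named <timestamp>_<random suffix>.

    Hypothetical helper for illustration; not part of sling.
    """
    Path(base).mkdir(parents=True, exist_ok=True)
    # Millisecond timestamp, same shape as the folder names in this report.
    stamp = datetime.now().strftime("%Y-%m-%dT%H%M%S.%f")[:-3]
    # mkdtemp appends a random suffix and creates the directory itself,
    # so two callers with an identical timestamp still get different paths.
    return Path(tempfile.mkdtemp(prefix=stamp + "_", dir=base))
```

With this approach, two streams that start in the same millisecond would still write into separate directories, because the random suffix differs even when the timestamp prefix is identical.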

Here is an example of the same file being uploaded to Snowflake into three different tables, because all three streams had the same directory name.

PUT 'file:///tmp/snowflake/put/2024-10-14T070736.758/part.01.0001.csv.zst' @CE_STG.sling_staging/"CE_STG"."Z_MIK_INVOICES_VEHICLES_HR_TMP"/2024-10-14T070736.758 PARALLEL=8 AUTO_COMPRESS=FALSE
PUT 'file:///tmp/snowflake/put/2024-10-14T070736.758/part.01.0001.csv.zst' @CE_STG.sling_staging/"CE_STG"."Z_MIK_VEHICLE_REGISTERED_RS_TMP"/2024-10-14T070736.759 PARALLEL=8 AUTO_COMPRESS=FALSE
PUT 'file:///tmp/snowflake/put/2024-10-14T070736.758/part.01.0001.csv.zst' @CE_STG.sling_staging/"CE_STG"."Z_MIK_VEHICLE_REGISTERED_ACA_TMP"/2024-10-14T070736.758 PARALLEL=8 AUTO_COMPRESS=FALSE

flarco commented 1 month ago

Done. Closing