scality / spark

Apache License 2.0

Update config-template.yml #21

Closed scality-gelbart closed 1 year ago

ghost commented 2 years ago

@scality-gelbart Have you run a complete set of s3_fsck_p0.py through s3_fsck_p4.py and confirmed there are zero errors from this change?

I am hesitant because I had to resolve double slashes and use os.path.join in some parts of the code, specifically where the file and s3a protocols differ. Filesystems don't care, but S3 (the s3a PySpark module at least) does not handle "double prefix delimiters" well: when s3://bucketname/prefix1/prefix2/ contains files, the path s3://bucketname/prefix1//prefix2/ does not. I'm pretty confident URLs should be handled like filesystem paths in most cases, but I want to confirm this is well tested before merging.
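To make the double-slash concern concrete, here is a minimal sketch of the kind of normalization I mean. The helper name `join_s3_path` is hypothetical and is not taken from the s3_fsck scripts. It assumes that no object key intentionally contains an empty path segment, since S3 keys are flat strings and `//` is technically a valid part of a key.

```python
def join_s3_path(base, *parts):
    """Join segments onto a scheme://bucket/prefix base without producing '//'.

    Hypothetical helper for illustration only. It collapses every empty
    segment, so it must not be used if keys legitimately contain '//'.
    """
    scheme, sep, rest = base.partition("://")
    if not sep:
        raise ValueError(f"expected a URL such as s3a://bucket/prefix, got {base!r}")
    # Split every piece on '/' and drop empty segments. This removes leading,
    # trailing, and repeated delimiters in a single pass.
    segments = [s for piece in (rest, *parts) for s in piece.split("/") if s]
    return scheme + "://" + "/".join(segments)


# Both spellings of the prefix normalize to the same path.
clean = join_s3_path("s3a://bucketname/prefix1/", "/prefix2/")
collapsed = join_s3_path("s3a://bucketname/prefix1//prefix2/")
print(clean)
print(collapsed)
```

Note that the result has no trailing slash, so a caller that needs one for a prefix listing would have to append it explicitly.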