2.0.8 (2022-12-22)
Changes
Provides an optional set_empty_values_null setting. When set to true, the tap emits null (the JSON equivalent of None) instead of an empty string.
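As a rough illustration of what this setting does, here is a minimal sketch. The helper name `empty_to_null` and its signature are hypothetical and not taken from the tap; only the setting name comes from the changelog entry above.

```python
import json


def empty_to_null(record, set_empty_values_null=True):
    """Return a copy of record with empty strings replaced by None when enabled.

    Hypothetical helper for illustration; not the tap's actual code.
    """
    if not set_empty_values_null:
        return dict(record)
    return {key: (None if value == "" else value) for key, value in record.items()}


row = {"id": "1", "name": "", "city": "Oslo"}
print(json.dumps(empty_to_null(row)))
# {"id": "1", "name": null, "city": "Oslo"}
```

With the setting off, the record passes through unchanged, so existing pipelines that expect empty strings would not be affected.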
2.0.7 (2022-11-01)
Changes
Provides an optional s3_proxies dict config to control the use of a proxy server. Set it to {} to avoid using a proxy server for S3 traffic.
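The useful distinction here is between leaving s3_proxies unset and setting it to an empty dict. The sketch below shows one way that distinction could be handled. The helper `resolve_s3_proxies` and the proxy URL are hypothetical. The changelog does not show how the fork passes the value to the S3 client; one plausible route, assumed here, is the `proxies` argument of botocore's `Config`.

```python
def resolve_s3_proxies(config):
    """Map the tap config to a proxies value (hypothetical helper).

    None -> setting absent, leave proxy handling to the client's defaults
    {}   -> explicitly no proxy for S3 traffic
    dict -> use the given proxy mapping
    """
    if "s3_proxies" not in config:
        return None
    return dict(config["s3_proxies"])


print(resolve_s3_proxies({}))                  # None
print(resolve_s3_proxies({"s3_proxies": {}}))  # {}
print(resolve_s3_proxies({"s3_proxies": {"https": "http://proxy.example:8080"}}))
```

Treating "absent" and "empty" differently is what lets a private bucket bypass a proxy that is otherwise configured for the environment.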
2.0.6 (2022-10-05)
Changes
Bump boto3 from 1.23.10 to 1.24.26
Bump ujson from 5.2.0 to 5.4.0 because of vulnerabilities
2.0.5 (2022-10-04)
The tap-s3-csv enhancements deal with scenarios where CSV files do not load correctly because of data quality issues or wrong assumptions about the data being read, e.g. data types.
Changes
Allows fields to be overridden to a string data type regardless of what discovery has inferred
Supports reading UTF-8-BOM (byte order mark) files, i.e. CSV files saved by Microsoft tools
Supports adding a suffix to stream / table names to make them unique, e.g. a date or provider_id
Provides an option to warn rather than error if no file is discovered for the search criteria
Supports removing a character from the CSV file being read, e.g. stripping out all double quotes.
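Several of the 2.0.5 items can be sketched with the Python standard library alone. The example below is an assumption-laden illustration, not the fork's implementation: the function `read_rows` and its parameters `encoding` and `remove_character` are hypothetical names. It decodes a BOM-prefixed file with the `utf-8-sig` codec, optionally strips one character, and leaves every value as a string.

```python
import csv
import io


def read_rows(raw_bytes, encoding="utf-8-sig", remove_character=None):
    """Decode CSV bytes, optionally strip one character, and return rows of strings.

    Hypothetical helper for illustration; not the tap's actual code.
    """
    # The utf-8-sig codec drops a leading byte order mark if one is present.
    text = raw_bytes.decode(encoding)
    if remove_character:
        text = text.replace(remove_character, "")
    # csv.DictReader performs no type inference, so every value stays a str.
    return list(csv.DictReader(io.StringIO(text)))


# A small file as Excel might save it: BOM first, then quoted values.
sample = b'\xef\xbb\xbfid,name\r\n1,"Ann"\r\n'
rows = read_rows(sample, remove_character='"')
print(rows)
```

Decoding the same bytes with plain `utf-8` would leave the BOM attached to the first header, so the first column would be named `\ufeffid` rather than `id`.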
Hi,
I have a fork of this tap and have continued to enhance it with additional features that we need. We also support switching off data type discovery and treating all extracted fields as strings. We have also added other features such as BOM support, bypassing proxy servers for private S3 bucket access, specifying the file encoding, etc.
Given your recent changes to the tap, I'm not sure how feasible it is to push some of these changes through, so I wanted your thoughts on this. I have limited time to push these changes through, and I had challenges with my last pull request because of the CI/CD and testing with the buckets.
Here are some of our recent changes in this fork: https://github.com/s7clarke10/pipelinewise-tap-s3-csv
Some of these enhancements had to be made in conjunction with enhancements to singer-encodings: https://github.com/s7clarke10/singer-encodings
Would appreciate your thoughts on this.