transferwise / pipelinewise-tap-s3-csv

Singer.io Tap for CSV files on S3 - PipelineWise compatible
https://transferwise.github.io/pipelinewise/
GNU Affero General Public License v3.0

Merging in my enhancements to this TAP - Is this feasible - do you have time? #210

Open s7clarke10 opened 1 year ago

s7clarke10 commented 1 year ago

Hi,

I have a fork of this tap and have continued to enhance it with additional features that we need. We added support for switching off data type discovery so that all extracted fields are emitted as strings. We have also added other features, such as BOM support, bypassing proxy servers for private S3 bucket access, and specifying the encoding of the file.
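To make the idea concrete, here is a minimal sketch of what "BOM support, a configurable encoding, and all fields as strings" can look like using only the Python standard library. It is an illustration, not the code in the fork: the function name `read_rows_as_strings` and the `encoding` parameter are hypothetical, and no S3 access is shown.

```python
import csv
import io


def read_rows_as_strings(raw: bytes, encoding: str = "utf-8-sig"):
    """Decode raw CSV bytes and return each row as a dict of strings.

    Hypothetical helper for illustration only. No type inference is
    attempted: the csv module always yields strings. The "utf-8-sig"
    codec strips a leading UTF-8 byte order mark (BOM) if one is present.
    """
    text = raw.decode(encoding)
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [dict(row) for row in reader]


# A small file that starts with a UTF-8 BOM (EF BB BF).
sample = b"\xef\xbb\xbfid,amount\n001,12.50\n002,N/A\n"

rows = read_rows_as_strings(sample)
print(rows)
```

With plain `utf-8` instead of `utf-8-sig`, the BOM survives decoding and ends up glued to the first header name, which is the kind of loading problem the BOM option is meant to avoid.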

Given your recent changes to the tap, I'm not sure how feasible it is to push some of these changes through, so I wanted your thoughts on this. I have limited time to push these changes through, and I know I had challenges with my last pull request because of the CI/CD and testing with the buckets.

Here are some of our recent changes in this fork: https://github.com/s7clarke10/pipelinewise-tap-s3-csv

Some of these enhancements needed to be made in conjunction with enhancements to singer-encodings: https://github.com/s7clarke10/singer-encodings

Would appreciate your thoughts on this.

2.0.8 (2022-12-22)

Changes

2.0.7 (2022-11-01)

Changes

2.0.6 (2022-10-05)

Changes

2.0.5 (2022-10-04)

The tap-s3-csv enhancements deal with scenarios where CSV files do not load correctly because of data quality issues or incorrect assumptions about the data being read, e.g. data types.
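As a rough illustration of how a data type assumption can go wrong, the sketch below infers a column type from a sample of rows. The `infer_type` function is hypothetical and deliberately naive; it is not the tap's actual discovery logic.

```python
def infer_type(values):
    """Naive inference: "integer" if every value parses as int, else "string".

    Hypothetical stand-in for sample-based type discovery.
    """
    try:
        for value in values:
            int(value)
    except ValueError:
        return "string"
    return "integer"


column = ["00123", "00456", "N/A"]

sampled = infer_type(column[:2])  # looks only at the first two rows
full = infer_type(column)         # looks at every row

print(sampled, full)
print(str(int(column[0])))  # casting also drops the leading zeros
```

The sampled rows suggest an integer column, but a later row does not parse, and casting would also strip the leading zeros from the values that do. Treating every field as a string sidesteps both problems.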

Changes