singer-io / tap-doubleclick-campaign-manager

A Singer.io tap for extracting data from the DoubleClick Campaign Manager API
GNU Affero General Public License v3.0

Extraction fails because of a utf-8 decode error #17

Open: louaig opened this issue 3 years ago

louaig commented 3 years ago

I added a Campaign Manager integration in Stitch, and the extraction run keeps failing with the following error: 'utf-8' codec can't decode byte 0xc1 in position 9: invalid start byte

Any idea?
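One way to investigate an error like this is to inspect the raw bytes before decoding them. Byte 0xC1 can never appear in valid UTF-8, so it is not a stray accented character; the data is either in another encoding or not text at all. The sketch below uses a hypothetical helper, describe_payload (not part of the tap), that checks for the signature PK\x03\x04, which normally begins a ZIP archive (the container format of .xlsx files), and otherwise reports which byte breaks UTF-8 decoding.

```python
# Local file header signature that normally begins a ZIP archive,
# including .xlsx workbooks.
ZIP_SIGNATURE = b"PK\x03\x04"


def describe_payload(raw: bytes) -> str:
    """Report whether a payload looks like a ZIP/XLSX file, UTF-8 text, or neither."""
    if raw.startswith(ZIP_SIGNATURE):
        return "zip archive (possibly xlsx)"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the decoder rejected.
        return f"not utf-8: byte 0x{raw[exc.start]:02x} at position {exc.start}"
    return "utf-8 text"
```

Running it on the first chunk of a failing download would show whether the tap is receiving a spreadsheet rather than CSV text.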

louaig commented 3 years ago

I have no idea why this byte was there, but I tried decoding with 'ISO-8859-1' and it worked. I'm not sure what the implications of that change are. There are two other possible solutions: ignore undecodable bytes when decoding as UTF-8, or detect the encoding and decode accordingly.
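The options above can be compared side by side with Python's built-in codecs. This is a minimal sketch; decode_variants is a hypothetical name, and the sample bytes only mimic the reported error. Encoding detection is not shown because it needs a third-party library such as charset-normalizer or chardet.

```python
def decode_variants(raw: bytes) -> dict:
    """Decode the same bytes several ways so the outcomes can be compared."""
    results = {}
    try:
        results["utf-8 (strict)"] = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        results["utf-8 (strict)"] = f"error: {exc}"
    # ISO-8859-1 maps every byte 0x00-0xFF to a code point, so it never raises,
    # even when the input is binary data or text in some other encoding.
    results["iso-8859-1"] = raw.decode("iso-8859-1")
    # errors="ignore" silently drops undecodable bytes.
    results["utf-8 (ignore)"] = raw.decode("utf-8", errors="ignore")
    # errors="replace" substitutes U+FFFD, so the damage stays visible.
    results["utf-8 (replace)"] = raw.decode("utf-8", errors="replace")
    return results


for name, text in decode_variants(b"Campaign \xc1").items():
    print(f"{name}: {text!r}")
```

The comparison shows that a decode which does not raise is not necessarily a correct one: ISO-8859-1 accepts any byte sequence, and errors="ignore" discards data without warning.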

Edit: The above didn't actually work; it was just skipping records. The real problem was parsing the XLSX file from the Campaign Manager API. I followed Google's documentation https://developers.google.com/doubleclick-advertisers/guides/download_reports#python_1 to download the XLSX file, converted it to CSV, and then read the CSV file line by line. I also tried converting to CSV while receiving the chunks, without writing to a temp file, but that didn't work; I guess XLSX files aren't meant to be consumed that way.
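A likely reason the chunk-by-chunk conversion failed: an XLSX file is a ZIP archive, and a ZIP's central directory (the index of its entries) is stored at the end of the file, so readers such as Python's zipfile need the complete, seekable file. The sketch below follows the buffer-then-convert approach described above, but it is an illustration under assumptions, not the tap's code or Google's sample: the function names and the `chunks` input are hypothetical, it assumes the report is the sheet stored at xl/worksheets/sheet1.xml, and it emits numbers and dates as their raw stored values. It uses only the standard library to show the file structure; in practice a library such as openpyxl (`load_workbook(..., read_only=True)`) is the usual choice.

```python
import csv
import io
import re
import tempfile
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by the XML parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def column_index(cell_ref: str) -> int:
    """Convert a cell reference such as 'C7' or 'AA3' to a 0-based column index."""
    letters = re.match(r"[A-Z]+", cell_ref).group(0)
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def _joined_text(node, base=""):
    """Join plain (<t>) or rich-text (<r><t>) string runs under `node`."""
    parts = node.findall(base + "m:t", NS) + node.findall(base + "m:r/m:t", NS)
    return "".join(t.text or "" for t in parts)


def iter_xlsx_rows(fileobj, sheet="xl/worksheets/sheet1.xml"):
    """Yield each row of one worksheet as a list of strings.

    Assumption: the report lives at `sheet`; the authoritative sheet-to-file
    mapping is in xl/workbook.xml and its relationships part.
    """
    with zipfile.ZipFile(fileobj) as archive:
        shared = []
        if "xl/sharedStrings.xml" in archive.namelist():
            sst = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            shared = [_joined_text(si) for si in sst.findall("m:si", NS)]
        worksheet = ET.fromstring(archive.read(sheet))
    for row in worksheet.iterfind("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            ref = cell.get("r")
            col = column_index(ref) if ref else len(values)
            values.extend([""] * (col - len(values)))  # pad skipped empty cells
            kind = cell.get("t")
            if kind == "s":  # index into the shared-strings table
                values.append(shared[int(cell.findtext("m:v", default="0", namespaces=NS))])
            elif kind == "inlineStr":
                values.append(_joined_text(cell, "m:is/"))
            else:  # numbers, booleans, dates: raw stored value, unformatted
                values.append(cell.findtext("m:v", default="", namespaces=NS))
        yield values


def xlsx_chunks_to_csv_lines(chunks):
    """Buffer downloaded byte chunks to a temp file, then yield the sheet as CSV lines."""
    # The whole archive must be on disk before zipfile can find its entries.
    with tempfile.TemporaryFile() as tmp:
        for chunk in chunks:
            tmp.write(chunk)
        tmp.seek(0)
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        for row in iter_xlsx_rows(tmp):
            buffer.seek(0)
            buffer.truncate()
            writer.writerow(row)
            yield buffer.getvalue()
```

Each yielded line is ordinary CSV text, so it can be fed to `csv.reader` one line at a time, which matches the "read the CSV line by line" step above.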