mergestat / syncs

MergeStat container based syncs
MIT License

Git-files: Removing invalid UTF-8 characters #55

SimonFlarup closed this 1 year ago

SimonFlarup commented 1 year ago

Removes invalid UTF-8 byte sequences from the files.csv file before loading it into the UTF-8-encoded Postgres schema.

Fixes #43
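A minimal sketch of the cleaning step described above, assuming the CSV is processed as a byte stream before it reaches Postgres. This is not the patch itself (its code is not shown here). `strip_invalid_utf8` and the file names in the usage note are hypothetical. The incremental decoder matters for large files read in chunks: a multi-byte character split across a chunk boundary is buffered rather than mistaken for an invalid sequence.

```python
import codecs
from typing import BinaryIO


def strip_invalid_utf8(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1 << 16) -> None:
    """Copy src to dst, dropping byte sequences that are not valid UTF-8.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # errors="ignore" silently discards invalid bytes (lossy, as the PR notes).
    # The incremental decoder keeps a partial multi-byte character that
    # straddles two chunks until the rest of it arrives.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # final=True flushes the decoder; an incomplete sequence left at the
    # very end of the input is discarded under errors="ignore".
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))
```

Usage, with hypothetical paths: `with open("files.csv", "rb") as src, open("files.clean.csv", "wb") as dst: strip_invalid_utf8(src, dst)`, then load `files.clean.csv` instead of the original.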


Notes

I have hit the UTF-8 encoding error returned by Postgres a couple of times. I forked the git-files sync Docker image and applied this patch to fix the issue.

The same fix can be applied to the git-blame sync.

It does result in some loss of information, since the invalid characters are simply removed.
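To make the trade-off concrete, here is a small sketch using Python's standard codec error handlers. `errors="ignore"` drops invalid bytes, which matches the behaviour described above; `errors="replace"` is an alternative not used in the patch that substitutes U+FFFD (the Unicode replacement character), so affected rows stay identifiable after loading. The function name `clean` and its `keep_marker` flag are hypothetical.

```python
def clean(data: bytes, keep_marker: bool = False) -> str:
    """Decode bytes as UTF-8, handling invalid sequences one of two ways.

    Hypothetical helper for illustration.
    keep_marker=False: invalid bytes are dropped (information lost silently).
    keep_marker=True:  each invalid byte becomes U+FFFD (loss stays visible).
    """
    return data.decode("utf-8", errors="replace" if keep_marker else "ignore")


raw = b"name\xffvalue"
dropped = clean(raw)                    # "namevalue"
marked = clean(raw, keep_marker=True)   # "name\ufffdvalue"
```

Either result is valid UTF-8 for Postgres; the difference is only whether the damage can be found later, for example by searching loaded text for U+FFFD.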

Feel free to do with this commit as you please. I am just providing it as an example of how I solved the issue.

amenowanna commented 1 year ago

@SimonFlarup I wanted to let you know that we still ran into encoding issues; the linked issue/PR is what we are using to address them.