jensl / critic

Critic code review system.

githook.py UnicodeDecodeError: 'utf8' codec can't decode byte #125

Open: vkpgwt opened this issue 6 years ago

vkpgwt commented 6 years ago

Hello!

Critic sometimes emails me error messages when it updates a review from a tracked branch in a remote repository after someone pushes to that branch:

2018-02-16 12:06:51,250 - ERROR - UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 158: invalid continuation byte

Request:
{
  "repository_name": "myrepo", 
  "refs": [
    {
      "new_sha1": "9fb7a24f9e14252f10876d27b61414e095371cd3", 
      "name": "refs/heads/r/myreview", 
      "old_sha1": "d59a88fc7f5f7a57faf988a300bdbe20d1874fd7"
    }
  ], 
  "user_name": "critic", 
  "flags": "trackedbranch_id=39"
}

Traceback (most recent call last):
  File "/usr/share/critic/background/githook.py", line 170, in slave
    sys_stdout.write(json_encode({ "status": "ok", "accept": True, "output": sys.stdout.getvalue(), "info": info }))
  File "/usr/lib/python2.7/json/__init__.py", line 244, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 158: invalid continuation byte

Critic version: 28ed20bb8032d7cc5aa23de98da51e619fd84164
Critic bug reports can be filed here: https://github.com/jensl/critic/issues/new

And then:

2018-02-16 12:06:51,441 - ERROR - update of branch r/myreview from mybranch in http://critic:critic@repository.example.com/myrepo failed
    remote: An exception was raised while processing the request.  A message has
    remote: been sent to the system administrator(s).
    To /var/git/myrepo.git
     ! [remote rejected] source/mybranch -> r/myreview (pre-receive hook declined)
    error: failed to push some refs to '/var/git/myrepo.git'

Critic version: 28ed20bb8032d7cc5aa23de98da51e619fd84164
Critic bug reports can be filed here: https://github.com/jensl/critic/issues/new

After such an error, branch tracking is automatically disabled for the review. When I re-enable it manually, it works again, so the issue is not critical, but it gets annoying when it happens twice a day.

I suspect it is caused by Cyrillic characters in commit messages. As far as I know, it has never happened after commits containing only Latin characters, but I can't say for sure, and my attempts to reproduce it failed. Unfortunately, I can't provide more information yet. I may add some logging to capture the text that triggers the error.

Our Git version is 2.10; the Critic version is 28ed20bb8032d7cc5aa23de98da51e619fd84164.
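One way to add the logging mentioned above is to catch the decode failure and report the bytes around the offending position. This is a minimal Python 3 sketch (Critic itself runs on Python 2.7, per the traceback). The helper name `describe_bad_utf8` and the 20-byte context window are hypothetical; the code relies only on the standard `UnicodeDecodeError` attributes `reason`, `start` and `end`. The example input also shows one way a stray 0xd0 can appear. 0xd0 is the UTF-8 lead byte for Cyrillic letters U+0400 to U+043F (including А to Я and а to п), so cutting a Cyrillic string mid-character and then appending more text yields "invalid continuation byte".

```python
def describe_bad_utf8(data: bytes, context: int = 20) -> str:
    """Return a short report locating the first invalid UTF-8 byte in `data`.

    Hypothetical logging helper; it is not part of Critic.
    """
    try:
        data.decode("utf-8")
        return "valid UTF-8"
    except UnicodeDecodeError as exc:
        lo = max(0, exc.start - context)
        hi = min(len(data), exc.end + context)
        # Show the neighbourhood with bad bytes replaced by U+FFFD, plus the raw bytes.
        snippet = data[lo:hi].decode("utf-8", errors="replace")
        return (f"{exc.reason} at byte {exc.start} "
                f"(0x{data[exc.start]:02x}): {snippet!r} raw={data[lo:hi]!r}")


# "Привет" takes two bytes per letter in UTF-8. Keeping the first 5 bytes leaves
# "Пр" plus the lead byte 0xd0 of "и", which is then followed by ASCII text.
message = "Привет".encode("utf-8")[:5] + b"... (truncated)"
print(describe_bad_utf8(message))
```

In a hook, the returned string could be written to Critic's log instead of printed, alongside the commit SHA-1 being processed.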

joaoe commented 6 years ago

Looking at https://github.com/jensl/critic/blob/stable/1/src/index.py, there are a couple of print calls that print user names and e-mail addresses. Could it be that someone with a non-UTF-8-friendly name is watching some subfolders and is automatically assigned only when those folders are touched?
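To test this hypothesis, note that a name stored in a legacy single-byte encoding would produce exactly this kind of error once it is printed into the captured output. The sketch below assumes, purely for illustration, a hypothetical user name "Роман" stored as Windows-1251 (`cp1251`) bytes. In that encoding "Р" is the byte 0xd0, and the next letter "о" (0xee) is not a valid UTF-8 continuation byte. The helper `utf8_error` is likewise hypothetical.

```python
def utf8_error(raw: bytes):
    """Return (reason, position) of the first UTF-8 decoding error, or None."""
    try:
        raw.decode("utf-8")
        return None
    except UnicodeDecodeError as exc:
        return exc.reason, exc.start


# Hypothetical reviewer name, once as UTF-8 and once as cp1251 bytes.
name_utf8 = "Роман".encode("utf-8")
name_cp1251 = "Роман".encode("cp1251")

print(name_cp1251)              # the first byte is 0xd0, as in the reported error
print(utf8_error(name_utf8))    # valid UTF-8, so no error is reported
print(utf8_error(name_cp1251))  # fails at the very first byte
```

If the names printed by `index.py` ever come from a source that is not UTF-8, this would match the byte in the error message, though the report alone does not confirm it.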

joaoe commented 6 years ago

Using ensure_ascii=False seems to be the way to fix it, so that Python doesn't try to re-encode the output.
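For context: in Python 2.7, `json.dumps` with the default `ensure_ascii=True` decodes non-ASCII byte strings (`str`) as UTF-8 so that it can escape them, which matches where the traceback fails. `ensure_ascii=False` skips that decode for byte strings. Python 3's `json` module does not accept `bytes` at all, so a comparable fix there has to decode the captured output first. The sketch below shows that alternative, lossy decoding with `errors="replace"`, which is a different technique from the `ensure_ascii=False` change suggested above. The function name `encode_hook_reply` is hypothetical, and the payload shape is copied from the `json_encode` call in the traceback.

```python
import json


def encode_hook_reply(output: bytes, info: dict) -> str:
    """Serialize a hook reply whose captured output may contain invalid UTF-8.

    Hypothetical helper modelled on the payload passed to json_encode in the traceback.
    """
    # Lossy decode: invalid byte sequences become U+FFFD instead of raising.
    text = output.decode("utf-8", errors="replace")
    # ensure_ascii=False keeps Cyrillic readable in the JSON instead of \uXXXX escapes.
    return json.dumps(
        {"status": "ok", "accept": True, "output": text, "info": info},
        ensure_ascii=False,
    )


# Valid Cyrillic ("Review updated" in Russian) followed by a stray 0xd0 byte.
captured = "Ревью обновлено\n".encode("utf-8") + b"\xd0 end\n"
reply = encode_hook_reply(captured, {})
print(reply)
print(json.loads(reply)["output"])
```

Replacing bad bytes loses information, but it keeps one malformed commit message or name from failing the whole hook and disabling branch tracking. Logging the original bytes separately would preserve them for diagnosis.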