src-d / hercules

Gaining advanced insights from Git repository history.

UTF-8 characters in log #377

Closed kown7 closed 3 years ago

kown7 commented 3 years ago

When an author name contains an umlaut, labours fails with:

Traceback (most recent call last):
  File "/usr/local/bin/labours", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/labours/cli.py", line 449, in main
    modes[mode]()
  File "/usr/local/lib/python3.6/dist-packages/labours/cli.py", line 304, in couples_people
    *train_embeddings(*reader.get_people_coocc(), tmpdir=args.tmpdir),
  File "/usr/local/lib/python3.6/dist-packages/labours/embeddings.py", line 52, in train_embeddings
    out.write(vocabulary)
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 145: ordinal not in range(128)
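
For context, this is the error Python raises when text containing a non-ASCII character is written to a text stream whose encoding is ASCII. A file opened without an explicit `encoding` uses the locale's preferred encoding, so this typically shows up when the locale is not UTF-8. Below is a minimal sketch that reproduces the failure and shows that an explicit UTF-8 encoding avoids it. The sample name and the helper `write_vocabulary` are hypothetical and are not taken from labours.

```python
import io


def write_vocabulary(text: str, encoding: str) -> bytes:
    """Write text through a text stream with the given encoding; return the raw bytes."""
    raw = io.BytesIO()
    # TextIOWrapper behaves like a file opened in text mode with this encoding.
    out = io.TextIOWrapper(raw, encoding=encoding)
    out.write(text)
    out.flush()
    data = raw.getvalue()
    out.detach()  # keep `raw` from being closed together with the wrapper
    return data


name = "J\xf6rg"  # hypothetical author name containing an umlaut

try:
    write_vocabulary(name, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    # Same class of error as in the traceback above.
    ascii_failed = True

# With UTF-8 the same text is encoded without error.
utf8_bytes = write_vocabulary(name, "utf-8")
```

Whether the fix belongs in the code (passing `encoding="utf-8"` when opening the file) or in the environment (a UTF-8 locale such as `LC_ALL=C.UTF-8`) is a judgment call; the sketch only illustrates the mechanism.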

vmarkovtsev commented 3 years ago

Hi @kown7, please take a look at the docs: https://github.com/src-d/hercules#bad-unicode-errors Feel free to ping me if that does not help.

kown7 commented 3 years ago

Thanks for the pointer, I missed that completely. RTFM and all that.

Now I'm stuck at

Running: couples-shotness
Traceback (most recent call last):
  File "/usr/local/bin/labours", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/labours/cli.py", line 449, in main
    modes[mode]()
  File "/usr/local/lib/python3.6/dist-packages/labours/cli.py", line 315, in couples_shotness
    *train_embeddings(*reader.get_shotness_coocc(), tmpdir=args.tmpdir),
  File "/usr/local/lib/python3.6/dist-packages/labours/readers.py", line 145, in get_shotness_coocc
    index = ["%s:%s" % (i["file"], i["name"]) for i in shotness]
TypeError: 'NoneType' object is not iterable
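
The `TypeError` comes from the list comprehension iterating over a value that is `None`. Below is a minimal sketch of a guard, assuming that missing shotness data should produce a clear error message instead of a crash. `build_shotness_index` is a hypothetical helper, not labours code, and the sample record is made up.

```python
from typing import Dict, List, Optional


def build_shotness_index(shotness: Optional[List[Dict[str, str]]]) -> List[str]:
    """Build "file:name" keys, failing clearly when no shotness data is present."""
    if shotness is None:
        raise ValueError("no shotness data in the input; was the shotness analysis run?")
    return ["%s:%s" % (i["file"], i["name"]) for i in shotness]


# Hypothetical record, shaped like the items the comprehension expects.
index = build_shotness_index([{"file": "cli.py", "name": "main"}])
```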

This is certainly unrelated. Removing --shotness, I get

Traceback (most recent call last):
  File "/usr/local/bin/labours", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/labours/cli.py", line 449, in main
    modes[mode]()
  File "/usr/local/lib/python3.6/dist-packages/labours/cli.py", line 363, in devs_efforts
    max_people=args.max_people,
  File "/usr/local/lib/python3.6/dist-packages/labours/modes/devs.py", line 305, in show_devs_efforts
    for tick in pyplot.gca().yaxis.iter_ticks():
AttributeError: 'YAxis' object has no attribute 'iter_ticks'
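
This `AttributeError` suggests that the installed Matplotlib no longer provides `Axis.iter_ticks` (as far as I know it was deprecated in Matplotlib 3.1 and removed in a later release). Below is a minimal sketch of collecting the tick objects through `get_major_ticks()` and `get_minor_ticks()` instead. `all_ticks` and `FakeAxis` are hypothetical names, and the stand-in class exists only so the sketch runs without Matplotlib installed.

```python
from typing import Any, List


def all_ticks(axis: Any) -> List[Any]:
    """Return the major and minor tick objects of an axis without iter_ticks()."""
    return list(axis.get_major_ticks()) + list(axis.get_minor_ticks())


class FakeAxis:
    """Stand-in for a Matplotlib axis, exposing only the two methods used above."""

    def get_major_ticks(self):
        return ["major-0", "major-1"]

    def get_minor_ticks(self):
        return ["minor-0"]


ticks = all_ticks(FakeAxis())
```

With a real figure this would be called as `all_ticks(pyplot.gca().yaxis)`. Note that it returns tick objects only, so code that relied on other values yielded by `iter_ticks` would need adjusting.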

For labours -m sentiment -f pb I get

Traceback (most recent call last):
  File "/usr/local/bin/labours", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/labours/cli.py", line 154, in main
    reader = read_input(args)
  File "/usr/local/lib/python3.6/dist-packages/labours/readers.py", line 439, in read_input
    reader.read(ins)
  File "/usr/local/lib/python3.6/dist-packages/labours/readers.py", line 231, in read
    self.data.ParseFromString(all_bytes)
google.protobuf.message.DecodeError: Error parsing message