nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2192' in position 2717: ordinal not in range(256) #1272

Closed. tibitoy closed this issue 1 year ago.

tibitoy commented 1 year ago

Current Behavior

I downloaded augur via Nextstrain for Mac. For some reason, I keep getting this error when I try to use it: UnicodeEncodeError: 'latin-1' codec can't encode character '\u2192' in position 2717: ordinal not in range(256)

Is there any way to fix this?

corneliusroemer commented 1 year ago

Hi @tibitoy, thanks for reporting this bug - I'm happy to help! Can you tell me a bit more about the command you run when you encounter that error? Please paste exactly what you type in the terminal and all of the output, including the whole stack trace, not just the last line with the error: UnicodeEncodeError: 'latin-1' codec can't encode character '\u2192' in position 2717: ordinal not in range(256)

Also, how exactly did you install augur? Conda, pip, or are you using the Nextstrain CLI?

For some reason the encoding of some file seems to have been messed up. But it's hard to know where and why without extra details.

corneliusroemer commented 1 year ago

I just saw you posted a similar issue in the unrelated, non-Nextstrain augur repo (it has nothing to do with this tool):

I tried to download augur, a bioinformatics tool. It seems to download perfectly, but when I try to run augur --help

I get the following error:

  File "/home4/tibitoy/miniconda3/bin/augur", line 8, in <module>
    sys.exit(main())
  File "/home4/tibitoy/miniconda3/lib/python3.9/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/home4/tibitoy/miniconda3/lib/python3.9/site-packages/augur/__init__.py", line 64, in run
    args = make_parser().parse_args(argv)
  File "/home4/tibitoy/miniconda3/lib/python3.9/argparse.py", line 1824, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "/home4/tibitoy/miniconda3/lib/python3.9/argparse.py", line 1857, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/home4/tibitoy/miniconda3/lib/python3.9/argparse.py", line 2066, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/home4/tibitoy/miniconda3/lib/python3.9/argparse.py", line 2006, in consume_optional
    take_action(action, args, option_string)
  File "/home4/tibitoy/miniconda3/lib/python3.9/argparse.py", line 1934, in take_action
    action(self, namespace, argument_values, option_string)
  File "/home4/tibitoy/miniconda3/lib/python3.9/argparse.py", line 1098, in __call__
    parser.print_help()
  File "/home4/tibitoy/miniconda3/lib/python3.9/argparse.py", line 2554, in print_help
    self._print_message(self.format_help(), file)
  File "/home4/tibitoy/miniconda3/lib/python3.9/argparse.py", line 2560, in _print_message
    file.write(message)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2192' in position 3174: ordinal not in range(256)

Any possible solutions? For reference, I'm using Linux.

https://github.com/chaoss/augur/issues/2475

corneliusroemer commented 1 year ago

It seems like your terminal uses the latin-1 codec instead of UTF-8. Latin-1 can't encode the "right arrow" symbol (→, U+2192), and we try to print that character to it.
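To see why the arrow in particular breaks on a latin-1 stream, here is a small Python sketch (not Augur code; the helper name `latin1_error` is hypothetical). Latin-1 can only represent code points 0 to 255, while → is U+2192, i.e. 8594, which is exactly what "ordinal not in range(256)" is complaining about.

```python
ARROW = "\u2192"  # the right arrow character, →


def latin1_error(text):
    """Hypothetical helper: return the UnicodeEncodeError message if `text`
    can't be encoded as latin-1, otherwise None."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError as err:
        return str(err)
    return None


print(ord(ARROW))             # 8594, outside latin-1's range of 0-255
print(latin1_error(ARROW))    # 'latin-1' codec can't encode character '\u2192' in position 0: ordinal not in range(256)
print(ARROW.encode("utf-8"))  # UTF-8 has no such limit: b'\xe2\x86\x92'
```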

Can you try running the following commands in your shell first, before augur --help? Does that help?

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

followed by

augur --help
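To check whether the exports took effect, one can ask Python which encodings it picked for the current shell. This is a generic Python sketch, not an Augur feature, and `describe_stdio_encoding` is a hypothetical name. Run before and after the export lines, stdout should switch from a latin-1 encoding name to UTF-8, assuming the en_US.UTF-8 locale is installed on the machine.

```python
import locale
import sys


def describe_stdio_encoding():
    """Collect the encodings Python chose for this process's text I/O."""
    return {
        # Encoding used when printing, e.g. by argparse's --help output.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        # Encoding derived from the locale (LC_ALL / LC_CTYPE / LANG),
        # unless Python's UTF-8 mode is enabled.
        "locale": locale.getpreferredencoding(False),
        # UTF-8 mode (PYTHONUTF8=1 or -X utf8) makes Python ignore the locale.
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for name, value in describe_stdio_encoding().items():
        print(f"{name}: {value}")
```

Save it under any name (for example a hypothetical check_encoding.py) and run it with python3 in the same shell where augur fails.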
tibitoy commented 1 year ago

This fixed it! Thank you so much! I was pulling my hair out trying to figure it out. I will delete my thread on the other repo.

(Sent by email in reply to Cornelius Roemer's comment of Thu, Aug 3, 2023, 3:47 PM.)

-- Temitope Ibitoye, PhD Student, Environmental, Water Resources, and Coastal Engineering, North Carolina State University

corneliusroemer commented 1 year ago

It's not your fault, thanks a lot for reporting! We should be able to deal with terminals being set to various encodings.

We introduced that character in this commit: https://github.com/nextstrain/augur/commit/5ba7cf10e3742c46410542c264801070c1ca4432

The character is used here: https://github.com/nextstrain/augur/blob/7766f46c90e88d46a94ca7a6cf9cd2bf4291e2a8/augur/clades.py#L5

We may want to replace it with an ASCII character for better compatibility.

@jameshadfield @tsibley

corneliusroemer commented 1 year ago

I see you also posted to Stack Overflow, so for context, in case it gets insightful answers: https://stackoverflow.com/questions/76830449/getting-a-unicodeencodeerror-and-not-sure-how-to-fix-it/76831291#76831291

tsibley commented 1 year ago

> We should be able to deal with terminals being set to various encodings.

A Unicode encoding is pretty reasonable to expect at this point, but yes, Augur should do better than erroring out. There are several options here, but the best effort/reward compromise may be to reconfigure stdout/stderr to produce replacement characters (e.g. errors="replace" or errors="backslashreplace") when those stdio streams are not using a Unicode encoding. Nextstrain CLI does something similar, but goes a step further and forces UTF-8 for reasons particular to that codebase (which may also apply to much of Augur; I'm not sure without inspecting it).
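A minimal sketch of the replacement-character approach, assuming Python 3.7+ (for `io.TextIOWrapper.reconfigure`). The helper names `is_unicode_encoding` and `make_tolerant` are hypothetical, and this is not how Augur or Nextstrain CLI actually implement it: the stream keeps its encoding, but characters it can't represent are escaped instead of raising UnicodeEncodeError.

```python
import codecs
import io


def is_unicode_encoding(encoding):
    """Hypothetical helper: True if `encoding` is a UTF-8/16/32 codec."""
    try:
        # Normalise aliases, e.g. "UTF8" -> "utf-8", "latin-1" -> "iso8859-1".
        name = codecs.lookup(encoding).name
    except (LookupError, TypeError):  # unknown codec name, or None
        return False
    return name.startswith("utf")


def make_tolerant(stream, errors="backslashreplace"):
    """Hypothetical helper: keep the stream's encoding, but escape (or, with
    errors="replace", substitute '?' for) characters it can't encode."""
    if isinstance(stream, io.TextIOWrapper) and not is_unicode_encoding(stream.encoding):
        stream.reconfigure(errors=errors)
    return stream


# Demo: an in-memory latin-1 stream standing in for a latin-1 terminal.
buffer = io.BytesIO()
fake_terminal = make_tolerant(io.TextIOWrapper(buffer, encoding="latin-1", newline="\n"))
fake_terminal.write("A \u2192 B\n")  # would raise UnicodeEncodeError without make_tolerant
fake_terminal.flush()
print(buffer.getvalue())  # b'A \\u2192 B\n'

# In a real program this would run once at start-up, e.g.
# make_tolerant(sys.stdout) and make_tolerant(sys.stderr).
```

Streams that already use a Unicode encoding are left untouched, so users with UTF-8 terminals still see the real arrow; only non-Unicode streams get the escaped form.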

> We may want to replace it with an ASCII character for better compatibility.

I don't think we need to stop using Unicode characters.

corneliusroemer commented 1 year ago

Thanks @tsibley, sounds good!

I've figured out how to reproduce the error, though this is probably not news to you:

export PYTHONIOENCODING=latin-1
augur --help

Do you know how the user can change their encoding to work around our bug? Simply export PYTHONIOENCODING=utf-8?
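The same reproduction can be scripted from Python to compare several PYTHONIOENCODING values side by side. This sketch runs a tiny child script instead of Augur itself, and `run_with_io_encoding` is a hypothetical name. PYTHONIOENCODING also accepts an `encoding:errors` form, so `latin-1:backslashreplace` keeps latin-1 output but escapes the arrow instead of failing.

```python
import os
import subprocess
import sys

# Child program: print a string containing U+2192 (→), as augur --help does.
CHILD = "print('A \\u2192 B')"


def run_with_io_encoding(io_encoding):
    """Run CHILD in a fresh interpreter with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run([sys.executable, "-c", CHILD], env=env, capture_output=True)


for setting in ("latin-1", "latin-1:backslashreplace", "utf-8"):
    result = run_with_io_encoding(setting)
    print(setting, result.returncode, result.stdout)
```

The plain latin-1 run fails with the same UnicodeEncodeError as in the report, while the other two succeed; whether the utf-8 bytes then display correctly depends on what the terminal expects.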

tsibley commented 1 year ago

Setting PYTHONIOENCODING=utf-8 will avoid the error, but it will result in mojibake (garbled text) for the affected characters, assuming the terminal is actually expecting latin-1.
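To illustrate the mojibake: with PYTHONIOENCODING=utf-8, Python sends the arrow as three UTF-8 bytes, and a terminal that decodes them as latin-1 shows three unrelated characters. A small Python sketch of that mismatch, next to what the `replace` and `backslashreplace` error handlers would produce if the output stayed latin-1:

```python
ARROW = "\u2192"  # →

# What Python writes when PYTHONIOENCODING=utf-8: three bytes.
sent = ARROW.encode("utf-8")

# What a latin-1 terminal makes of those bytes: 'â' followed by two
# usually non-printing C1 control characters (U+0086 and U+0092).
shown = sent.decode("latin-1")

# Alternatives that stay in latin-1 but degrade gracefully.
replaced = ARROW.encode("latin-1", errors="replace")          # b'?'
escaped = ARROW.encode("latin-1", errors="backslashreplace")  # b'\\u2192'

print(sent, ascii(shown), replaced, escaped)
```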

tsibley commented 1 year ago

e.g. [screenshot, not preserved]