Closed (dalito closed this issue 1 year ago)
The problem is that Windows does not use UTF-8 as its default encoding, see PEP 540 or Inada Naoki's summary. This is different from macOS and Linux.
To fix the bug, the encoding must be set explicitly to UTF-8 when writing files in text mode. For example,
with open(self.exist_warning(self.dir_path(cls)), 'w') as clsfile:
should be changed to
with open(self.exist_warning(self.dir_path(cls)), 'w', encoding='UTF-8') as clsfile:
This will affect several places in the code. I can prepare a PR tomorrow.
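To illustrate why the explicit argument matters, here is a minimal sketch. It assumes cp1252 as the legacy code page (typical for Western European Windows installations, but an assumption here) and uses a right arrow as the sample non-ASCII character; the file name `out.md` is made up for the example.

```python
import locale
import os
import tempfile

ARROW = "\u2192"  # right arrow, used here as a sample non-ASCII character

# Encoding that open() typically falls back to when encoding= is omitted.
# The value depends on the platform and locale settings.
default_encoding = locale.getpreferredencoding(False)


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.md")  # hypothetical output file
    # With an explicit encoding the bytes on disk are the same on every platform.
    with open(path, "w", encoding="UTF-8") as f:
        f.write(ARROW)
    with open(path, "rb") as f:
        raw = f.read()

print(default_encoding, can_encode(ARROW, "cp1252"), can_encode(ARROW, "utf-8"))
```

The arrow has no representation in cp1252, so a file opened with that encoding cannot store it, while the UTF-8 file always can.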
Reopened, as I think some new tests were incorporated that don't do this.
Does anyone have ideas for how to check for this with GitHub Actions? Otherwise we always run the risk of reintroducing it.
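One possible approach, sketched here as an assumption and not something the project is known to use, is a small static check that a CI job could run over the source tree. The function name `find_open_without_encoding` is hypothetical. It only looks at calls to the builtin `open()` by name and skips calls whose mode is a string literal containing `b`.

```python
import ast


def find_open_without_encoding(source, filename="<string>"):
    """Return sorted line numbers of open() calls that are not binary
    and do not pass an encoding= keyword."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        is_open_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        )
        if not is_open_call:
            continue
        if any(kw.arg == "encoding" for kw in node.keywords):
            continue
        # The mode is either the second positional argument or mode=...
        mode = node.args[1] if len(node.args) >= 2 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode = kw.value
        if (
            isinstance(mode, ast.Constant)
            and isinstance(mode.value, str)
            and "b" in mode.value
        ):
            continue  # binary mode takes no encoding
        hits.append(node.lineno)
    return sorted(hits)


sample = '''
with open("a.txt", "w") as f: pass
with open("b.txt", "w", encoding="UTF-8") as f: pass
with open("c.bin", "rb") as f: pass
'''
print(find_open_without_encoding(sample))
```

Existing tooling may cover the same ground: Python 3.10+ can emit an `EncodingWarning` when the default encoding is used (`-X warn_default_encoding` or `PYTHONWARNDEFAULTENCODING=1`), and pylint has an `unspecified-encoding` check. Whether either fits this repository's workflow is untested here.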
What if we pull this file opening into a method? Tests would still have to use the method, but then we wouldn't be repeating it in several places.
I think that makes sense. We mostly go via methods anyway, e.g. when loading or dumping linkml objects. But there are various times in tests where we want to do ad-hoc loading or dumping, often via plain yaml/json libs.
But maybe it's not an issue for the tests, so long as all our test files are properly encoded and restricted to ASCII unless otherwise required?
@sierra-moxon Where would you put the method, in generator.py, class Generator?
write_to_file
that could be used to avoid repeating the code several times. Let me know if you want me to go through the changes made in #607 and use this new function instead.
This issue may be closed or at least re-labeled: it is not a bug anymore. Only the code refactoring to use write_to_file is left to do. But I feel changing existing correct code is not worth it, since the function also adds one level of indirection and therefore complexity.
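For reference, a helper of this kind could look roughly like the sketch below. Only the name write_to_file comes from the discussion; the signature, parameter names, and defaults are assumptions and may differ from what was actually added.

```python
import os
import tempfile


def write_to_file(file_path, data, mode="w", encoding="UTF-8"):
    """Write text to `file_path`, always with an explicit encoding.

    Hypothetical signature: the real helper may take different arguments.
    """
    with open(file_path, mode, encoding=encoding) as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "Person.md")  # made-up file name
    write_to_file(target, "Person \u2192 Address\n")
    with open(target, encoding="UTF-8") as f:
        written = f.read()
```

The trade-off mentioned above is visible here: the helper is a thin wrapper around `open()`, so it saves little code, but it gives a single place where the encoding is decided.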
Describe the bug
When following part 8 of the tutorial on a German Windows-10 PC, the generation of the project failed with a UnicodeEncodeError (see traceback).
To Reproduce
Steps to reproduce the behavior:
gen-project -d personinfo/ personinfo.yaml
Traceback
Additional context
The Unicode character that causes the problem is an arrow in markdowngen.py.
Since a context manager is used to redirect stdout/print statements to a file, the encoding of that stdout target plays a role. It is not UTF-8 on Windows, which causes the error.
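The failure mode can be reproduced outside the generator with a short sketch. The function `render`, the file name, and the printed text are made up for illustration, and cp1252 stands in for the Windows default encoding (an assumption about the affected machine).

```python
import os
import tempfile
from contextlib import redirect_stdout


def render(path, encoding):
    """Print a line containing an arrow while stdout is redirected to a file."""
    with open(path, "w", encoding=encoding) as out, redirect_stdout(out):
        print("parent \u2192 child")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.md")  # hypothetical output file
    try:
        render(path, "cp1252")  # mimics the legacy default encoding
        failed = False
    except UnicodeEncodeError:
        failed = True
    render(path, "utf-8")  # explicit UTF-8 accepts the arrow
    with open(path, encoding="utf-8") as f:
        content = f.read()
```

`print` writes to whatever file object `redirect_stdout` installed, so the encoding of that file, not of the console, decides whether the arrow can be written.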
The use of the context manager "redirect_stdout" also prevents the use of the pdb debugger. If I insert
import pdb; pdb.set_trace()
before the error to enter interactive debugging, the code enters debugging mode, but I cannot interact with the debugger because stdout is redirected.
If I replace the problematic character, model generation runs fine.
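A possible workaround for the debugging problem, offered as a sketch and not as something tried in this project, is to construct the debugger with the original stdout explicitly. `pdb.Pdb` accepts a `stdout` argument and `sys.__stdout__` holds the stream the interpreter started with. The helper name `make_debugger` is hypothetical.

```python
import io
import pdb
import sys
from contextlib import redirect_stdout


def make_debugger():
    """Return a Pdb instance that writes to the original stdout,
    even while sys.stdout is redirected."""
    return pdb.Pdb(stdout=sys.__stdout__)


buffer = io.StringIO()
with redirect_stdout(buffer):
    print("captured")  # goes to the buffer, as in the generator
    debugger = make_debugger()
    # Uncomment to stop here interactively; prompts appear on the real console:
    # debugger.set_trace()
```

The `set_trace()` call is left commented out so the snippet runs without waiting for input.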