sgbaird closed this issue 3 years ago
Interesting. Code2flow handles Python files encoded in standard Unicode (UTF-8) fine, so my guess (not certain) is that your file uses a non-standard encoding.
Do you have any more details or, better yet, could you share the file?
CrabNet model.py
My guess is that it's the emoji 😅, e.g. ♻🗑️
I downloaded your file and it actually processes fine for me. Looking closer at the traceback, it seems that on your machine the file is being read as Windows-1252.
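Assuming the traceback was a UnicodeDecodeError from the cp1252 codec, the failure can be reproduced without code2flow: encode text as UTF-8 (how the file is stored) and decode the bytes as cp1252 (how Python on that machine reads files by default). The emoji below are taken from the examples in this thread; which ones actually appear in model.py is an assumption. `roundtrip` is a hypothetical helper for illustration only.

```python
def roundtrip(text: str, read_as: str = "cp1252") -> str:
    """Store text as UTF-8 bytes, then decode those bytes with another codec."""
    return text.encode("utf-8").decode(read_as)

# U+1F605 is F0 9F 98 85 in UTF-8. Every one of those bytes is defined in
# cp1252, so this does not raise; it silently becomes mojibake ('ðŸ˜…').
mojibake = roundtrip("\U0001F605")

# U+1F5D1 U+FE0F (wastebasket + emoji variation selector) is
# F0 9F 97 91 EF B8 8F in UTF-8. Byte 0x8F is undefined in cp1252.
try:
    roundtrip("\U0001F5D1\uFE0F")
except UnicodeDecodeError as exc:
    # prints: UnicodeDecodeError: character maps to <undefined>
    print("UnicodeDecodeError:", exc.reason)
```

So whether a UTF-8 file crashes under a cp1252 default depends on which bytes it happens to contain, not just on whether it contains emoji.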
I'm not familiar with Windows, so I don't immediately know how to address this. A simple temporary workaround might be to manually convert the file(s) to Unicode, but I don't know how you would do that on Windows.
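For the "convert the file(s)" workaround, a short Python script behaves the same on Windows as anywhere else. A minimal sketch, assuming you already know the file's current encoding (the `source_encoding` argument); `convert_to_utf8` is a hypothetical name. It only helps if the file is in some other encoding to begin with.

```python
from pathlib import Path

def convert_to_utf8(path: str, source_encoding: str) -> None:
    """Re-encode a file in place from source_encoding to UTF-8.

    Working on raw bytes keeps the original line endings untouched.
    """
    p = Path(path)
    # Decode with the stated codec; raises UnicodeDecodeError if the guess is wrong.
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))
```

Usage (hypothetical path): `convert_to_utf8(r"path\to\file.py", "cp1252")`.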
I'm swamped with all sorts of things right now, so I probably won't be able to investigate further for a week or two.
Thank you! That's perfect. Didn't consider that it might be a Windows-specific issue.
Could you do me two favors? They would be very helpful for figuring out what's wrong here:
1. Run
python3 -c "import locale; print(locale.getpreferredencoding())"
This command shows which encoding your Python uses to read files by default.
2. Run pip3 install charset-normalizer
and then run normalizer path\to\crabnet\model.py
This command should output a small JSON blurb telling me that specific file's encoding.
python3 -c "import locale; print(locale.getpreferredencoding())"
cp1252
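Why that first result matters: when open() is called without an `encoding` argument, Python decodes text using the locale's encoding (roughly what the command above reports, unless Python's UTF-8 mode is enabled), here cp1252. A minimal sketch of reading a Python source file in a locale-independent way; `read_source` is a hypothetical helper, not code2flow's actual code or its eventual fix.

```python
import locale
import tokenize

def read_source(path: str) -> str:
    """Read a Python source file without relying on the locale's default encoding."""
    # tokenize.open() honours a BOM or a PEP 263 coding cookie and otherwise
    # falls back to UTF-8, the default encoding for Python source files.
    with tokenize.open(path) as f:
        return f.read()

# What a bare open(path) would use on this machine (e.g. cp1252 on many Windows setups).
print("Default for open():", locale.getpreferredencoding(False))
```

tokenize.open() is used here rather than open(path, encoding="utf-8") so that files declaring a different coding cookie still decode correctly.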
normalizer .\CrabNet\crabnet\model.py
{ "path": "C:\\Users\\sterg\\Documents\\GitHub\\sparks-baird\\ElM2D\\CrabNet\\crabnet\\model.py", "encoding": "utf_8", "encoding_aliases": [ "u8", "utf", "utf8", "utf8_ucs2", "utf8_ucs4", "cp65001" ], "alternative_encodings": [], "language": "English", "alphabets": [ "Basic Latin", "Control character", "Emoticons range(Emoji)", "Miscellaneous Symbols", "Miscellaneous Symbols and Pictographs", "Transport and Map Symbols", "Variation Selectors" ], "has_sig_or_bom": false, "chaos": 0.0, "coherence": 100.0, "unicode_path": null, "is_preferred": true }
Note that code2flow crabnet/kingcrab.py produces out.png successfully. Here is that file's encoding:
normalizer .\CrabNet\crabnet\kingcrab.py
{ "path": "C:\\Users\\sterg\\Documents\\GitHub\\sparks-baird\\ElM2D\\CrabNet\\crabnet\\kingcrab.py", "encoding": "ascii", "encoding_aliases": [ "646", "ansi_x3.4_1968", "ansi_x3_4_1968", "ansi_x3.4_1986", "cp367", "csascii", "ibm367", "iso646_us", "iso_646.irv_1991", "iso_ir_6", "us", "us_ascii" ], "alternative_encodings": [], "language": "English", "alphabets": [ "Basic Latin", "Control character" ], "has_sig_or_bom": false, "chaos": 0.0, "coherence": 100.0, "unicode_path": null, "is_preferred": true }
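The comparison boils down to model.py being detected as utf_8 while kingcrab.py is plain ascii. Only the former can be misread under a cp1252 default, because ASCII bytes decode identically in cp1252. For a quick stdlib-only check without charset-normalizer, trying strict decodes in order approximates that distinction. A rough sketch; `classify_bytes` and `classify_file` are hypothetical and far cruder than charset-normalizer's detection.

```python
from pathlib import Path
from typing import Optional

def classify_bytes(data: bytes) -> Optional[str]:
    """Return the first codec that decodes data strictly, or None."""
    for codec in ("ascii", "utf-8"):
        try:
            data.decode(codec)
            return codec
        except UnicodeDecodeError:
            continue
    return None  # neither ASCII nor valid UTF-8: likely some legacy encoding

def classify_file(path: str) -> Optional[str]:
    return classify_bytes(Path(path).read_bytes())
```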
And finally, one more file from the same repository (train_crabnet.py):
normalizer .\CrabNet\train_crabnet.py
{ "path": "C:\\Users\\sterg\\Documents\\GitHub\\sparks-baird\\ElM2D\\CrabNet\\train_crabnet.py", "encoding": "utf_8", "encoding_aliases": [ "u8", "utf", "utf8", "utf8_ucs2", "utf8_ucs4", "cp65001" ], "alternative_encodings": [], "language": "Unknown", "alphabets": [ "Basic Latin", "Control character" ], "has_sig_or_bom": false, "chaos": 10.0, "coherence": 0.0, "unicode_path": null, "is_preferred": true }
(https://www.diffchecker.com/)
Looks like it was the emoji. After getting rid of it:
code2flow crabnet/model.py
normalizer .\CrabNet\crabnet\model.py
{ "path": "C:\\Users\\sterg\\Documents\\GitHub\\sparks-baird\\ElM2D\\CrabNet\\crabnet\\model.py", "encoding": "ascii", "encoding_aliases": [ "646", "ansi_x3.4_1968", "ansi_x3_4_1968", "ansi_x3.4_1986", "cp367", "csascii", "ibm367", "iso646_us", "iso_646.irv_1991", "iso_ir_6", "us", "us_ascii" ], "alternative_encodings": [], "language": "English", "alphabets": [ "Basic Latin", "Control character" ], "has_sig_or_bom": false, "chaos": 0.0, "coherence": 100.0, "unicode_path": null, "is_preferred": true }
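Instead of diffing, a short script can list every non-ASCII character in a file with its line, column, and Unicode name, which makes stray emoji easy to spot. A minimal sketch, assuming the file is UTF-8 (as normalizer reported for the original model.py); `find_non_ascii` and `report` are hypothetical helpers.

```python
import unicodedata
from typing import Iterator, Tuple

def find_non_ascii(text: str) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (line, column, character, Unicode name) for each non-ASCII character."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield lineno, col, ch, unicodedata.name(ch, "<unnamed>")

def report(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        for lineno, col, ch, name in find_non_ascii(f.read()):
            # Print the code point rather than ch itself so a cp1252 console can't choke on it.
            print(f"{lineno}:{col} U+{ord(ch):04X} {name}")
```

Usage (hypothetical path): `report(r"CrabNet\crabnet\model.py")`. Invisible characters such as variation selectors show up as their own entries, which a visual diff can easily miss.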
@sgbaird I think I have a fix. Could you pull it down and verify? https://github.com/scottrogowski/code2flow/pull/31
Addressed in the 2.3.0 release