intgr / topy

Python script to fix typos in text, based on the RegExTypoFix project from Wikipedia and AutoWikiBrowser

UnicodeDecodeError when running in a directory with a non-ascii filename #14

Closed: alex closed this issue 9 years ago

alex commented 9 years ago
(topy) /t/x $ tree
.
└── �\230\203.md

0 directories, 1 file
(topy) /t/x $ topy -a .
/Users/alex_gaynor/.virtualenvs/topy/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

  markup_type=markup_type))
Loaded 3619 rules (except 0 errors, 45 disabled)
Traceback (most recent call last):
  File "/Users/alex_gaynor/.virtualenvs/topy/bin/topy", line 11, in <module>
    sys.exit(main())
  File "/Users/alex_gaynor/.virtualenvs/topy/lib/python2.7/site-packages/topy/topy.py", line 202, in main
    for filename in flatten_files(paths):
  File "/Users/alex_gaynor/.virtualenvs/topy/lib/python2.7/site-packages/topy/topy.py", line 158, in flatten_files
    for filename in walk_dir_tree(path):
  File "/Users/alex_gaynor/.virtualenvs/topy/lib/python2.7/site-packages/topy/topy.py", line 148, in walk_dir_tree
    if not f.startswith("."):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

intgr commented 9 years ago

Fixed this (and a few other Unicode issues) in the new 0.2.0 release.

PS: Keep up the good work on PyPy and Django :)
PPS: Unicode is a serious PITA in Python 2.
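
For readers landing here later, a rough sketch of the failure mode the traceback points at. This is not the actual 0.2.0 fix, which is not shown in this thread. It assumes `walk_dir_tree` received byte-string file names and that the `"."` literal was a unicode string (for example via `from __future__ import unicode_literals`), in which case Python 2 implicitly decodes the file name as ASCII before comparing. The sketch is written for Python 3 so it runs today, and it makes that implicit decode explicit. The names `raw_name`, `implicit_ascii_decode` and `is_hidden` are hypothetical and not part of topy.

```python
import os

# Raw UTF-8 bytes of a hypothetical file name. The first byte is 0xe2,
# matching the byte reported in the traceback above.
raw_name = b"\xe2\x98\x83.md"


def implicit_ascii_decode(name):
    """Mimic the implicit coercion Python 2 performs when a byte string
    is combined with a unicode string: decode using the ASCII codec."""
    return name.decode("ascii")


def is_hidden(name):
    """Hypothetical helper: decode byte names explicitly with the
    filesystem encoding before comparing against a text literal."""
    if isinstance(name, bytes):
        name = os.fsdecode(name)
    return name.startswith(".")


# The explicit ASCII decode fails the same way the implicit one did.
try:
    implicit_ascii_decode(raw_name)
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding explicitly first avoids the error.
print(is_hidden(raw_name))
print(is_hidden(b".git"))
```

On Python 2.7 a comparable approach would be to pass a unicode path to `os.walk` so the names it yields are already unicode, or to decode each name with `sys.getfilesystemencoding()` before comparing; whether 0.2.0 does either is an assumption.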