intgr / topy

Python script to fix typos in text, based on the RegExTypoFix project from Wikipedia and AutoWikiBrowser
Other
35 stars 11 forks source link

[merged] Allow directories in arguments #9

Closed theanalyst closed 10 years ago

theanalyst commented 10 years ago

This PR adds the capability of arguments to be directories as well. So topy will take a list of filenames and or directory names as arguments now. When directories are encountered, we walk through them, except for hidden dirs in *nix like systems.

Also when apply_to_file is called we check that the file is not a binary file.

Fixes #3.

intgr commented 10 years ago

Sorry I've procrastinated on reviewing this PR for so long. There are some issues:

  1. While it's useful to split your work into multiple commits, the way you've done it here isn't helping: adding a function in a commit that has no callers isn't a complete unit of work. For example, it would make more sense to split it into 2 commits, one adding the recursion functionality and the other adding the binary check (but see below).
  2. I don't like adding dependencies like binaryornot if they can be avoided. I actually had a plan to address this by checking for 0 bytes and handling text as Unicode. If it's binary, it probably fails UTF-8 decode.
  3. Coding style: I'm mostly following http://legacy.python.org/dev/peps/pep-0008/ here (except line length limits); there should be 2 blank lines between top-level functions and a space after each comma. Your docstring style is inconsistent wrt whitespace and they need updating.
  4. Previously the behavior for incorrect path arguments was to print an error and continue. Now it quits with unhandled exception.

For now, keep it simple and don't bother with binary detection.

intgr commented 10 years ago

Skipping of non-UTF-8 files is implemented in commit c4b34e72b55df277c8ecf0ff8181b8f7928f82aa

theanalyst commented 10 years ago

On 15 Aug 2014 03:27, "Marti" notifications@github.com wrote:

Sorry I've procrastinated on reviewing this PR for so long. There are some issues:

While it's useful to split your work into multiple commits, the way you've done it here isn't helping: adding a function in a commit that has no callers isn't a complete unit of work. For example, it would make more sense to split it into 2 commits, one adding the recursion functionality and the other adding the binary check (but see below)

I'll squash the commits before it is merge-ready.. It is easier to squash commits after review than split them :)

I don't like adding dependencies like binaryornot if they can be avoided. I actually had a plan to address this by checking for 0 bytes and handling text as Unicode. If it's binary, it probably fails UTF-8 decode.

I didn't know of the utf8 decode.. I had run in to issues with images getting modified while running into directories before.. I'll check this

Coding style: I'm mostly following http://legacy.python.org/dev/peps/pep-0008/ here (except line length limits); there should be 2 blank lines between top-level functions and a space after each comma. Your docstring style is inconsistent wrt whitespace and they need updating. Previously the behavior for incorrect path arguments was to print an error and continue. Now it quits with unhandled exception.

I'll try handling the exception and use the already used logging module for now

For now, keep it simple and don't bother with binary detection.

— Reply to this email directly or view it on GitHub.

Thanks for the review.. I'll try to address these soon

EDIT: Format fix.. replying from mail had messed up quoting

theanalyst commented 10 years ago

Hi I've rebased against master and removed the dependancies..let me know what you think

intgr commented 10 years ago

Merged. I took the liberty to amend your commit to reorder functions, to move the OptionParser back near to main().

I also committed f8d7c374cc88c95b26ead8a2ad3ea236be30c333 to tweak variable names, docstrings and add some comments. One docstring still talked about excluding "binary files". (I hinted this in in my review point 3, but I wasn't very clear)

If you have comments about my edits, I'd gladly hear them. I'm still new to this role of accepting pull requests, feedback is appreciated.

theanalyst commented 10 years ago

No problems.. I think with the correct manifest and setup.py we have enough for a initial release ..we can maybe upload this to pypi soon enough