Closed asears closed 2 years ago
The partial error message you posted shows that the last blackbricks-related file involved is blackbricks/files.py, line 31. Seemingly it fails to read the file. Other than that, I can't say much without more details. Please include a minimal reproducing example (i.e. a file which when running blackbricks on it, causes this error).
Oddly enough, it was caused by a pycache folder being in the folder. Running blackbricks on individual files works. Deleting this folder solved the issue, though it has to be deleted in each subfolder it's located in. Probably some error handling around the readfile or increased verbosity to print out the file in question could help, or ignoring pycache folder. Not sure why it is acting on this folder. I'm running on Windows in Powershell.
Aha, that explains it. Like black
, blackbricks
will add files in folders recursively, meaning blackbricks .
should try to format everything in the current folder. I guess what black
does that blackbricks
doesn't, is filtering out files that can't be parsed, such as binary files. It will already skip any files that don't have the Databricks header in it, but I did not consider binary files. It currently tries to read the files as text in order to check for the header line, but of course, that fails for binary files. The __pychace__
folders contain .pyc
files, which are binary, hence the bug you see.
I'll see if I can't add a fix in not too long, should be a matter of filtering out Python files instead of adding any file. If you want I'm happy to review a PR :)
Investigating the PR, I see blackbricks uses os rather than pathlib. I've added a PR to cover the files.py content before doing any changes. Any preference on using os rather than pathlib? I think upgrading to pathlib might help with reconciling to black's code and improve compatibility.
Based on https://github.com/github/gitignore/blob/main/Python.gitignore there are a bunch of files to ignore.
So far I've only encountered pycache folder being an issue, so perhaps excluding these makes sense:
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
An option to handle and skip the corrupt folders/file(s) might help with further inclusions, exclusions, and these could be exposed as some parameter.
Black does this "out of the box" so perhaps there's some means of removing the blackbricks code and implementing the required black api, or re-implementing a portion of what black does.
https://black.readthedocs.io/en/stable/usage_and_configuration/file_collection_and_discovery.html
Here's some relevant files searching for .gitignore https://github.com/psf/black/blob/521d1b8129c2d83b4ab49270fe7473802259c2a2/src/black/files.py https://github.com/psf/black/blob/cae7ae3a4d32dc51e0752d4a4e885a7792a0286d/src/black/__init__.py https://github.com/psf/black/blob/fb9fe6b565ce8a9beeebb51c23f384d1865d0ee8/tests/test_black.py
Just looking at the code, should be a few lines to change in resolve_filepaths to extend parameter(s) to ignore files and folders. A few possible implementations here...
This will be fixed in #25. Nothing complex, just simply skipping files that can't be read as Unicode text. There are many more complicated ways to do this, but in order to keep this project small and manageable, I think this enables 99% of use cases.
When running blackbricks, version 0.6.7 on a file that cannot be parsed, instead of handling the error I am getting this error.
Is there some verbose command that can be used to troubleshoot which file is causing the issue in the repo? Can this error be handled safely as part of returning the reader?