Open NataliaBondarenko opened 4 years ago
I would rather issue a 1.6 with the current feature set and then make 1.7 the one with the file previews, if that's ok with you. We will still need to see what's missing from the documentation with regard to the latest changes, and make sure the translations are in sync.
Hello!
CLI help is updated with previous PR.
English docs were last updated for search by pattern (--filename-match argument). This is a priority task.
Other docs are more outdated. But few people will notice it. In terms of traffic, people do not often view these pages.
TODO:
add tags to generate documentation on Read the Docs. Tags are needed in this repository on commits https://github.com/victordomingos/Count-files/pull/86/commits/f9559b6e0969ca43320f0773e2bf306c58a5e85e (1.4.0) and https://github.com/victordomingos/Count-files/pull/107/commits/e94d15229a4761ba1795446e2cbe282c3c73914c (1.5.0). Could you add tags for the corresponding versions?
conduct tests for this version and update the list of tested operating systems
Also:
Previewing text files is an old issue. Features that were not even planned were added to this version. Why postpone the preview solution?
Ok. Let’s improve the preview for text files for 1.6 and leave binary formats for later. Special care must be taken in deciding which files are binary or text, and in properly handling any exceptions.
Regarding branches, until now all versions were intended to be compatible backwards, so it made some sense to fix any bugs in the next update within the same branch. Our public releases are published on PyPI, not on GitHub’s development repo. When we decide to switch to v2.x, then yes, we must keep a separate v1.x branch for bug fixes.
I believe I have added tags for all previous releases, could you please check again? I missed one release, so I added a new tag recently. Maybe that’s the one you were referring to?
With regards to tests, I can test on macOS Catalina, iOS/Pythonista, Haiku R1/beta2 and maybe a few virtual machines. The last time I tested on macOS, I got one failing test. I believe it has something to do with the creation of a comparison file, and you have already explained that to me but I confess I can’t remember. I will submit an issue to see if you are able to help, ok?
Finally, keeping documentation in sync across different languages can easily become a mess. I would like to find some sort of technical solution to help keep them synchronized, but not sure what the best solution is. I know there are some specialized web apps, like Pootle which I have used for Haiku, but that would require setting up a server and probably some costs. I have heard of GlobalSight and OmegaT, but I haven’t tried any of those yet.
Hello! I have updated the preview for text files.
New branch: https://github.com/NataliaBondarenko/Count-files/tree/textpreview/count_files
This version allows us to extend the preview capabilities without external dependencies.
I am proposing this version for discussion. It has its pros and cons. What do you think?
def generic_text_preview
Added encoding in open(filepath, mode='r', encoding='utf-8').
UTF-8 is one of the most commonly used encodings (w3techs.com stats). UTF-8 has several convenient properties: docs.python.org Unicode HOWTO
Also, this encoding renders text with mixed characters (like Cyrillic and Latin) quite correctly. I tried this with the README files in the repository as well as a Japanese text file.
The previous version of this function used open(filepath, mode='r').
Docs: If encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
This option is left as a fallback for opening files.
First, we try to open a file with encoding='utf-8'. If this fails (UnicodeDecodeError), then we try to open the file with the user's preferred encoding.
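The fallback order described above could look roughly like the sketch below. This is not the code from the branch: the name read_text_preview, the preview_size default and the error message text are made up for illustration.

```python
import locale


def read_text_preview(filepath, preview_size=390):
    """Return the first characters of a text file, or an error message.

    Hypothetical helper: tries UTF-8 first, then the locale's preferred
    encoding, which is what open() uses when no encoding is given.
    """
    fallback = locale.getpreferredencoding(False)
    for encoding in ("utf-8", fallback):
        try:
            with open(filepath, mode="r", encoding=encoding) as f:
                return f.read(preview_size)
        except UnicodeDecodeError:
            # Wrong guess for this file, try the next encoding.
            continue
        except OSError as e:
            # Missing file, permission problem, and so on.
            return f"TEXT_PREVIEW_ERROR: {e}"
    return "TEXT_PREVIEW_ERROR: the file could not be decoded as text."
```

If the locale encoding is itself UTF-8, the second attempt simply repeats the first one and fails the same way, so the function still ends with the error message.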
shell-command argument to Search group
The idea is to use the Unix "file" command, a file type detector (wiki).
Using this program allows the CLI to detect text files with or without an extension and display a preview of those files.
Determining the file type is done with this command through the subprocess module.
In general, it gets the output of $ file /path/to/file.ext.
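A minimal sketch of calling the "file" command through subprocess, assuming the installed "file" supports the -b (brief) option so that the path is not repeated in the output. The function names describe_with_file and looks_like_text are hypothetical and are not the ones used in the branch.

```python
import subprocess


def describe_with_file(filepath, command="file"):
    """Return the description printed by `file -b filepath`, or None.

    None means the command is missing, timed out, or exited with an error.
    """
    try:
        result = subprocess.run(
            [command, "-b", filepath],
            capture_output=True,  # requires Python 3.7+
            text=True,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def looks_like_text(description):
    """Rough check: `file` usually includes the word "text" for text files."""
    return description is not None and "text" in description.lower()
```

Passing the arguments as a list avoids going through a shell, so file names with spaces or special characters need no quoting.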
Depending on whether this program is available, we can create a preview with different functions.
def generate_preview_with_file
https://github.com/NataliaBondarenko/Count-files/blob/textpreview/count_files/utils/file_preview.py#L91
or
def generate_preview
https://github.com/NataliaBondarenko/Count-files/blob/textpreview/count_files/utils/file_preview.py#L61
In this case, preview is only available for files with certain extensions.
This function can be used for all operating systems.
I have added two functions to check if the Unix "file" command is available and works as expected. https://github.com/NataliaBondarenko/Count-files/blob/textpreview/count_files/utils/file_handlers.py#L78
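For discussion, one possible way to check that the command "works as expected" is to run it on a small sample file whose type is known in advance. This is only a sketch under that assumption; file_command_works is a made-up name and the actual checks in file_handlers.py may be different.

```python
import os
import subprocess
import tempfile


def file_command_works(command="file"):
    """Return True if `command` runs and reports a plain-text sample as text."""
    fd, sample_path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as sample:
            sample.write("Just a plain text sample.\n")
        result = subprocess.run(
            [command, "-b", sample_path],  # -b: brief output, assumed supported
            capture_output=True,
            text=True,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command not found, not executable, or hanging.
        return False
    finally:
        os.remove(sample_path)
    return result.returncode == 0 and "text" in result.stdout.lower()
```

A False result can then be used to fall back to the extension-based preview and to show a clear message to the user.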
This utility works with files pretty quickly. The "file" command is a standard program on Unix and Unix-like OS. It is also ported to Windows and can be used there, for example, if the user installed it along with Git (https://git-scm.com) and added it to the PATH environment variable. I don't see anything like this for Haiku and StaSh. Thus, the use of this argument is limited to desktop operating systems such as Linux, macOS, and Windows.
If this version is appropriate, there will be no more significant changes for v1.6.
TODO:
Ok. Let’s improve the preview for text files for 1.6 and leave binary formats for later.
I think we can preview some binaries using the Python standard library.
For example, if you want a list of files with the same extension.
count-files --file-extension ext_name --preview
For all files in a directory (--file-extension ..), choosing the correct function and processing the files can slow down the program.
Special care must be taken in deciding which files are binary or text, and in properly handling any exceptions.
Determining which files are binary or text files is difficult. To increase the likelihood of correctly detecting the file type, we can use OS utilities. I already mentioned the "file" command in the comment above.
Regarding branches, until now all versions were intended to be compatible backwards, so it made some sense to fix any bugs in the next update within the same branch. Our public releases are published on PyPI, not on GitHub’s development repo. When we decide to switch to v2.x, then yes, we must keep a separate v1.x branch for bug fixes.
Ok. It makes sense to me.
I believe I have added tags for all previous releases, could you please check again? I missed one release, so I added a new tag recently. Maybe that’s the one you were referring to?
There were changes in the version documentation after these tags. Later, I made small clarifications to the text of the documentation, not to the code itself. Existing tags do not cover several pull requests.
With regards to tests, I can test on macOS Catalina, iOS/Pythonista, Haiku R1/beta2 and maybe a few virtual machines.
I have Windows and Linux.
The last time I tested on macOS, I got one failing test. I believe it has something to do with the creation of a comparison file, and you have already explained that to me but I confess I can’t remember. I will submit an issue to see if you are able to help, ok?
Comparison files are generated automatically in the latest tests. It might be an old test file.
Finally, keeping documentation in sync across different languages can easily become a mess. I would like to find some sort of technical solution to help keep them synchronized, but not sure what the best solution is. I know there are some specialized web apps, like Pootle which I have used for Haiku, but that would require setting up a server and probably some costs. I have heard of GlobalSight and OmegaT, but I haven’t tried any of those yet.
I suggest maintaining only English documentation (Read The Docs and README) after v1.6.
Hi! I had a quick look over your new branch and it seems a nice improvement indeed. Thanks.
As usual, documentation must be clear about availability issues and IMO it should also include some guidance on how to get it to work on Windows.
This utility works with files pretty quickly. The "file" command is a standard program on Unix and Unix-like OS. It is also ported to Windows and can be used there, for example, if the user installed it along with Git (https://git-scm.com) and added it to the PATH environment variable. I don't see anything like this for Haiku and StaSh. Thus, the use of this argument is limited to desktop operating systems such as Linux, macOS, and Windows.
Actually, I believe we can also count on file availability on Haiku:
iOS/StaSh has no file binary, so in this case we must make sure that a proper message is given to the user.
With regard to multilingual documentation, I haven't given up on it yet. The English version will be the master, but any changes should be properly identified so that the translators know where to look. I intend to keep maintaining at least the Portuguese translation (it can be kept in that single markdown file).
As usual, documentation must be clear about availability issues and IMO it should also include some guidance on how to get it to work on Windows.
Actually, I believe we can also count on file availability on Haiku:
Currently, command availability checking is limited to specific operating systems (win, linux, darwin): https://github.com/NataliaBondarenko/Count-files/blob/textpreview/count_files/utils/file_handlers.py#L130
This limitation can be removed. We can try using the "file" command on any operating system.
The --shell-command argument can take either a command name or the path to an executable file.
--shell-command file
or
--shell-command /path/to/file
This can be useful on systems where the "file" command is not standard.
That is, you can install the program and use it without adding it to your PATH environment variable.
With regard to multilingual documentation, I haven't given up on it yet. The English version will be the master, but any changes should be properly identified so that the translators know where to look. I intend to keep maintaining at least the Portuguese translation (it can be kept in that single markdown file).
A shorter version of the documentation in one markdown file for each language?
Currently, command availability checking is limited to specific operating systems (win, linux, darwin): https://github.com/NataliaBondarenko/Count-files/blob/textpreview/count_files/utils/file_handlers.py#L130
This limitation can be removed. We can try using the "file" command on any operating system.
The --shell-command argument can take either a command name or the path to an executable file.
--shell-command file
or
--shell-command /path/to/file
This can be useful on systems where the "file" command is not standard. That is, you can install the program and use it without adding it to your PATH environment variable.
This information may be useful, especially the shutil.which(command) part:
https://stackoverflow.com/questions/11210104/check-if-a-program-exists-from-a-python-script
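As a small illustration of that idea, shutil.which from the standard library accepts both a bare command name (looked up on PATH) and a path to an executable, which matches the two forms of the argument discussed above. The wrapper name resolve_command is hypothetical.

```python
import shutil
import sys


def resolve_command(command):
    """Return the path of an executable, or None if it cannot be found.

    `command` may be a bare name such as "file" (searched on PATH)
    or a path such as "/path/to/file" (checked directly).
    """
    return shutil.which(command)


if __name__ == "__main__":
    # Usage: pass the value given on the command line, default to "file".
    requested = sys.argv[1] if len(sys.argv) > 1 else "file"
    found = resolve_command(requested)
    if found is None:
        print(f"'{requested}' is not available on this system.")
    else:
        print(f"Using {found}")
```

Since the check does not depend on the platform name, it would also cover systems such as Haiku, and it gives a natural place to print a proper message where the command is missing.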
With regard to multilingual documentation, I haven't given up on it yet. The English version will be the master, but any changes should be properly identified so that the translators know where to look. I intend to keep maintaining at least the Portuguese translation (it can be kept in that single markdown file).
A shorter version of the documentation in one markdown file for each language?
I am not sure if we can make it much shorter without leaving some features undocumented, but we may consider keeping it in a single file if it helps. At this time, we have that situation in Portuguese (a short Readme and a longer single-file documentation). The simplest workflow (not necessarily the best one though) would be going back to a single file per language, merging the readme and documentation back together. That would leave us with a single documentation file for each language.
Now, the most important bit, IMHO, is to establish a workflow. For instance, whenever the user interface changes, e.g. a feature is added or removed or gets a new behaviour, the developer could also open a new issue indicating the changes that need to be reflected in the documentation. If possible, the English version should be updated together with the code pull request itself, so that at least the English documentation is always up to date. The issue tracker would let us keep track of any sections that need to have their translation updated. What do you think?
I propose to completely solve the issue with previewing text files in this version. And we can leave a preview for the binaries until the next version of the package.
Option 1:
1) Try to open all files in text mode.
2) Show a preview of the text of the file, if possible. Otherwise, show information or an error message.
Option 2, with skipping some known binaries:
For example, files with known signatures.
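To make Option 2 more concrete, here is a rough sketch of skipping files by signature. The list of signatures is a small illustrative sample (PNG, JPEG, GIF, PDF, ZIP), not a complete one, and has_known_binary_signature is a made-up name.

```python
# A few well-known file signatures ("magic numbers"); illustrative only.
KNOWN_BINARY_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF
    b"GIF89a",             # GIF
    b"%PDF-",              # PDF
    b"PK\x03\x04",         # ZIP (also docx, xlsx, jar, ...)
)


def has_known_binary_signature(filepath):
    """Return True if the file starts with one of the known signatures."""
    longest = max(len(signature) for signature in KNOWN_BINARY_SIGNATURES)
    with open(filepath, "rb") as f:
        head = f.read(longest)
    # bytes.startswith accepts a tuple of prefixes.
    return head.startswith(KNOWN_BINARY_SIGNATURES)
```

Files for which this returns True could be skipped with an informational message, and everything else could go through the text preview from Option 1.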