NataliaBondarenko opened 4 years ago
The group by type is interesting. Doing tables without external dependencies can be a[n unnecessary] challenge. I wouldn't mind having an optional pure-python dependency, if it helps.
We already have a table.
Here are two options for displaying data:
from collections import Counter

# create types map, common types
types2 = {
    'jpg': 'images',
    'png': 'images',
    'mp4': 'videos',
    'gz': 'archives',
    'zip': 'archives',
    'py': 'python files',
    'pyc': 'python files',
    'pyw': 'python files',
    'woff': 'fonts',
    'bat': 'executables',
    'txt': 'documents',
    'md': 'documents',
    'xml': 'documents',
    'rst': 'documents',
    'json': 'documents',
    'html': 'documents',
    'css': 'documents',
    'xlsx': 'documents'
}

# input data
c = Counter(
    {'PYC': 38, 'TXT': 27, 'PY': 23, 'MD': 15,
     '[no extension]': 9, 'PNG': 8, 'XML': 5,
     'RST': 4, 'GZ': 3, 'IML': 1, 'BAT': 1,
     'HTML': 1, 'JSON': 1, 'CSS': 1,
     'WOFF': 1, 'DAT': 1, 'XLSX': 1}
)

# result template
result = {
    'archives': [],
    'documents': [],
    'executables': [],
    'fonts': [],
    'images': [],
    'python files': [],
    'videos': [],
    'other': []
}

# update result template with (extension, freq)
for k, v in c.items():
    item_type = types2.get(k.lower(), 'other')
    result[item_type].append((k, v))

#import pprint
#pprint.pprint(result, indent=2)
#print()

print('------------ COMPACT ------------')
for k, v in result.items():
    if v:  # skip empty groups
        total_occurrences = sum(x[1] for x in v)
        print()
        print(f'{k.upper()}({total_occurrences})')
        for i in v:
            print(f'{i[0]}: {i[1]}')
print(f'\nFound {sum(c.values())} file(s).\n')
print('------------- TABLES ------------\n')

from typing import List, Optional
from textwrap import wrap
from count_files.settings import TERM_WIDTH, DEFAULT_EXTENSION_COL_WIDTH, DEFAULT_FREQ_COL_WIDTH, MAX_TABLE_WIDTH


def show_2columns(data: List[tuple],
                  max_word_width: int, total_occurrences: int,
                  term_width: int = TERM_WIDTH, title: Optional[str] = None):
    """Displays a sorted table with file extensions.

    :param data: list with tuples
        default in uppercase: [('TXT', 24), ('PY', 17), ('PYC', 13), ...]
        with --case-sensitive as is: [('txt', 23), ('py', 17), ('pyc', 13), ...]
    :param max_word_width: the longest extension name
    :param total_occurrences: total number of files found
    :param term_width: the size of the terminal window
    :param title: group name; if given, a group table without the TOTAL row is printed
    :return: the processed data as text to the screen.
    """
    if not data:
        print("Oops! We have no data to show...\n")
        return
    max_word_width = max(DEFAULT_EXTENSION_COL_WIDTH, max_word_width)
    freq_col_width = max(DEFAULT_FREQ_COL_WIDTH, len(str(total_occurrences)))
    ext_col_width = min((term_width - freq_col_width - 5),
                        max_word_width,
                        MAX_TABLE_WIDTH)
    header = f" {'EXTENSION'.ljust(ext_col_width)} | {'FREQ.'.ljust(freq_col_width)} "
    sep_left = (ext_col_width + 2) * '-'
    sep_center = "+"
    sep_right = (freq_col_width + 2) * '-'
    sep = sep_left + sep_center + sep_right
    if title:  # start of type table
        print(f'{title.upper()}({total_occurrences})'.ljust(ext_col_width))
    print(sep)
    print(header)
    print(sep)
    for word, freq in data:
        if len(word) <= ext_col_width:
            print(f" {word.ljust(ext_col_width)} | {str(freq).rjust(freq_col_width)} ")
        else:
            # long extension: first chunk next to the frequency,
            # the rest wrapped onto indented continuation lines
            head = f" {word[0: ext_col_width]} | {str(freq).rjust(freq_col_width)}"
            word_tail = wrap(word[ext_col_width:],
                             width=ext_col_width,
                             initial_indent=' ' * 2,
                             subsequent_indent=' ' * 2)
            print(head)
            for line in word_tail:
                print(f" {line.ljust(ext_col_width)} | {' '.rjust(freq_col_width)}")
    if title:  # end of type table
        print(sep + "\n")
        return
    print(sep)
    line = f" {'TOTAL:'.ljust(ext_col_width)} | {str(total_occurrences).rjust(freq_col_width)} "
    print(line)
    print(sep + "\n")
    return


max_word_width = max(map(len, c.keys()))  # same for all tables (ext column width)
for k, v in result.items():
    if v:
        total_occurrences = sum(x[1] for x in v)  # for each table (freq column width)
        show_2columns(data=v, max_word_width=max_word_width, total_occurrences=total_occurrences, term_width=TERM_WIDTH, title=k)
print(f'Found {sum(c.values())} file(s).')
The first one looks a bit cleaner, IMHO. I would add a 2- or 4-space indentation level for the extensions. It should also use text justification for better alignment.
Yes, the first list is easier to read. I can try text justification, textwrap, indentation, and see what looks better.
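A minimal sketch of the compact layout with a 4-space indent and justified columns, assuming the same `result`-style input (group name mapped to a list of (extension, frequency) tuples). The function name `show_compact` and its parameters are hypothetical, not part of count-files; it returns a string so the layout is easy to check:

```python
from typing import Dict, List, Tuple


def show_compact(groups: Dict[str, List[Tuple[str, int]]], indent: int = 4) -> str:
    """Render non-empty groups as an indented list with aligned columns."""
    # Widths are computed across all groups so every group lines up the same way.
    ext_width = max((len(ext) for items in groups.values() for ext, _ in items), default=0)
    freq_width = max((len(str(freq)) for items in groups.values() for _, freq in items), default=0)
    lines = []
    for name, items in groups.items():
        if not items:
            continue  # skip empty groups, as in the original COMPACT loop
        total = sum(freq for _, freq in items)
        lines.append(f'{name.upper()}({total})')
        for ext, freq in items:
            # extensions left-justified, frequencies right-justified
            lines.append(' ' * indent + ext.ljust(ext_width) + '  ' + str(freq).rjust(freq_width))
        lines.append('')  # blank line between groups
    return '\n'.join(lines)


sample = {
    'documents': [('TXT', 27), ('MD', 15), ('XML', 5)],
    'images': [('PNG', 8)],
    'videos': [],
}
print(show_compact(sample))
```

Returning text instead of printing directly also makes it simple to switch between 2 and 4 spaces via `indent` and compare the two.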
Hello! I made a prototype of sorting extensions by type.
This option allows users to create, rename, update, and delete custom extension lists for sorting by type. It can be extensions related to each other (python files) or any extensions.
Lists are stored in a file, and the utility provides only an interface for sorting.
The --sort-type argument accepts an arbitrary list of types.
The reserved value "default" creates or restores a configuration file with default values, a description and examples.
count-files --sort-type default
I made some predefined lists: archives, audio, videos, images, python files, ...
However, they can also be renamed, changed, or deleted.
Could you take a look at this?
If you have any suggestions or ideas, please write.
I also found that the table is not displayed if the terminal width is too small and the frequency is too long. In textwrap, wrap(width=n) does not accept zero or negative values (it raises ValueError), and here the width can become negative:
ext_col_width = min((term_width - freq_col_width - 5),  # <- negative value here
                    max_word_width,
                    MAX_TABLE_WIDTH)
Since devices with small screens can also hold many files, we must deal with this.
total_occurrences = 10000000000000000000000000000
:D
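One possible guard, sketched under assumptions: keep the same width formula but clamp the result to at least 1, so textwrap.wrap() always gets a positive width. `clamp_ext_col_width` is a hypothetical helper, and the values 10 and 80 stand in for max_word_width and MAX_TABLE_WIDTH:

```python
from textwrap import wrap


def clamp_ext_col_width(term_width: int, freq_col_width: int,
                        max_word_width: int, max_table_width: int,
                        min_width: int = 1) -> int:
    """Same formula as show_2columns(), but never below min_width."""
    width = min(term_width - freq_col_width - 5, max_word_width, max_table_width)
    return max(width, min_width)


term_width = 20
total_occurrences = 10000000000000000000000000000
freq_col_width = len(str(total_occurrences))  # wider than the whole terminal

# The unguarded formula goes negative, and wrap() refuses it.
raw_width = min(term_width - freq_col_width - 5, 10, 80)
try:
    wrap('LONGEXTENSION', width=raw_width)
except ValueError as exc:
    print('wrap() rejected the width:', exc)

safe_width = clamp_ext_col_width(term_width, freq_col_width, 10, 80)
print(safe_width)
```

A width of 1 only prevents the crash; every character then lands on its own line, so falling back to the compact list in this case may be the nicer choice.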
Hi!
I made a prototype of sorting extensions by type.
Maybe we should call this feature "group by type" (-g/--group-by-type). "Grouping" is more specific and more explicit about what we are trying to accomplish here, in my opinion, since we already have a "sorting" option that is related to alphabetic order.
This option allows users to create, rename, update, and delete custom extension lists for sorting by type. It can be extensions related to each other (python files) or any extensions. Lists are stored in a file, and the utility provides only an interface for sorting.
This is a good idea, since it allows some degree of personalisation that may make sense in a number of use cases. However, in a utility like this I would rather provide a default implementation that does not rely on the creation of an additional file. By using -g/--group-by-type, the user would get the default grouping (we must decide what the best default categories should be before releasing this feature; I will think more about it later). That default configuration file should live in the application folder itself, in order to avoid unnecessary file system pollution.
Then, the more advanced user, the one who would want to use a custom configuration, could simply use another CLI argument (e.g.: -cfg/--config-file) to specify a customised configuration file. I see two possible scenarios here:
or:
I believe option number 1 (global configuration file) is the best one for the scope of this application. Config Path would help ensure that files are saved in the correct location for each platform, but I have not tested it yet. Also, in this application I would prefer to stick to the original requirement of keeping external dependencies to a minimum, preferably just pure-Python and optional packages. Features depending on third-party packages should be treated as "extras" and should display a brief, informative message when the required package is not installed.
The use of a user-level customised configuration file (`-cfg`/`--config-file`)
requires a Python package that is not currently installed on your system. You
can easily install it by using the following command (replace `python3.8` with
the intended Python interpreter):
python3.8 -m pip install config-path
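A rough sketch of how such an "extra" could be detected with only the standard library: importlib.util.find_spec() returns None when a top-level module cannot be found. The helper name is hypothetical, and so is the import name "config_path" for the config-path package; check the real one before relying on it. Using sys.executable prints the interpreter that is actually running, so the user does not need to replace `python3.8` by hand:

```python
import importlib.util
import sys
from typing import Optional


def missing_dependency_message(module_name: str, pip_name: str, feature: str) -> Optional[str]:
    """Return a short install hint if module_name is not importable, else None."""
    if importlib.util.find_spec(module_name) is not None:
        return None
    return (
        f'The use of {feature} requires a Python package that is not currently '
        f'installed on your system. You can install it with:\n\n'
        f'    {sys.executable} -m pip install {pip_name}'
    )


# "config_path" as the import name is an assumption, not verified.
msg = missing_dependency_message('config_path', 'config-path',
                                 'a user-level configuration file (-cfg/--config-file)')
if msg:
    print(msg)
```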
The --sort-type argument accepts an arbitrary list of types. The reserved default value creates or restores a configuration file with default values, description and examples. count-files --sort-type default I made some predefined lists: archives, audio, videos, images, python files, ... However, they can also be renamed, changed, or deleted.
I am not sure that accepting a list of types is the best behaviour. Following the CLI options I have described above, I would say that the first time the application is run with the -cfg/--config-file option, it should tell the user something like this:
You haven't created a customised configuration file yet. In case you want to
use the default configuration, please use the command `-g`/`--group-by-type`
instead. Do you want to create the configuration file now (you can customise it
later using a text editor)? [Y/n]
If the user presses Enter or enters Y/y, the file is created:
A custom configuration file has been created. To make any changes, just edit
the file /home/username/xxxxxxxxxx/count-files.ini in your text editor.
The correct path should be displayed, of course.
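A minimal sketch of the "create the file and show its real path" step, assuming a hypothetical template and a hypothetical `create_config()` helper. Opening with mode 'x' fails with FileExistsError instead of overwriting, so an existing user file is never clobbered; the demo writes to a temporary folder rather than a real settings location:

```python
import tempfile
from pathlib import Path

# Hypothetical template: the real default groups and wording are still undecided.
TEMPLATE = """\
# count-files user settings (example)
# Each option in [DEFAULT] is a group: name = comma-separated extensions.
[DEFAULT]
audio = mp3, mid, midi
images = bmp, png, svg
"""


def create_config(path: Path) -> bool:
    """Write TEMPLATE to path unless the file already exists.

    Returns True if a new file was created, False if one was already there.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # mode 'x' refuses to overwrite an existing file
        with path.open('x', encoding='utf-8') as fh:
            fh.write(TEMPLATE)
    except FileExistsError:
        return False
    print('A custom configuration file has been created. To make any changes, '
          f'just edit the file {path.resolve()} in your text editor.')
    return True


# Demo in a throwaway folder instead of the user's real settings location.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / 'count-files' / 'count-files.ini'
    first = create_config(cfg)   # creates the file and prints its path
    second = create_config(cfg)  # leaves the existing file untouched
    text = cfg.read_text(encoding='utf-8')
```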
What do you think?
Hi! Here are a few thoughts.
Naming
Yes, it can be called "Grouping."
The name -type/--sort-type was chosen because it is in the same style as -alpha/--sort-alpha.
The argument name could also be shorter:
-g/--group-by-type
-g/--groups
-g/--group
Default implementation that does not rely on the creation of an additional file
I agree, the default configuration can be built in (permanent). In this case, we can just make a dictionary with the necessary groups and extensions in the source code. In any case, the user can choose whether to use the default settings or not.
The best default categories
A few short lists with well-known file extensions, plus several lists of development-specific extensions, because the intended audience is developers.
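A sketch of what the built-in dictionary could look like if it is written as group name mapped to extensions (easier to edit than one entry per extension, like types2 above) and inverted once into an extension lookup. `DEFAULT_GROUPS`, `build_lookup()` and `group_counter()` are hypothetical names, and the categories are placeholders taken from the examples in this issue:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical default groups: group name -> lower-case extensions.
DEFAULT_GROUPS: Dict[str, List[str]] = {
    'archives': ['gz', 'rar', 'tar', 'zip'],
    'audio': ['mp3', 'ogg', 'wav'],
    'images': ['bmp', 'gif', 'jpg', 'png'],
    'python files': ['py', 'pyc', 'pyw'],
    'videos': ['3gp', 'avi', 'flv', 'mp4'],
}


def build_lookup(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert group -> extensions into extension -> group.

    If an extension appears in several groups, the last group wins.
    """
    return {ext: name for name, exts in groups.items() for ext in exts}


def group_counter(counter: Counter,
                  groups: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Distribute (extension, freq) pairs into groups, most frequent first."""
    lookup = build_lookup(groups)
    result: Dict[str, List[Tuple[str, int]]] = {name: [] for name in groups}
    result['other'] = []
    for ext, freq in counter.most_common():
        result[lookup.get(ext.lower(), 'other')].append((ext, freq))
    return result


grouped = group_counter(Counter({'PY': 23, 'PYC': 38, 'PNG': 8, 'DAT': 1}), DEFAULT_GROUPS)
print(grouped['python files'])
```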
One global configuration file that lives in the corresponding user settings
In the prototype, the configuration file is created in the same folder as settings.py.
I was planning to choose another folder to store it in later.
For this file, we can create a folder in the user's home directory.
os.environ['HOME']/count_files/1.6.0/count_files.ini
This will allow us to save user preferences when updating or reinstalling the package.
We can use one file in which the user can specify settings for a specific path.
import configparser
config = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
# one global configuration file
# any list/lists in [DEFAULT]
# you can use it all at once
# without sections
simple_config = """
[DEFAULT]
audio = mp3, mid, midi
videos = avi, flv
images = bmp, png, svg
misc = other, ext
"""
# or with sections
advanced_config = """
[DEFAULT]
audio = mp3, mid, midi
videos = avi, flv
images = bmp, png, svg
misc = other, ext
[home]
path = /home/username
extensions = py, pyc, pyw, ${DEFAULT:audio}
[folder]
path = /home/username/folder
extensions = txt, db, md, jpg
"""
args_path = '/home/username'
config.read_string(advanced_config)
res = {}
# search for selected extensions in Counter
# one header, e.g. 'selected'
sections = config.sections()
if sections:
    for i in sections:
        if config[i]['path'] == args_path:
            res.update({'selected': config.get(i, 'extensions')})
            break
# search for selected extension groups in Counter
# keys are needed for headings
if not res:
    for k, v in config['DEFAULT'].items():
        res.update({k: v})
print(res)
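ConfigParser returns every value as one string (e.g. 'py, pyc, pyw, mp3, mid, midi' after ExtendedInterpolation expands ${DEFAULT:audio}), so the values still need to be split before they can be matched against the Counter keys. A small sketch with a hypothetical `parse_extensions()` helper:

```python
import configparser
from typing import Dict, List


def parse_extensions(value: str) -> List[str]:
    """Split a comma-separated config value into clean, lower-case extensions."""
    return [ext.strip().lower() for ext in value.split(',') if ext.strip()]


config = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
config.read_string("""
[DEFAULT]
audio = mp3, mid, midi
images = bmp, png, svg
[home]
path = /home/username
extensions = py, pyc, pyw, ${DEFAULT:audio}
""")

# ${DEFAULT:audio} has already been expanded by ExtendedInterpolation here
selected = parse_extensions(config.get('home', 'extensions'))

# every option in [DEFAULT] becomes one group heading
groups: Dict[str, List[str]] = {
    name: parse_extensions(value) for name, value in config['DEFAULT'].items()
}
print(selected)
print(groups)
```

Lower-casing here pairs with `ext.lower()` on the Counter keys, which are upper case by default.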
Possible solution
A custom configuration file can be created upon request.
For example:
count-files -g create
creates a basic template with examples and description.
If for some reason we cannot create the file, we can keep an example count_files.ini in the repository on GitHub.
count-files -g read
reads and displays the default dictionary from source code and custom configuration file (if exists).
count-files -g
    if a custom configuration file exists:
        use extensions/groups from this file via the ConfigParser module
    else:
        use the dictionary from the source code with default groups
In addition, we can add a value that allows using the default groups and user groups together.
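The fallback logic above, plus the optional "defaults and user groups together" mode, could be sketched like this. `load_groups()`, its `merge` flag and the `DEFAULT_GROUPS` contents are all hypothetical; only configparser, pathlib and tempfile from the standard library are used:

```python
import configparser
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical built-in defaults; the real categories are still to be chosen.
DEFAULT_GROUPS: Dict[str, List[str]] = {
    'images': ['jpg', 'png', 'gif', 'bmp'],
    'videos': ['mp4', 'avi', 'flv', '3gp'],
}


def load_groups(path: Path, merge: bool = False) -> Dict[str, List[str]]:
    """Return groups from the user's file if it exists, else the defaults.

    With merge=True the user's groups are laid over the defaults; a user
    group with the same name replaces the default one.
    """
    defaults = {name: list(exts) for name, exts in DEFAULT_GROUPS.items()}
    if not path.is_file():
        return defaults
    config = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation())
    config.read(path, encoding='utf-8')
    user_groups = {
        name: [ext.strip().lower() for ext in value.split(',') if ext.strip()]
        for name, value in config['DEFAULT'].items()
    }
    if not merge:
        return user_groups
    defaults.update(user_groups)
    return defaults


with tempfile.TemporaryDirectory() as tmp:
    ini = Path(tmp) / 'count-files.ini'
    without_file = load_groups(ini)            # no file yet: built-in defaults
    ini.write_text('[DEFAULT]\naudio = mp3, WAV\n', encoding='utf-8')
    only_user = load_groups(ini)               # user groups only
    merged = load_groups(ini, merge=True)      # defaults + user groups
```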
The purpose of using the custom config file:
count-files -g
- only one flag needs to be entered
If the user file is missing, the default dictionary for this version is used.
External dependencies
The configparser and os modules are sufficient for this option.
I do not really want to use third-party libraries.
Have you heard about the left-pad (JS library) incident?
If the user presses Enter
Would it be better to require explicit confirmation of the action? Everything except answer.lower() in ['y', 'yes'] would count as No.
I am not sure that accepting a list of types is the best behaviour.
Yes, it’s better to type less text.
I would say that the first time the application is run with the -cfg/--config-file option, the application should tell the user something like this:
Is it possible to ask this question once (technically) and without adding another argument?
Edit: Also, what are the other advantages of having two separate arguments (-g and -cfg) for sorting?
-g/--group -> I like this option.
For this file, we can create a folder in the user's home directory.
os.environ['HOME']/count_files/1.6.0/count_files.ini
This will allow us to save user preferences when updating or reinstalling the package.
I am not sure about it, since it may not follow the platform convention on where configuration files should be placed. If a user has 100 applications and each one adds a configuration folder at ~/, it gets kind of polluted. The Config-path package I mentioned above was an attempt to create a cross-platform abstraction that would make it easier to respect each platform's convention. Reading through its description, it seems that on Windows it's not just a matter of specifying a path, since the correct path must be obtained from a Windows API call. I agree that in this application we should try to stay away from third-party packages (yes, I have read about that left-pad incident, and it's precisely what we should try to prevent). Currently, I am not able to dig into this subject as much as it deserves, but I will be reviewing any PRs as usual, as soon as I manage to do so.
-s yet, so I would suggest this:
-g / --group -> use the default groups, loaded from our dictionary in the source code.
-s / --settings -> use the user's settings file, if it exists at PATH. If it doesn't exist, ask the user whether a new settings file, containing some examples/instructions, should be created at that location.
So, two separate arguments, one for default grouping, and the other for loading user settings from file. These are two distinct operations, and using separate arguments allows for future expansion on user settings. For instance, the settings file could include some configurations for tables or for other default parameters.
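A hypothetical argparse sketch of the two proposed flags, only to show that they stay independent; this is not the actual count-files parser, and the positional `path` argument is assumed here for illustration:

```python
import argparse

# Hypothetical sketch of the proposed options; not the real count-files CLI.
parser = argparse.ArgumentParser(prog='count-files')
parser.add_argument('path', nargs='?', default='.')
parser.add_argument('-g', '--group', action='store_true',
                    help='group extensions using the built-in default groups')
parser.add_argument('-s', '--settings', action='store_true',
                    help="use the user's settings file (offer to create it if missing)")

args = parser.parse_args(['-g', '/home/username'])

# Settings take priority when both flags are given; this order is a design choice.
if args.settings:
    source = 'user settings file'
elif args.group:
    source = 'built-in default groups'
else:
    source = None
print(args.path, source)
```

The flags are deliberately not mutually exclusive, so a later "defaults plus user groups" mode could simply be -g -s together.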
Could it be better to use explicit confirmation of the action? Everything except answer.lower() in ['y', 'yes'] counts as No.
[Y/n] -> My idea was a simple y/Y or N/n for yes or no, but we may also accept yes/no. The upper-case Y implies that it is the default option (the one that is assumed when the user just presses ENTER). We can opt to make No the default option, though, or even add some more explanatory text that clearly states what happens if the user presses Enter with no answer. Maybe No would be a better default option indeed.
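A sketch of the prompt described above, assuming No as the default: the capital letter in the hint follows the default, y/yes and n/no are accepted in any case, and an empty answer returns the default. `ask_yes_no()` is a hypothetical helper; re-asking on an unrecognised answer is one possible choice (treating it as No, as suggested earlier, would be a one-line change). `input_func` is injectable so the behaviour can be tested without a terminal:

```python
from typing import Callable


def ask_yes_no(question: str, default: bool = False,
               input_func: Callable[[str], str] = input) -> bool:
    """Ask a yes/no question; an empty answer (just Enter) returns `default`."""
    hint = '[Y/n]' if default else '[y/N]'
    while True:
        answer = input_func(f'{question} {hint} ').strip().lower()
        if not answer:
            return default
        if answer in ('y', 'yes'):
            return True
        if answer in ('n', 'no'):
            return False
        print("Please answer 'y' or 'n'.")  # unrecognised answer: ask again


# Simulated answers: an invalid one first, then an explicit yes.
answers = iter(['maybe', 'YES'])
created = ask_yes_no('Do you want to create the configuration file now?',
                     input_func=lambda prompt: next(answers))
```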
User settings and Platform convention on where configuration files should be placed
-s/--settings
The future expansion of user preferences is an interesting option. The settings file may also contain user-defined supported types.
On Windows, os.environ['HOME'] seems to be the usual place for such files. I have settings, logs, and caches from different applications there. The file location can be selected depending on which OS the program is installed on.
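Choosing the location per OS could look roughly like this standard-library sketch of the common conventions: %APPDATA% on Windows, ~/Library/Application Support on macOS, and $XDG_CONFIG_HOME (falling back to ~/.config) elsewhere. It skips the Windows API call mentioned above and simply trusts the APPDATA variable, which is an assumption. Path.home() is used instead of os.environ['HOME'], because HOME is not guaranteed to be set on Windows. `user_config_dir()` is a hypothetical name:

```python
import os
import sys
from pathlib import Path


def user_config_dir(app_name: str = 'count-files') -> Path:
    """Per-platform folder for user settings (sketch of the usual conventions)."""
    if sys.platform.startswith('win'):
        # APPDATA is normally set; the "official" path comes from a Windows API call
        base = os.environ.get('APPDATA') or str(Path.home() / 'AppData' / 'Roaming')
        return Path(base) / app_name
    if sys.platform == 'darwin':
        return Path.home() / 'Library' / 'Application Support' / app_name
    # Linux and other Unix-like systems: XDG Base Directory convention
    base = os.environ.get('XDG_CONFIG_HOME') or str(Path.home() / '.config')
    return Path(base) / app_name


print(user_config_dir() / 'count-files.ini')
```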
We can think about it later. Custom settings may be done in the next version.
Now I want to start doing at least something with a file preview.
Seems ok to me.
Sort extensions in Counter by type, for example:
image - jpg, png, gif, bmp...
video - mp4, avi, flv, 3gp...
audio - mp3, wav, ogg...
archive - zip, tar, rar, gz...
and so on.
This can be either a table with sections, or a separate table for each type.