victordomingos / Count-files

A CLI utility written in Python to help you count files, grouped by extension, in a directory. By default, it will count files recursively in current working directory and all of its subdirectories, and will display a table showing the frequency for each file extension (e.g.: .txt, .py, .html, .css) and the total number of files found.
https://no-title.victordomingos.com/projects/count-files/
MIT License

Add some new preview systems for common file types: #91

Open victordomingos opened 6 years ago

victordomingos commented 6 years ago

The standard library includes tools for working with common file types. Text files can simply be read, but what kind of data can be used to preview images or PDFs?

Originally posted by @NataliaBondarenko in https://github.com/victordomingos/Count-files/issues/84#issuecomment-421595666

victordomingos commented 6 years ago

The Standard Library does indeed have a few tools that we could consider using for this purpose, without having to dive into lower-level APIs or third-party packages:

Databases:

Text based data files:

Compressed archives:

Audio:

Images:

Maybe this one can be useful also:

For other file types, like PDF, XLSX, DOCX, and images, we can use third-party packages such as PIL. In my opinion, these should not be required as runtime dependencies: the application should run with or without them but, if they are present, it should try to use them to present more meaningful preview info.
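One way to keep such a package optional is to import it lazily and fall back to basic information when it is missing. The following is only a sketch: `preview_image` is a hypothetical helper name, not part of Count-files, and it assumes Pillow (the maintained PIL fork) as the optional package.

```python
import os


def preview_image(path):
    """Return a one-line preview; richer if Pillow happens to be installed."""
    # Basic info that is always available, with or without third-party packages.
    basic = f"{os.path.basename(path)}: {os.path.getsize(path)} bytes"
    try:
        from PIL import Image  # optional dependency, imported only when needed
    except ImportError:
        return basic
    try:
        with Image.open(path) as img:
            return f"{basic}, {img.format} {img.width}x{img.height}"
    except OSError:
        # Pillow could not read or identify the file; keep the basic preview.
        return basic
```

With this shape, the application never fails because Pillow is absent; the preview is just less detailed.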

Or, in some cases, we may try to write custom functions to read and interpret the binary data, where the file header is public and well documented, but I think it tends to be a more troublesome road...
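To give a sense of what such a custom function involves, here is a minimal sketch that takes PNG as the example format. The PNG layout (an 8-byte signature followed by an IHDR chunk holding big-endian width and height) is publicly documented; `read_png_size` is a hypothetical name used only for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG, or None."""
    # 8 bytes signature + 4 bytes chunk length + 4 bytes chunk type + 8 bytes size
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE):
        return None
    if header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

Even for a simple format, the function has to validate lengths and chunk names by hand, which hints at why this road gets troublesome for more complex formats.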

NataliaBondarenko commented 5 years ago

Here is one crazy idea for previewing the contents of a media file while avoiding additional dependencies. Some formats are supported by browsers. Open some path:

[Screenshot: preview]

In this case, an HTML document is generated. By clicking a link, you can view the contents of images and some text formats.

[Screenshot: preview_png]

[Screenshot: preview_text]

NataliaBondarenko commented 5 years ago

You can generate a report with sorted local links to pictures as a more compact list in an HTML document. It cannot be viewed in the terminal, though.
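A report like this could be produced with the standard library alone. The sketch below is an assumption about how it might look, not the code behind the screenshots; `build_html_report` and its `title` parameter are hypothetical.

```python
import html
from pathlib import Path


def build_html_report(paths, title="File preview report"):
    """Return an HTML page listing the given files as sorted local links."""
    # as_uri() needs absolute paths, so resolve them first, then sort.
    resolved = sorted(Path(p).resolve() for p in paths)
    items = [
        f'<li><a href="{html.escape(p.as_uri())}">{html.escape(p.name)}</a></li>'
        for p in resolved
    ]
    return "\n".join(
        [
            "<!DOCTYPE html>",
            f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>',
            "<body><ul>",
            *items,
            "</ul></body></html>",
        ]
    )
```

The resulting string could be written to a temporary file and opened with the standard `webbrowser` module, leaving the actual rendering of images and text to the browser.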

victordomingos commented 5 years ago

Count-files is primarily a text-mode, console-based application. The main functionality should be made available through the console itself. But it is an interesting feature. I think it can be added as a new option (--use-browser, or something similar) as a more detailed preview mode, but there should be a text-mode version first for each format, even if the information displayed is very basic.
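As an illustration of how basic such a text-mode preview can be, here is a sketch for two formats using only the standard library. The choice of `zipfile` and `wave` is an assumption made for the example, and `preview_zip` and `preview_wav` are hypothetical names.

```python
import wave
import zipfile


def preview_zip(source, limit=5):
    """Return up to `limit` member names from a ZIP archive."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()[:limit]


def preview_wav(source):
    """Return a one-line summary of a WAV file's basic parameters."""
    with wave.open(source, "rb") as audio:
        channels = audio.getnchannels()
        rate = audio.getframerate()
        seconds = audio.getnframes() / rate
        return f"{channels} ch, {rate} Hz, {seconds:.2f} s"
```

Both functions accept either a path or a binary file object, and each returns plain text that fits in a console line or two.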

NataliaBondarenko commented 4 years ago

Preview for binary files: we can use the file signature to preview binary files, showing the signature and/or detecting known file types:

>>> with open('path/to/file.png', 'rb') as f:
...     f.read(20)
...
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x06@'
>>> # The PNG signature must be the first 8 bytes of the file.
>>> _.startswith(bytes.fromhex("89 50 4E 47 0D 0A 1A 0A"))
True

What do you think about this?
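The same check could be generalized into a small lookup table of known signatures. This is a sketch only: the table is deliberately short, the listed signatures are the widely documented ones for these formats, and `detect_file_type` and `preview_binary` are hypothetical names.

```python
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive (also DOCX, XLSX)",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"\xff\xd8\xff": "JPEG image",
}


def detect_file_type(header):
    """Return a description if the header starts with a known signature."""
    for signature, name in SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None


def preview_binary(path, length=20):
    """Return the detected type (if any) plus the leading bytes in hex."""
    with open(path, "rb") as f:
        header = f.read(length)
    kind = detect_file_type(header) or "unknown type"
    return f"{kind}: {header.hex(' ')}"
```

Matching with `startswith` rather than a substring test avoids false positives when the signature bytes happen to occur later in the file.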

victordomingos commented 4 years ago

It's a good idea. 👍