This depends on what the user is actually interested in.
Exporting to simple text format and to CSV is easy; but export exactly what? A tree view like in QDirStat contains a ton of information, but it becomes accessible only with most of it hidden, i.e. with most branches of that tree collapsed. If everything were expanded at once, the list would become looong, producing a gazillion lines in a report file, or dozens of pages in a PDF.
It's all very easy as long as all tree branches are neatly collapsed:
But if all the branches are expanded so you can do any processing on the data with other tools, it becomes unwieldy to the point of being pretty unusable. Look at the scroll bars to get an impression of how long the whole thing becomes:
...and this is only a pretty small tree with only 712 files and directories in total. Imagine what it looks like for nontrivial directory trees.
Even if only one tree level more than just the toplevel is expanded for all branches, it's already barely usable:
It becomes usable only when manually opening only those branches that you are currently interested in:
And this is where the power of a tree view lies: in hiding most of the information because it's typically irrelevant to the task at hand. Collapsible and expandable trees are useful because they let you selectively expand and collapse individual branches instead of dumping everything on you at once.
But when exporting the tree to a file or to a printable document like a PDF, there is little choice other than exporting everything, i.e. expanding all branches and handing over the responsibility to the user (or to the next software the user chooses to use).
Having said that, there already is an exporter tool: the qdirstat-cache-writer script.
With the -l (long format) command line option, the result is really easy to parse because every line contains the complete path:
qdirstat-cache-writer -l ~/src/qdirstat /tmp/src-qdirstat.txt
head -n 30 /tmp/src-qdirstat.txt
[qdirstat 1.0 cache file]
# Generated by qdirstat-cache-writer
# Do not edit!
#
# Type path size mtime <optional fields>
D /work/home/sh/src/qdirstat 4096 0x615376c7
# Device: /dev/nvme0n1p5
F /work/home/sh/src/qdirstat/.qmake.stash 739 0x612abb77
F /work/home/sh/src/qdirstat/.gitignore 22 0x5fb9043d
F /work/home/sh/src/qdirstat/LICENSE 18092 0x6151907d
F /work/home/sh/src/qdirstat/qdirstat.pro.user 20401 0x6151907d
F /work/home/sh/src/qdirstat/qdirstat.pro 832 0x6151907d
F /work/home/sh/src/qdirstat/README.md 44777 0x615376c7
F /work/home/sh/src/qdirstat/Makefile 35198 0x61519093
D /work/home/sh/src/qdirstat/src 12288 0x615376da
F /work/home/sh/src/qdirstat/src/OpenDirDialog.cpp 9284 0x61209c5b
F /work/home/sh/src/qdirstat/src/ui_mime-category-config-page.h 12390 0x615376d1
F /work/home/sh/src/qdirstat/src/HeaderTweaker.h 4988 0x5fb9043d
F /work/home/sh/src/qdirstat/src/SizeColDelegate.cpp 6789 0x61004835
F /work/home/sh/src/qdirstat/src/SystemFileChecker.h 1380 0x5fb9043d
F /work/home/sh/src/qdirstat/src/ui_filesystems-window.h 5155 0x612abb7a
F /work/home/sh/src/qdirstat/src/PkgQuery.cpp 4632 0x5fb9043d
F /work/home/sh/src/qdirstat/src/UnreadableDirsWindow.cpp 6967 0x610fa3d0
F /work/home/sh/src/qdirstat/src/file-type-stats-window.ui 3007 0x5fb9043d
F /work/home/sh/src/qdirstat/src/PercentileStats.cpp 4394 0x5fb9043d
F /work/home/sh/src/qdirstat/src/History.h 5064 0x6103d78e
F /work/home/sh/src/qdirstat/src/Cleanup.h 14285 0x6151907d
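Since every line carries the complete path, turning such a file into a CSV that opens in LibreOffice Calc takes only a few lines of script. The following is a minimal sketch, not part of QDirStat, and it assumes the layout shown in the sample above: whitespace-separated fields, paths without blanks, sizes as plain byte counts, and the mtime as a hexadecimal Unix timestamp. The function names parse_cache_lines, csv_rows and cache_to_csv are hypothetical.

```python
import csv
import os
from datetime import datetime, timezone


def parse_cache_lines(lines):
    """Yield (type, path, size, mtime) tuples from a long-format (-l) cache file.

    Assumptions: whitespace-separated fields, no blanks inside paths,
    sizes are plain byte counts, mtime is a hex Unix timestamp.
    """
    for line in lines:
        line = line.strip()
        # Skip the "[qdirstat ...]" header, "#" comments and empty lines.
        if not line or line.startswith(("#", "[")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        entry_type, path, size, mtime = fields[:4]
        # int(..., 16) accepts the "0x" prefix used in the file.
        yield entry_type, path, int(size), int(mtime, 16)


def csv_rows(lines):
    """Yield a header row, then one row per entry: name, path, type, size, mtime."""
    yield ["name", "path", "type", "size", "mtime"]
    for entry_type, path, size, mtime in parse_cache_lines(lines):
        stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
        yield [os.path.basename(path), path, entry_type, size, stamp]


def cache_to_csv(lines, out):
    """Write the rows to an already opened text file (open it with newline="")."""
    csv.writer(out).writerows(csv_rows(lines))
```

To use it, open the file written by qdirstat-cache-writer -l for reading, open the target CSV file for writing with newline="", and pass both to cache_to_csv. Timestamps are written in UTC; a real tool might prefer local time.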
The file format is specified here:
https://github.com/shundhammer/qdirstat/blob/master/doc/cache-file-format.txt
Notice that this contains only the plain data, no aggregated / accumulated values; i.e. no sums per directory branch, no oldest / newest timestamp per branch etc.
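If per-branch sums are needed anyway, they can be recomputed from the plain data, because the full paths define which directory each entry belongs to. A minimal sketch, assuming the entries are already available as (type, path, size, mtime) tuples with "D" marking directories; the function names are hypothetical. Each directory's total includes its own entry size plus everything below it, and the percentage is relative to the parent directory, similar to a "percent" column.

```python
import posixpath


def directory_totals(records):
    """Return {directory path: accumulated size} for (type, path, size, mtime) records.

    Each entry's size is added to its own total (if it is a directory) and
    to every ancestor directory that is present in the records.
    """
    records = list(records)
    dirs = {path for entry_type, path, _size, _mtime in records if entry_type == "D"}
    totals = dict.fromkeys(dirs, 0)
    for entry_type, path, size, _mtime in records:
        node = path if entry_type == "D" else posixpath.dirname(path)
        while node in dirs:
            totals[node] += size
            parent = posixpath.dirname(node)
            if parent == node:  # reached "/"
                break
            node = parent
    return totals


def percent_of_parent(totals):
    """Share of each directory in its parent directory's total, in percent."""
    shares = {}
    for path, total in totals.items():
        parent = posixpath.dirname(path)
        if parent != path and totals.get(parent):
            shares[path] = 100.0 * total / totals[parent]
    return shares
```

Note that this only covers sums and percentages; oldest / newest timestamps per branch could be accumulated the same way with min() and max() instead of addition.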
As for exporting to a pixel image format, the pedestrian way is to create a screenshot of what you are currently seeing; all Linux desktops support that in some way, either screenshotting the whole screen or just the current window, depending on the key combination. It's usually the PrintScreen key with or without any of the Alt, Ctrl etc. modifier keys.
If a user only wants the treemap (the colored rectangles in the bottom part), there is also a simple solution: just drag the divider separating the tree view from the treemap all the way up, and the tree view disappears; then hit PrintScreen:
(Hitting F9 and then F9 again restores the usual layout with both views visible.)
This is also the best possible resolution for the treemap since this rendering is strictly pixel-based; exporting to a vector image format like SVG would not improve it in any way.
So, what is the use case?
Wishing for CSV implies further processing in a spreadsheet like MS Excel or LibreOffice Calc. But spreadsheets are matrix-oriented; they fail miserably with everything hierarchical, i.e. tree-based. You'd have to do special tricks that always make assumptions about the deepest nesting level of a tree, and that invariably results in kludges; this approach is very limiting.
What kind of processing are you thinking of? Please name some concrete examples so we have a basis for discussion.
Hi Stefan,
first of all thank you for your answer.
Unfortunately I am not in charge, and I cannot decide whether something makes sense or not. I just have to follow the rules, and our data management protocol requires a very long list of all files in CSV or PDF format before we archive them in a long-term repository. Anyway, I think it is a good idea to have an emergency list in case something goes wrong.
Currently, the list looks like this in CSV:
TreeSize Professional report, 01.10.2020 15:05
H:\ on [PS-05319]
Drive: H:\ Size: 1,8 TB Used: 251,9 GB Free: 1,6 TB
Name;Absolute path;Size;Allocated;Files;Directories;Percent (allocated);Last modified;Last accessed;Owner;Type;Permissions;Inherited permissions;Own permissions;Author;SHA256 checksum
"2020_10";"H:\2020_10\";180,9 GB;220,2 GB;91.503;1.876;87,4 %;30.09.2020;30.09.2020;"Everyone";"Folder";"Everyone: Full control";"Everyone: Full control";"Everyone: Full control";"";""
"DFG_Gelehrtenbriefe";"H:\2020_10\DFG_Gelehrtenbriefe\";66,2 GB;67,9 GB;5.192;704;30,8 %;30.09.2020;30.09.2020;"Everyone";"Folder";"Everyone: Full control";"Everyone: Full control";"Everyone: Full control";"";""
"LOTTO_28";"H:\2020_10\DFG_Gelehrtenbriefe\LOTTO_28\";45,3 GB;46,4 GB;3.472;449;68,4 %;30.09.2020;30.09.2020;"Everyone";"Folder";"Everyone: Full control";"Everyone: Full control";"Everyone: Full control";"";""
"TIFF_JPG";"H:\2020_10\DFG_Gelehrtenbriefe\LOTTO_28\TIFF_JPG\";45,3 GB;46,4 GB;3.472;448;100,0 %;30.09.2020;30.09.2020;"Everyone";"Folder";"Everyone: Full control";"Everyone: Full control";"Everyone: Full control";"";""
"A-II-RomA-BraE-001";"H:\2020_10\DFG_Gelehrtenbriefe\LOTTO_28\TIFF_JPG\A-II-RomA-BraE-001\";614,4 MB;626,5 MB;44;0;1,3 %;23.06.2020;30.09.2020;"Everyone";"Folder";"Everyone: Full control";"Everyone: Full control";"Everyone: Full control";"";""
Of course, this is more manageable in Calc. What we really need in it are the file name, file path, size, percentage, file type and possibly the creation and modification dates. Your command looks great and the output is perfectly organised, but for a person who does not have any command line knowledge, it is a bit scary. A great solution would be a simple menu entry like Export → CSV.
Usually we attach two other files: a PDF file with all data aggregated by file type and format, and a histogram where all data are shown in age classes (e.g. 1 year old, 6 months old etc.) together with how much space they use. You can find two examples attached: TreeSize_Balkanendyagramm_2020_10.pdf; TreeSize_Dateitypen_2020_10.pdf.
I have already seen that you have similar data displayed inside the software. It is not so important to have identical graphical output, but it would be important to attach similar data, organised like in the attached files.
Do you think it would be possible to export something like this from your software?
Thank you again for your help,
Best Regards
Alessio Paonessa
Both types of information are available, albeit in a slightly different form:
Of course, those are screenshots, i.e. they are limited by the screen size: if there is more content than fits on the screen, part of it is scrolled out of view.
There is no way to process any of that any further with any scripts or a spreadsheet application like Excel or LibreOffice Calc.
It is obvious that they are screenshots since they contain window manager borders and buttons.
Furthermore, the "other" file types are limited to the top 20; any more that don't belong to one of the configured MIME categories (which you can customize, however) are omitted.
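For a complete list of file types without the top-20 cutoff, a script working on the cache file data can simply keep every group. This is a minimal sketch that groups by lower-cased filename suffix, which is a deliberate simplification and not QDirStat's configurable MIME categories; it assumes (type, path, size, mtime) tuples with "F" marking regular files, and file_type_stats is a hypothetical name.

```python
import posixpath
from collections import Counter


def file_type_stats(records):
    """Return [(suffix, file count, total bytes)] for every suffix, largest first.

    Nothing is cut off: all suffixes found are kept in the result.
    """
    counts, sizes = Counter(), Counter()
    for entry_type, path, size, _mtime in records:
        if entry_type != "F":
            continue  # only regular files are counted
        suffix = posixpath.splitext(posixpath.basename(path))[1].lower()
        key = suffix or "(no suffix)"
        counts[key] += 1
        sizes[key] += size
    return sorted(((key, counts[key], sizes[key]) for key in counts),
                  key=lambda item: item[2], reverse=True)
```

Each resulting tuple maps directly to one CSV row, so the list stays complete regardless of how many types there are.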
Exactly, that is my problem: if the list is too long, I will not get a complete result. Is there any way to work around these limitations, i.e. to get an exportable complete list and graphical output for the two statistical analyses?
Thank you again
Best Regards
Alessio
Let me think about it.
If there is a generic solution that is not insanely complicated, and if there is a reasonable way to make it fit nicely into the GUI without adding confusion for the average user, I am all for it. But I want to avoid cluttering each of those view windows with buttons that are rarely needed; each additional button adds to the complexity of the user interface.
Maybe (just maybe) it's time for a hamburger button / menu in those dialogs to put those actions in; that would make it reasonably discoverable, yet not too obtrusive. This might take actions to expand or collapse all (toplevel?) items and an "Export" submenu with options to export as plain text or CSV.
My initial idea is to make it export the view that you are currently seeing, just the same way as items are currently expanded or collapsed. It would use the existing data models and do that in a generic way; so this could be added to any of the existing dialogs that use a tree view.
Caveat: So far, I don't make any promises beyond giving it some serious thought. ;-)
Your proposal already seems like a great improvement to me; thank you again for thinking about it. These features would allow us to switch to a fully open source system in our archiving pipeline.
Unfortunately I am no programmer, only a digital humanist, and I cannot help you with the hard work. Let me know if I can support you in other ways.
I did some experiments, and I took an in-depth look into the code; and the result is that I have to disappoint you: No, this can't be reasonably done.
QDirStat is a GUI-centric application. While the data are carefully kept separate from the presentation, what you are asking for is in fact a whole new presentation: in file form, no matter if (well-formatted) plain text or CSV.
QDirStat uses Qt classes for the presentation part; they already have a considerable abstraction layer to keep the logic layers apart. At the core, they use a QAbstractItemModel which is the base class for QDirStat's DirTreeModel that in turn uses a DirTree in-memory representation of the relevant data.
Responsibilities are split between those model classes and Qt's view classes (a QTreeView-derived class in this case), and many things are abstracted so the application doesn't have to deal with all the gory details, such as which columns are visible and which are not, the order of those columns (you can rearrange them interactively), the scroll positions in both dimensions, which tree branches are expanded and which ones are collapsed, and the sort order (by which column, and ascending vs. descending).
While it is possible, with enough trickery, to break through all those abstraction layers, doing so violates them; and that is asking for trouble because some of those things are officially accessible from the outside (i.e. they are part of a documented API), but some are not (i.e. accessing them means exploiting undocumented Qt features).
As a result, attempting to replicate all that so an exported file looks very much like what you see on the screen is incredibly hard, and it might easily break between different (even minor!) versions of the Qt libs.
Yet, trying to mimic the on-screen presentation would be the only reasonably useful way to export data: You would get the columns that you see on the screen in the order that you see on the screen, with the tree branches expanded that you see expanded on the screen. The only difference would be that you would no longer be limited to the screen size, so the exported file would contain everything that you could see if you had an infinitely large screen.
The only reasonable alternative would be to simply export everything, always resulting in a huge file.
Also, the export formats would have inherent limitations:
CSV is matrix-oriented, just like a spreadsheet. It does not have any concept of hierarchies a.k.a. tree levels a.k.a. indentation. So this would only ever work as a kludge:
For the file tree, it would simply ignore all tree levels / indentation, and the consumer of that file would have to take care of reconstructing the tree from the complete path name, which would have to be in one of the first columns. Yikes. Try doing that in Excel or LibreOffice Calc! AFAICS this is next to impossible or completely impossible.
Adding empty cells at the left as a placeholder for indentation levels would quickly overwhelm any automated processing in a spreadsheet; that is too much abstraction for applications like Excel or LibreOffice Calc.
Plain text / pretty text could use blanks for indentation; e.g., 2, 3 or 4 blanks per indentation level. That would make the tree at least somewhat recognizable.
Column width would be a problem for any text format (not for CSV). The nice auto-adjusting columns that the tree widget has would have to be replicated; and in case some columns get excessively wide due to very wide content, it would have to do something intelligent. Yikes.
Sorting is handled in large parts by the tree widget; the data models just provide a comparison method for two items and for each column. For an export feature, this would have to be replicated.
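Outside of a spreadsheet, at least the reconstruction part is straightforward in a script: the complete paths alone are enough to rebuild the hierarchy, indent it as plain text and sort siblings by size. This is a minimal sketch under stated assumptions, not a proposal for QDirStat itself: it takes (type, path, size, mtime) tuples, assumes the first record is the top directory, uses two blanks per level, and deliberately ignores the column-width problem. render_tree is a hypothetical name.

```python
import posixpath
from functools import lru_cache


def render_tree(records, indent=2):
    """Render flat (type, path, size, mtime) records as an indented text tree.

    The hierarchy is rebuilt purely from the full paths; siblings are sorted
    by accumulated size, largest first.
    """
    records = list(records)
    own_size = {path: size for _type, path, size, _mtime in records}
    children = {}
    for _type, path, _size, _mtime in records:
        parent = posixpath.dirname(path)
        if parent != path:  # "/" is its own dirname; avoid a self-loop
            children.setdefault(parent, []).append(path)

    @lru_cache(maxsize=None)
    def total(path):
        # Own size plus the accumulated size of everything below this path.
        return own_size[path] + sum(total(child) for child in children.get(path, []))

    lines = []

    def walk(path, depth):
        lines.append(" " * (indent * depth) + f"{posixpath.basename(path)}  {total(path)}")
        for child in sorted(children.get(path, []), key=total, reverse=True):
            walk(child, depth + 1)

    walk(records[0][1], 0)  # assumption: the first record is the top directory
    return lines
```

With a directory /a containing a file x (100 bytes) and a subdirectory s with a file y (500 bytes), each directory entry being 10 bytes, this yields "a  620", then "  s  510", "    y  500" and "  x  100".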
So, all things considered, this would result in a lot of custom-written code that would in large parts duplicate functionality that is otherwise done by Qt in the Qt widgets and data model classes; that would pretty much defeat the purpose of using a well-tested and well-maintained library like Qt.
So this feature would result in a lot of very seldom-used code that is also not very well-tested and used only by a very small number of users; a sure recipe for bit rot.
It's also not very aligned with the original purpose of QDirStat and with the vision behind it: A highly interactive tool that gives you up-to-date data so that you can act upon that information immediately; in many cases using the built-in cleanup methods to delete files or directories, to compress directories etc. plus whatever custom cleanup actions users may configure themselves.
QDirStat gives you a snapshot of filesystem information as it was at the moment of reading; but a moment later, that information may already be obsolete because filesystems can (and do!) change all the time. It's all very volatile, just a snapshot in time.
Exporting such a snapshot of information is a very exotic use case. I do acknowledge your specific use case, but please understand that this is a fringe case; only a very small number of users would benefit from such a feature.
Yet, this feature would require considerable code with considerable duplication of functionality, as mentioned above; and since only very few users would ever use it, it would also be pretty much dead code (not completely dead, but also not very alive), and any change in the (Qt or system) environment would not be noticed quickly; so it would be poorly maintained code. And that is what bit rot is all about: code that once worked, but over time keeps getting more and more hiccups and flaws, up to the point where it is more of a burden than a benefit for anyone, users as well as developers.
Dead or dying code adds to a software project's technical debt, and knowingly accumulating more technical debt is always a bad idea.
So, sorry, but no, I am convinced that it's not a good idea to add this.
Having said all that, taking such a snapshot in time and loading it again later is exactly what the Write to Cache File and Read Cache File operations in the File menu do (and also the qdirstat-cache-writer script); but this uses QDirStat with all its features to load it, so all views (File Type Statistics, File Size Statistics, File Age Statistics) are available, without being limited by what a spreadsheet program can do with the data.
Yes, I know, that is little consolation in your case where you have to provide the data in a format that is required by higher authority. Still, I wanted to mention it for others reading this thread.
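For such a required report format, an age histogram like the one described earlier in this thread (space used per age class) can also be derived from a cache file outside of QDirStat. A minimal sketch, assuming (type, path, size, mtime) tuples with the mtime as a Unix timestamp; the class boundaries and labels below are hypothetical and not the ones QDirStat's File Age Statistics uses.

```python
import time

# Hypothetical age classes: (upper limit in days, label). Adjust to local rules.
AGE_CLASSES = [
    (30, "up to 1 month"),
    (182, "1 to 6 months"),
    (365, "6 to 12 months"),
    (730, "1 to 2 years"),
]
OLDEST_LABEL = "more than 2 years"


def age_histogram(records, now=None):
    """Sum file sizes per age class, based on each file's mtime."""
    now = time.time() if now is None else now
    buckets = {label: 0 for _limit, label in AGE_CLASSES}
    buckets[OLDEST_LABEL] = 0
    for entry_type, _path, size, mtime in records:
        if entry_type != "F":
            continue  # directories are not counted
        age_days = (now - mtime) / 86400
        for limit, label in AGE_CLASSES:
            if age_days < limit:
                buckets[label] += size
                break
        else:  # no class matched: older than the last limit
            buckets[OLDEST_LABEL] += size
    return buckets
```

The resulting dictionary keeps the class order, so it can be written as CSV rows or fed to any charting tool.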
Dear Stefan,
I would like to know if it could be possible to add a feature to export the statistics displayed in QDirStat to CSV, JPG or PDF file formats.
I am currently working for a research institute, and we are looking for a user-friendly alternative to TreeSize on Linux. Your application is great and would satisfy all of our needs, but we also need to compile reports with statistics attached. For me it is not a big problem to use ls or du, but for my colleagues it is different, and they would certainly appreciate such a solution. Thank you in advance for your attention,
Best Regards
Alessio Paonessa