windirstat / windirstat

WinDirStat is a disk usage statistics viewer and cleanup tool for Microsoft Windows
https://windirstat.net
GNU General Public License v2.0

Feature Request: Indicate "cloud only" files in OneDrive folders as zero disk space #50

Open GuyLarri opened 7 months ago

GuyLarri commented 7 months ago

The Microsoft OneDrive and Dropbox apps allow a folder tree to be synced between the local hard drive (or SSD) and the cloud. It is possible to set files or even large subfolders as "cloud only" to free up disk space. The apps will auto-download a file if a program attempts to access it. On my computer I have many large files marked as "cloud only" to free up disk space.

As far as I can tell, currently, WinDirStat reports all files as taking up disk space even if they are "cloud only". It would be great if "cloud only" files were marked in some way on the visual display, for example they could be given a bounding box drawn in a different color.

It would also be great to be able to click a button and have the disk space image redraw with the "cloud only" files omitted, since they are not using any real disk space.

I asked Microsoft Copilot how to detect "cloud only files" and it answered:

Detecting Online-Only Files: To determine whether a file is “online only” (i.e., stored in the cloud but not taking up local disk space), you can use the Win32 API. Specifically, call GetFileAttributes() and check if the FILE_ATTRIBUTE_OFFLINE attribute is set. This attribute has existed for a long time and applies to OneDrive files as well.
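The attribute check described in that answer could be sketched as follows. This is a minimal illustration, not WinDirStat code: `looks_cloud_only` and `CLOUD_ONLY_MASK` are hypothetical names, and treating the two `RECALL_*` bits the same way as `FILE_ATTRIBUTE_OFFLINE` is an assumption that goes beyond the quoted answer. The numeric values are the ones defined in the Windows SDK header winnt.h.

```python
# Attribute bit values as defined in the Windows SDK (winnt.h).
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

# Value GetFileAttributes() returns on failure.
INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF

# Assumption: any of these bits means the content is not (fully) local.
CLOUD_ONLY_MASK = (
    FILE_ATTRIBUTE_OFFLINE
    | FILE_ATTRIBUTE_RECALL_ON_OPEN
    | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
)


def looks_cloud_only(attributes: int) -> bool:
    """Return True if the attribute bits suggest the content is not local."""
    if attributes == INVALID_FILE_ATTRIBUTES:
        # The attribute query failed, so nothing can be concluded.
        return False
    return bool(attributes & CLOUD_ONLY_MASK)
```

On Windows, Python exposes the same bits as `os.stat(path).st_file_attributes`, so the helper can be tried there without calling the Win32 API directly.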

Thank you for maintaining such a wonderfully helpful tool. Best regards,

Guy

NoMoreFood commented 7 months ago

So I'm torn on this one. I'm supportive of making the change, but would we want a customizable color for offline files or an option to completely omit them (or maybe omit any file below a certain size)? I don't know if I want to do both.

assarbad commented 7 months ago

There are various levels of hydration (Microsoft's technical term) for cloud files, and a whole API around them.

There is RtlIsCloudFilesPlaceholder and similar functions (incl. RtlIsPartialPlaceholder). These are even trivial to reimplement, if you wanted to. Either way, these would go nicely together with the current implementation you made for file/directory enumeration, @NoMoreFood. Those functions, to the best of my knowledge, were introduced in Windows 8 or 10. They're also meant to cater to third-party vendors (think Dropbox).

IIRC the value of ReparseTag is synthesized by the file system driver, so it would not even be possible to read that straight from the MFT (although there is probably some way). But it made sense for MS to reuse the reparse mechanism which exists since Windows 2000 and is generally pretty extensible.
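As a rough illustration of why such a check is easy to reimplement, the sketch below tests the two inputs those functions take, the file attributes and the reparse tag. It is not the ntdll implementation and may differ from it in edge cases; `is_cloud_reparse_tag` and `is_cloud_files_placeholder` are hypothetical names. The constants are the values from the Windows SDK headers, where the cloud tag family `IO_REPARSE_TAG_CLOUD` through `IO_REPARSE_TAG_CLOUD_F` differs only in the bits covered by `IO_REPARSE_TAG_CLOUD_MASK`.

```python
# Values from the Windows SDK headers (winnt.h).
FILE_ATTRIBUTE_REPARSE_POINT = 0x00000400
IO_REPARSE_TAG_CLOUD = 0x9000001A
IO_REPARSE_TAG_CLOUD_MASK = 0x0000F000


def is_cloud_reparse_tag(reparse_tag: int) -> bool:
    """True for IO_REPARSE_TAG_CLOUD and its _1 .. _F variants."""
    # Clear the variant bits, then compare against the base tag.
    base = reparse_tag & ~IO_REPARSE_TAG_CLOUD_MASK & 0xFFFFFFFF
    return base == IO_REPARSE_TAG_CLOUD


def is_cloud_files_placeholder(attributes: int, reparse_tag: int) -> bool:
    """Sketch: a reparse point whose tag belongs to the cloud family."""
    if not attributes & FILE_ATTRIBUTE_REPARSE_POINT:
        return False
    return is_cloud_reparse_tag(reparse_tag)
```

Both inputs are available from directory enumeration without opening the file, which is why a check like this would fit next to the existing enumeration code.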

One gotcha I remember from working in the AV industry was that you wanted to handle fully dehydrated or partially hydrated files properly to avoid hydrating them unnecessarily. IIRC Microsoft also documented some APIs inside its MVI program, but I don't recall the details. Chances are the above are/were also part of it. In the WDK you can find several functions and structs tagged as "MVI" ... going through that list should provide a clue.

PS: official API is this one

GuyLarri commented 7 months ago

As a WinDirStat user, the reason for firing up WinDirStat, in my personal experience, is always the same: I am running low on disk space and I want to know what I can delete, archive, or mark as "cloud only" ("free up space" in the file/folder dropdown menu in Windows Explorer). [I also use fdupes under cygwin to scan for duplicate files, which would be another great WinDirStat future feature, as it aligns with what I understand to be WinDirStat's main use case - "How can I get my disk space back?"].

If I am looking in my OneDrive or Dropbox folder, my question is: What here isn't already "cloud only"? So I am interested in the UI making it easy for me to find the files that are taking up real disk space, vs the files that are not taking up real disk space. One solution would be to mark the rectangles of the "cloud only" files with a black border, so the center color still tells me the file type, or make the rectangle completely black. That way the files that are taking up disk space would jump out visually amongst the set of all files in my OneDrive or Dropbox folder.

However, if I want to visually judge how much of my disk drive is taken up by those files that are not "cloud only", including files outside of OneDrive and Dropbox sync trees, I would like to see a visual view of the entire disk drive with the "cloud only" rectangles completely omitted, because they are distorting my understanding of where disk space is actually being used. So I'd like to be able to click on an "omit online only files" button or check-box.

I think both options have their own value, but if I could only choose one, I would choose the second one (i.e. I would choose the "omit online only files" button/checkbox), because the fundamental idea of the treemap visual is that I can quickly see where disk space is being used, with each rectangle proportional to file size, and the "omit online files" option restores the treemap visual to its original purpose of showing me where disk space is being used.
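The effect of such an "omit online only files" toggle on the treemap can be sketched with a toy tree. This is only an illustration under assumed names (`Node`, `displayed_size`, and the `cloud_only` flag are hypothetical, not WinDirStat's data model): with the toggle on, cloud-only leaves contribute zero bytes, so every folder rectangle shrinks to the space that is really used locally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A file (no children) or a folder (with children)."""
    name: str
    size: int = 0                 # size in bytes, meaningful for files only
    cloud_only: bool = False      # hypothetical flag set during the scan
    children: List["Node"] = field(default_factory=list)


def displayed_size(node: Node, omit_cloud_only: bool) -> int:
    """Size used to draw the node's rectangle in the treemap."""
    if node.children:
        # A folder is as large as the sum of what it contains.
        return sum(displayed_size(c, omit_cloud_only) for c in node.children)
    if omit_cloud_only and node.cloud_only:
        return 0
    return node.size


onedrive = Node("OneDrive", children=[
    Node("notes.txt", size=100),
    Node("video.mp4", size=900, cloud_only=True),
])
print(displayed_size(onedrive, omit_cloud_only=False))  # all files counted
print(displayed_size(onedrive, omit_cloud_only=True))   # local files only
```

Because the flag is kept on each node, switching the toggle only requires recomputing sizes and redrawing, not rescanning the disk.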

Thank you for reading, and thank you for maintaining an amazing and very very useful tool.

assarbad commented 6 months ago

> As a WinDirStat user, the reason for firing up WinDirStat, in my personal experience, is always the same: I am running low on disk space and I want to know what I can delete, archive, or mark as "cloud only" ("free up space" in the file/folder dropdown menu in Windows Explorer). [I also use fdupes under cygwin to scan for duplicate files, which would be another great WinDirStat future feature, as it aligns with what I understand to be WinDirStat's main use case - "How can I get my disk space back?"].

@GuyLarri Have you seen Czkawka? fclones is another I have used and value. IIRC both can be installed via winget. There are many other tools out there, but the best bet is probably using ReFS (unfortunately Microsoft makes it unduly hard to create and maintain those volumes on consumer editions of Windows) for something like deduplication. Hardlinking files is a nice way to get back some space as well, but it comes with its own pitfalls (can destroy metadata of the clones that are being hardlinked "away"); for example never use it inside VCS working copies.

> If I am looking in my OneDrive or Dropbox folder, my question is: What here isn't already "cloud only"? So I am interested in the UI making it easy for me to find the files that are taking up real disk space, vs the files that are not taking up real disk space.

Generally makes sense. They take up some space, but usually just for metadata.

> One solution would be to mark the rectangles of the "cloud only" files with a black border, so the center color still tells me the file type, or make the rectangle completely black. That way the files that are taking up disk space would jump out visually amongst the set of all files in my OneDrive or Dropbox folder.
>
> However, if I want to visually judge how much of my disk drive is taken up by those files that are not "cloud only", including files outside of OneDrive and Dropbox sync trees, I would like to see a visual view of the entire disk drive with the "cloud only" rectangles completely omitted, because they are distorting my understanding of where disk space is actually being used. So I'd like to be able to click on an "omit online only files" button or check-box.

Yep, but that would be a similar topic to compressed or sparse files or link counts higher than 1, essentially. How do you visualize these? Compressed is okayish. Sparse depends already and link counts higher than 1 caused wrinkles in my brains.

Such an option shouldn't be hidden somewhere deep in the settings, however, but placed near the treemap. There is no single way of displaying it that will cater to all needs. So quickly toggling these makes sense.
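For the link-count case mentioned above, one possible accounting policy is to count each physical file once, no matter how many directory entries point at it. The sketch below shows that single policy only and is not how WinDirStat handles hard links; `total_counting_links_once` and the `(volume_id, file_id, allocated_bytes)` record layout are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (volume_id, file_id, allocated_bytes); hard links to the same file share
# the first two fields.
FileRecord = Tuple[int, int, int]


def total_counting_links_once(records: Iterable[FileRecord]) -> int:
    """Sum allocated bytes, counting each (volume, file id) pair once."""
    seen: Set[Tuple[int, int]] = set()
    total = 0
    for volume_id, file_id, allocated in records:
        key = (volume_id, file_id)
        if key in seen:
            continue  # another name for a file that was already counted
        seen.add(key)
        total += allocated
    return total
```

The total is well defined under this policy, but the treemap question remains open: the bytes still have to be attributed to one of the linking directories, or split between them.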

DaveG-SDX commented 2 weeks ago

I visited this forum seeking a solution to a similar issue. I primarily use WinDirStat to assess my actual disk usage so that I can free up some space. Generally, I am interested in the current space utilization at the time of launching the program. It would be beneficial to have a feature that filters out OneDrive cloud files to reflect actual disk usage. However, if I also want to consider the potential disk usage including my OneDrive files, toggling this filter back and forth would aid in effective planning.

When right-clicking any file within a OneDrive folder and selecting Properties, there are entries for "Size" and "Size on disk"; the latter indicates the actual space occupied on the disk. Microsoft describes various states of file hydration, but only two are pertinent to the question at hand: how much disk space is currently consumed on my local disk? Files represented by the cloud icon are not downloaded locally and thus should not show as occupying disk space during WinDirStat scans. Conversely, files with a solid green icon or a green circle with a check mark are downloaded and consume local disk space. The latter state indicates files that will be automatically removed locally and retained in the cloud after 30 days of inactivity. Regardless, it is essential to know these files' current local disk space consumption when running WinDirStat.

Including these icons in WinDirStat would be advantageous, offering users visual cues to distinguish between OneDrive files and those stored locally.
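The states described above could be approximated from file attribute bits, as in the sketch below. The mapping is an assumption for illustration, not documented OneDrive behavior: `sync_state` and the state labels are hypothetical, and only the two constant values (from the Windows SDK header winnt.h) are taken as given.

```python
# Attribute bit values as defined in the Windows SDK (winnt.h).
FILE_ATTRIBUTE_PINNED = 0x00080000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000


def sync_state(attributes: int) -> str:
    """Map attribute bits to a coarse state label (assumed mapping)."""
    if attributes & FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS:
        # Content must be fetched before it can be read: cloud icon.
        return "online-only"
    if attributes & FILE_ATTRIBUTE_PINNED:
        # User asked to always keep the file on this device.
        return "always-keep"
    # Content is present locally (also true for ordinary, unsynced files).
    return "locally-available"
```

Only the first state would be a candidate for the filter requested in this thread; the other two consume local disk space at the time of the scan.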