scullionw / dirstat-rs

(fastest?) disk usage cli, similar to windirstat.
MIT License

Deal with max path size on Windows #5

Open Mart-Bogdan opened 2 years ago

Mart-Bogdan commented 2 years ago

On Windows, the maximum length for a path is MAX_PATH, which is defined as 260 characters.

There are ways to specify a path in an extended-length form, which allows approximately 32,767 characters.

Long paths could also work with our API calls on Windows 10, version 1607 and later, if a registry key is configured or an app manifest is provided.

I assume it's better to stick with the "\\?\" prefix and support a wider range of Windows versions.
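To illustrate the prefix approach, here is a minimal sketch of turning an absolute path into its extended-length form. The helper name `to_extended_length` is hypothetical and not taken from the dirstat-rs code. It assumes the input is already an absolute path without `.` or `..` components, since Windows does not normalize paths that carry the `\\?\` prefix.

```rust
/// Hypothetical helper: convert an absolute Windows path into its
/// extended-length form so it is not limited by MAX_PATH.
/// Assumes `path` is absolute and contains no `.` or `..` components.
fn to_extended_length(path: &str) -> String {
    // Prefixed paths are not normalized by Windows, so convert
    // forward slashes to backslashes ourselves.
    let path = path.replace('/', "\\");
    if path.starts_with(r"\\?\") || path.starts_with(r"\\.\") {
        // Already an extended-length or device path: leave it alone.
        path
    } else if let Some(rest) = path.strip_prefix(r"\\") {
        // UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        format!(r"\\?\UNC\{}", rest)
    } else {
        // Drive path: C:\... becomes \\?\C:\...
        format!(r"\\?\{}", path)
    }
}

fn main() {
    println!("{}", to_extended_length(r"C:\data\file.txt"));
    println!("{}", to_extended_length(r"\\server\share\file.txt"));
}
```

Relative paths would need to be made absolute first, because the prefix cannot be combined with them.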

links:

I'm creating this issue because I found a TODO in the code.

Mart-Bogdan commented 2 years ago

I think the hard part would be to test and check all corner cases.

Perhaps I'll be able to take this issue, but I can't guarantee it.

Mart-Bogdan commented 1 year ago

I'm going to finish the PR soon. I've found time to dig into Rust.

My initial take was to fix filename generation, and I've done that. But now I see that it can also be achieved using GetFileInformationByHandleEx, as it accepts a HANDLE and returns various information about the file (we are already using a similar function to get the regular size). Using handles should be marginally faster.

I've also found other issues with apparent size on the Windows platform: it shows sizes inconsistent with Windows Explorer for small files. Windows Explorer shows 0, which means the file is stored inside the parent directory, but the current solution shows a size. We also don't calculate the size of alternate data streams, but that calculation can be costly for performance.

I think these inconsistencies can be addressed later.

P.S. source material: https://devblogs.microsoft.com/oldnewthing/20160427-00/?p=93365

Mart-Bogdan commented 1 year ago

Fun fact: the compressed size is reported incorrectly:

File: test-data\b4000_rand_c
 FILE_STANDARD_INFO { AllocationSize: 8192, EndOfFile: 4000, NumberOfLinks: 1, DeletePending: 0, Directory: 0 }
 FILE_COMPRESSION_INFO { CompressedFileSize: 4000, CompressionFormat: 2, CompressionUnitShift: 16, ChunkShift: 12, ClusterShift: 12, Reserved: [0, 0, 0] }
 GetCompressedFileSizeW: 4000

The AllocationSize field is the most reliable (when calling GetFileInformationByHandleEx),

but it gives a minuscule discrepancy on small files:

 File: test-data\b23_rand_c
 FILE_STANDARD_INFO { AllocationSize: 24, EndOfFile: 23, NumberOfLinks: 1, DeletePending: 0, Directory: 0 }
 FILE_COMPRESSION_INFO { CompressedFileSize: 23, CompressionFormat: 2, CompressionUnitShift: 16, ChunkShift: 12, ClusterShift: 12, Reserved: [0, 0, 0] }
 comp_size: 23

24 bytes instead of 23. But I think it's not a big deal.
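For reading the dumps above, it may help to decode the shift fields. This is a small sketch that assumes, as the names suggest, that ClusterShift, ChunkShift and CompressionUnitShift are base-2 logarithms of sizes in bytes. The helper `shift_to_bytes` is hypothetical, and the numbers are copied from the test-data\b4000_rand_c output.

```rust
/// Hypothetical helper: convert a log2 "shift" field into a size in bytes.
/// Assumes the shift fields are base-2 logarithms of byte sizes.
fn shift_to_bytes(shift: u8) -> u64 {
    1u64 << shift
}

fn main() {
    // Values copied from the dump for test-data\b4000_rand_c.
    let cluster = shift_to_bytes(12); // ClusterShift
    let compression_unit = shift_to_bytes(16); // CompressionUnitShift
    let allocation_size: u64 = 8192; // FILE_STANDARD_INFO.AllocationSize

    println!("cluster size: {} bytes", cluster);
    println!("compression unit: {} bytes", compression_unit);
    println!("allocated clusters: {}", allocation_size / cluster);
}
```

Under that assumption the cluster size is 4096 bytes, so the AllocationSize of 8192 corresponds to two clusters, while CompressedFileSize and GetCompressedFileSizeW both report the 4000-byte logical length.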