jessek / hashdeep

Windows problems with long filenames #63

Open simsong opened 11 years ago

simsong commented 11 years ago

Converted from SourceForge issue 2903171, submitted by jessekornblum

The Win32 versions of the programs have trouble with very long paths. See http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html for background. The problem appears to be the PATH_MAX constant and the getcwd call.

simsong commented 11 years ago

Submitted by jessekornblum

Apparently Win32 programs cannot use paths longer than 260 characters. This is a limitation of Microsoft Windows, not md5deep. Sorry, but I can't fix this.

simsong commented 11 years ago

Submitted by ctrodgers

Hi.

I have also noticed problems with the native Windows version of hashdeep and long filenames.

The errors I see are, for example: "C:\Users\crodgers\Desktop\Aug2010_CleanedUp\25June2010\Scopus - 20 documents that cite Correcting human heart 31P NMR spectra for partial saturation. Evidence that saturation factors for PCr ATP are homogeneous in normal and disease states_files\infobubble-: No such file or directory".

In Windows Explorer, I can see a file in that folder called infobubble-arrow.gif and have no problems accessing it.

I think that hashdeep needs to use the Unicode versions of the Windows API calls to overcome this bug.

See e.g. http://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx

"Maximum Path Length Limitation

In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\<some 256-character path string><NUL>" where "<NUL>" represents the invisible terminating null character for the current system codepage. (The characters < > are used here for visual clarity and cannot be part of a valid path string.)

Note File I/O functions in the Windows API convert "/" to "\" as part of converting the name to an NT-style name, except when using the "\\?\" prefix as detailed in the following sections.

The Windows API has many functions that also have Unicode versions to permit an extended-length path for a maximum total path length of 32,767 characters. This type of path is composed of components separated by backslashes, each up to the value returned in the lpMaximumComponentLength parameter of the GetVolumeInformation function (this value is commonly 255 characters). To specify an extended-length path, use the "\\?\" prefix. For example, "\\?\D:\<very long path>". (The characters < > are used here for visual clarity and cannot be part of a valid path string.)

Note The maximum path of 32,767 characters is approximate, because the "\\?\" prefix may be expanded to a longer string by the system at run time, and this expansion applies to the total length.

The "\\?\" prefix can also be used with paths constructed according to the universal naming convention (UNC). To specify such a path using UNC, use the "\\?\UNC\" prefix. For example, "\\?\UNC\server\share", where "server" is the name of the computer and "share" is the name of the shared folder. These prefixes are not used as part of the path itself. They indicate that the path should be passed to the system with minimal modification, which means that you cannot use forward slashes to represent path separators, or a period to represent the current directory, or double dots to represent the parent directory. Because you cannot use the "\\?\" prefix with a relative path, relative paths are always limited to a total of MAX_PATH characters.

There is no need to perform any Unicode normalization on path and file name strings for use by the Windows file I/O API functions because the file system treats path and file names as an opaque sequence of WCHARs. Any normalizations your application requires should be performed with this in mind, external of any calls to related Windows file I/O API functions.

When using an API to create a directory, the specified path cannot be so long that you cannot append an 8.3 file name (that is, the directory name cannot exceed MAX_PATH minus 12).

The shell and the file system have different requirements. It is possible to create a path with the Windows API that the shell user interface might not be able to interpret properly."
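The prefix rules quoted above can be sketched as a small string-only helper. This is a minimal illustration in Python, not hashdeep code: the name to_extended_length is hypothetical, and it relies on the standard ntpath module (which runs on any platform) to resolve "/", "." and ".." before the prefix is added, since prefixed paths are passed through with minimal modification.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"           # the literal characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # the literal characters \\?\UNC\


def to_extended_length(path: str) -> str:
    """Return an extended-length form of an absolute Windows path.

    Hypothetical helper for illustration only.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already prefixed; leave it untouched

    # Resolve "/", "." and ".." first, because the system will not do so
    # once the prefix is present.
    normalized = ntpath.normpath(path)
    drive, tail = ntpath.splitdrive(normalized)

    # The prefix cannot be combined with a relative path.
    if not drive or not tail.startswith("\\"):
        raise ValueError("not an absolute path: " + path)

    if drive.startswith("\\\\"):
        # \\server\share\dir becomes \\?\UNC\server\share\dir
        return EXTENDED_UNC_PREFIX + normalized[2:]
    return EXTENDED_PREFIX + normalized
```

For example, to_extended_length("D:/music/./a/../b.mp3") yields \\?\D:\music\b.mp3, while a relative path raises ValueError, matching the statement that relative paths stay limited to MAX_PATH.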

simsong commented 11 years ago

Submitted by jessekornblum

Thanks for the detailed information. md5deep already uses the Windows API to deal with wildcards, but not for parsing directories. I will do some experimenting and let you know how it goes. Would you be willing to test out some new versions?

simsong commented 11 years ago

Submitted by ctrodgers

I'd be happy to test things for you.

thezoggy commented 11 years ago

Just ran into this myself. The odd thing is that on Windows, hashdeep64 v4.3 generated the hashes without a problem (maybe because I used the -b flag to discard the path), and I dumped them to a file. I then used that file of hashes to verify that the files on the remote destination match. I ran the command on the same Windows box, but the actual content is stored remotely on an unRAID (Slackware) machine. It says that two of the files weren't found, but they are in the hash file and in both locations. The only thing that stands out is the length of the folder+file (264 characters long for the whole thing, or 213 for the filename alone).

doing:

D:\md5deep-4.3>hashdeep64.exe -xk .\source-c.txt -rb \\tower\Music\Music\C\

results in:

\\tower\Music\Music\C\Charles Wright and the Watts\Charles_Wright_and_the_Watts_103rd_Street_Rhythm_Band_-_Express_Yourself-CDM-2005-LGU\03_charles_wright_and_the_watts_103rd_street_rhythm_band_-_express_yourself_(philip_steirs_everybody_on_the_phloor_mix)-lgu.mp3: No such file or directory
\\tower\Music\Music\C\Charles Wright and the Watts\Charles_Wright_and_the_Watts_103rd_Street_Rhythm_Band_-_Express_Yourself-CDM-2005-LGU\04_charles_wright_and_the_watts_103rd_street_rhythm_band_-_express_yourself_(supreme_beings_of_leisure_do_it_right_mix)-lgu.mp3: No such file or directory

from source-c.txt :

9917850,161bdb67c07b1eedb3eb5b76c636cf3f,4093281c3b5c8ae27d5e3ff692738c8a02073c2a6558930db4eb2a89f4d9d6f7,03_charles_wright_and_the_watts_103rd_street_rhythm_band_-_express_yourself_(philip_steirs_everybody_on_the_phloor_mix)-lgu.mp3

validated that the file does exist..

/mnt/disk1/Music/Music/C/Charles Wright and the Watts/Charles_Wright_and_the_Watts_103rd_Street_Rhythm_Band_-_Express_Yourself-CDM-2005-LGU# ls -alh | grep 03_
-rw-rw-rw- 1 unraid users 9.5M Sep 24  2005 03_charles_wright_and_the_watts_103rd_street_rhythm_band_-_express_yourself_(philip_steirs_everybody_on_the_phloor_mix)-lgu.mp3

simsong commented 11 years ago

I'm confused. Can you explain this a bit more?

(Emailed in reply to thezoggy's comment of Apr 8, 2013, 9:30 PM.)

thezoggy commented 11 years ago

Mapped \\tower\Music\Music\ to a network drive Z: and re-ran the test.

D:\md5deep-4.3>hashdeep64.exe -xk .\source-c.txt -rb Z:\C\

This time everything passed, so it's definitely the total length and not just the filename. So I guess even though I'm on Win7 and using hashdeep64, I'm still limited to 260 characters for the path+filename in the comparison.
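The arithmetic behind that result can be checked with a short sketch. It is written in Python for illustration only; fits_in_max_path is a hypothetical helper, the 264-character total is taken at face value from the report above, and len() counts code points rather than the UTF-16 units Windows uses (the two agree for the ASCII paths in this thread).

```python
MAX_PATH = 260  # Win32 limit, which counts the terminating NUL


def fits_in_max_path(path: str) -> bool:
    """True if the path plus its terminating NUL fits in MAX_PATH characters."""
    return len(path) + 1 <= MAX_PATH


unc_root = "\\\\tower\\Music\\Music\\"  # \\tower\Music\Music\
mapped_root = "Z:\\"                     # Z:\

# Characters saved by replacing the UNC root with the mapped drive letter.
saved = len(unc_root) - len(mapped_root)

reported_total = 264  # full UNC path length reported in the thread
print(saved, reported_total - saved)
```

Swapping the 20-character UNC root for the 3-character drive root saves 17 characters, which brings the reported 264-character path down to 247, under the limit. That is consistent with the mapped-drive run passing.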

jessek commented 11 years ago

Hi Simson,

The best explanation is at http://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx under "Maximum Path Length Limitation".

Long story short: To fix this we would need to re-write the Win32 directory traversal code to use the Unicode versions of functions.

Jesse Kornblum jessekornblum@gmail.com

simsong commented 11 years ago

I have a rewritten version of the win32 traversal code that uses the Unicode versions of the functions. It's in C++. Would you like a copy?

(Emailed in reply to Jesse Kornblum's comment of Apr 8, 2013, 11:34 PM.)

jessek commented 11 years ago

Yes, please.

(Emailed in reply to Simson L. Garfinkel's comment of Tue, Apr 9, 2013, 4:22 AM.)

simsong commented 11 years ago

Here's what I use for bulk_extractor. I don't know how it compares with what you have. Actually, it may not be using the Unicode versions. What are those APIs?

(Emailed in reply to Jesse Kornblum's comment of Apr 9, 2013, 9:21 AM.)

govin commented 11 years ago

Can you give me a copy of the win32 traversal code patch as well? or upload it somewhere?

jessek commented 11 years ago

Although not directly applicable to the Hashdeep code base, I have modified the ssdeep Win32 directory traversal code to use long paths. It's still in beta mode, but has potential. The short story is that it will probably be easier to make a separate code path for Windows directory traversal. See the file dig.cpp in ssdeep 2.10 beta1 for details, http://ssdeep.sf.net/.
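As a rough picture of what a separate Windows code path for traversal might look like, here is a minimal sketch. It is Python, not the ssdeep or hashdeep code, and walk_files is a hypothetical name. The assumption is that the extended-length prefix is applied once to the absolute starting directory on Windows, so that every child path built from it inherits the prefix; on other platforms the path is used unchanged.

```python
import os


def walk_files(root):
    """Yield regular files under root, using an extended-length start path on Windows.

    Hypothetical sketch; error handling is omitted.
    """
    start = os.path.abspath(root)
    if os.name == "nt" and not start.startswith("\\\\?\\"):
        if start.startswith("\\\\"):
            start = "\\\\?\\UNC\\" + start[2:]  # UNC share
        else:
            start = "\\\\?\\" + start           # drive-letter path

    stack = [start]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)  # child keeps the parent's prefix
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path
```

Paths yielded on Windows still carry the prefix, so a real tool would have to decide whether to strip it before printing.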

richardkmichael commented 10 years ago

I also have [what is likely] a long filename problem on Win7 with hashdeep 4.3. During hash computation, hashdeep reports "No such file or directory", however the files do exist in the indicated locations.

The full paths are a long series of directories, followed by a "9.4" filename (that is, the final filename is themedata.thmx = "9.4").

The total full path name, including directories but not including the drive letter is 238 characters, e.g. "Dir1\Dir2\Dir3...\themedata.thmx" = 238.

The directory names contain spaces, parentheses ("(" and ")") and apostrophes ("'"), e.g. A directory\Foo (bar)\Mary's Files. I'm unsure if escaping these characters is required in Windows; and, if required, whether the escaping will count as two characters, e.g. A\ directory\Foo \(bar\)Mary\'s Files in the Windows API.

Finally, the drive letter with its syntax (E:\) adds another three characters.

So, I'm assuming either all that is adding up to more than 255, 260 or whatever the limit is; or the parentheses or apostrophes are confusing hashdeep.

Has any work been done toward this in a forthcoming version of hashdeep? I am happy to test.

Aside, the FastCopy utility has handled the long filenames correctly. Perhaps it's worth a look at laurent22/fastcopy to see what it does? (Note, the GitHub project is FastCopy v2.08, it's at v2.11 now, but there are no fixes since v2.08 related to final handling. The full v2.11 source is available on the FastCopy website also linked.)

Thanks!

simsong commented 10 years ago

There are no hard-coded limits at this point to the best of my knowledge. Can you provide a very simple example which causes the problem? Thanks.

(Emailed in reply to Richard Michael's comment of Nov 27, 2013, 6:55 PM.)

richardkmichael commented 10 years ago

Sure. Running hashdeep (in Win7/64bit/NTFS) on this long filename results in No such file or directory:

E:\Data - Common Shares> c:\users\user\Downloads\md5deep-4.3\md5deep-4.3\hashdeep64.exe -v -l "c:\Data - Common Shares\General\Business Accounting - AA, BB and CC Tracking\Business Accounting - AA, BB and CC Tracking - OLD back up do not use\US Remittance & Vendor Relations\ABCDE US\ABCDE US Fiscal 2007\Vendor Relations 2007\Jan '70\Proof Of Shipment 01-01-70 Onwards AAA Sent 1-1-70.txt"

c:\Data - Common Shares\General\Business Accounting - AA, BB and CC Tracking\Business Accounting - AA, BB and CC Tracking - OLD back up do not use\US Remittance & Vendor Relations\ABCDE US\ABCDE US Fiscal 2007\Vendor Relations 2007\Jan '70\Proof Of Shipment 01-01-70 Onwards AAA Sent 1-1-70.txt: No such file or directory

Works with this slightly shorter name:

E:\Data - Common Shares>c:\users\user\Downloads\md5deep-4.3\md5deep-4.3\hashdeep64.exe -v -l "c:\Data - Common Shares\General\Business Accounting - AA, BB and CC Tracking\Business Accounting - AA, BB and CC Tracking - OLD back up do not use\US Remittance & Vendor Relations\ABCDE US\ABCDE US Fiscal 2007\Vendor Relations 2007\Jan '70\Proof Of Shipme.txt"

%%%% HASHDEEP-1.0
%%%% size,md5,sha256,filename
## Invoked from: E:\Data - Common Shares
## E:\Data - Common Shares> c:\users\user\Downloads\md5deep-4.3\md5deep-4.3\hashdeep64.exe -v -l c:\Data - Common Shares\General\Business Accounting - AA, BB and CC Tracking\Business Accounting - AA, BB and CC Tracking - OLD back up do not use\US Remittance & Vendor Relations\ABCDE US\ABCDE US Fiscal 2007\Vendor Relations 2007\Jan '70\Proof Of Shipme.txt
##
0,d41d8cd98f00b204e9800998ecf8427e,e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855,c:\Data - Common Shares\General\Business Accounting - AA, BB and CC Tracking\Business Accounting - AA, BB and CC Tracking - OLD back up do not use\US Remittance & Vendor Relations\ABCDE US\ABCDE US Fiscal 2007\Vendor Relations 2007\Jan '70\Proof Of Shipme.txt

Note, Windows Explorer cannot create the long filename - I was able to create all the parent directories, but the longest named file (leaf) I could create (right-click, "New", "Text File") in the final directory was "Proof Of Shipme.txt". This corresponds to the longest name hashdeep itself can handle. If I create a file in NotePad, I can save it with the longer name. This is a known limitation of Windows Explorer's MAX_PATH.

Could clean_win32_name add the \\?\ prefix to allow 32K filenames? Using \\?\ might make Unicode handling easier. (I have another Unicode-related bug to file -- when printing escaped filenames to the console.)

This would probably impact the -l (relative filenames) option, because hashdeep would need to make the filename relative again before writing it to the output. It might affect other areas as well, because \\?\ prohibits "." and ".." in the path.

Also, I'm not sure how this would interact with the DASD done by is_win32_device_file in dig_win32. (Why is direct device access necessary?)
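The output concern raised above (removing the prefix again, and restoring relative names for -l) can be sketched as follows. This is a Python illustration under stated assumptions, not hashdeep's clean_win32_name: both function names are hypothetical, and the case-insensitive comparison is a simplification of how Windows compares names.

```python
EXTENDED_PREFIX = "\\\\?\\"           # the literal characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # the literal characters \\?\UNC\


def strip_extended_prefix(path: str) -> str:
    """Undo the extended-length prefix so the path can be shown to the user."""
    if path.startswith(EXTENDED_UNC_PREFIX):
        # \\?\UNC\server\share becomes \\server\share
        return "\\\\" + path[len(EXTENDED_UNC_PREFIX):]
    if path.startswith(EXTENDED_PREFIX):
        return path[len(EXTENDED_PREFIX):]
    return path


def name_for_output(internal_path: str, start_dir: str) -> str:
    """Name to print: relative to start_dir when underneath it, else absolute."""
    plain = strip_extended_prefix(internal_path)
    base = strip_extended_prefix(start_dir).rstrip("\\") + "\\"
    # Simplified case-insensitive match on the leading directory.
    if plain[:len(base)].lower() == base.lower():
        return plain[len(base):]
    return plain
```

With this split, the prefixed form would be used only for file system calls, and everything written to the output would go through name_for_output, so hash files would look the same as they do today.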

IanWorthington commented 10 years ago

Was a test version of this ever made available?

richardkmichael commented 10 years ago

@IanWorthington Test version of what, a fix in the hashdeep code or a repro of my description? If you create the paths (dirs/filenames) I've described, I assume it's replicable behaviour. Perhaps I can create a test repo of files, if that's what you're after?

IronHand28 commented 8 years ago

This issue is ingrained in the OS of the system we are using. It is designed like that but recently Microsoft removed the cap limit of characters. However, if you still are facing this trouble, I highly recommend this tool that I and my team are using - GS RichCopy 360. It is capable of overcoming this restriction and is designed to transfer HUGE amount of files. Read more here - http://www.gurusquad.com/blog/

rbeede commented 7 years ago

I've tested with 64-bit builds with the same result of any path > 260 not working.

Version 4.4