sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.
Other
968 stars 219 forks source link

Aborting InvalidPathException when processing mounted folders with invalid file names #1861

Closed lfcnassif closed 1 year ago

lfcnassif commented 1 year ago

Reported by an user: Screenshot_20230901-195840_WhatsApp

We should log and skip the file at least, if not possible to access its contents using an alternative way.

patrickdalla commented 1 year ago

It seems windows does not allow spaces at the end of folder names, but somehow some low level modifications are done by some apps. Windows Explorer and cmd trims any end spaces. And java implementation checks and throws an exceptions if such trailing space is found in the name of a folder.

patrickdalla commented 1 year ago

In this thread the user says he was able to create such a folder using PHP: https://superuser.com/questions/565334/rename-delete-windows-x64-folder-with-leading-and-trailing-space

lfcnassif commented 1 year ago

Correct! Thank you @patrickdalla for pointing a procedure to reproduce!

patrickdalla commented 1 year ago

I tried to find a way to extend NIO Windows FileSystem classes to skip this enforcement, but its internal implementation aren't visible. We should make complete new NIO FileSystem implementation, maybe simply copying and pasting the openjdk (if license permits), if we must keep this NIO conformance. Otherwise, we should review and change all code that depends on the resolve method of the Path implementation WindowsPath.

As IPED is a forensic tool, I think one of these implementations necessary.

lfcnassif commented 1 year ago

I tried to find a way to extend NIO Windows FileSystem classes to skip this enforcement, but its internal implementation aren't visible. We should make complete new NIO FileSystem implementation, maybe simply copying and pasting the openjdk (if license permits), if we must keep this NIO conformance. Otherwise, we should review and change all code that depends on the resolve method of the Path implementation WindowsPath.

I think this would be an overkill... My idea was simply to try something like the approach used in #1170. If it doesn't work, print an explicit error in the Console about the inaccessible file and skip its contents.

patrickdalla commented 1 year ago

This seems to work only if the NIO interface is not used. Once needed to convert to a Path, the inconsistent char will be checked.

patrickdalla commented 1 year ago

All the usages of FileInputStreamFactory getPath immediately converts the Path object to a File object. So, it seems no necessary to use this NIO interface.

lfcnassif commented 1 year ago

All the usages of FileInputStreamFactory getPath immediately converts the Path object to a File object. So, it seems not necessary to use this NIO interface.

If it works around the issue, I agree to change the API. @patrickdalla are you going to implement a fix or should I take a look at this issue?

patrickdalla commented 1 year ago

It worked changing removing Path dependency in FileInputStreamFactory. @lfcnassif , I Pushed the commit. Review if it affects other places.

patrickdalla commented 1 year ago

I've created a test case with a folder with a trailing space and some files inside. Without the commit applied, I could repeat the error. With the commit, the workaround works.

patrickdalla commented 1 year ago

@lfcnassif you could continue from now on? I've made some tests on Linux too, to make sure it wouldn't introduce any regression, and it passed. But maybe you have a greater test base to validate, and if some error is found, you can correct. As you have chosen to use the Path NIO API, maybe there is some case that it should be used.

patrickdalla commented 1 year ago

The branch is "OvercomPathWithTrailingSpaces".

patrickdalla commented 1 year ago

The tests I've made so far, on linux and windows, passed.

lfcnassif commented 1 year ago

@lfcnassif you could continue from now on?

Sure! Could you zip your test files and send to me? Maybe the trailing spaces will be kept into the zip and after I unzip them.

patrickdalla commented 1 year ago

I sent you throught teams. I couldn't find a way to share here.

lfcnassif commented 1 year ago

I sent you throught teams. I couldn't find a way to share here.

Thank you! Although unzipping it didn't work, file names were fixed... But I managed to create a folder and a file ending with space into NTFS from cmd line (using the \\?\ syntax). Unfortunately I didn't manage to reproduce the error, files are skipped in FolderTreeReader:

Tue Sep 19 23:40:06 BRT 2023    [WARN]  File ignored: f:\teste-files\spaces\dir\b.txt  java.nio.file.NoSuchFileException: f:\teste-files\spaces\dir\b.txt 
Tue Sep 19 23:40:06 BRT 2023    [WARN]  File ignored: f:\teste-files\spaces\dir\b.txt  java.nio.file.NoSuchFileException: f:\teste-files\spaces\dir\b.txt 
Tue Sep 19 23:40:06 BRT 2023    [WARN]  File ignored: f:\teste-files\spaces\mydirectory  java.nio.file.NoSuchFileException: f:\teste-files\spaces\mydirectory 
Tue Sep 19 23:40:06 BRT 2023    [WARN]  File ignored: f:\teste-files\spaces\mydirectory  java.nio.file.NoSuchFileException: f:\teste-files\spaces\mydirectory 

What steps have you done to reproduce?

patrickdalla commented 1 year ago

I have entered into Windows PowerShell to create the files with the "\?\" start syntax. And later I have run IPED snapshot with the following command:

image

patrickdalla commented 1 year ago

I had the same result now. I thought I saw it on IpedSearch app, but I think I was wrong, sorry. Anyway, it is already the secondary objective, skip the file :-). I saw that FileTreeReader uses the NIO interface. I will have to review it too.

patrickdalla commented 1 year ago

The FileTreeReader throws error when reading file attributes, that uses NIO Path object, like creation and last modified dates. I will search if there is an alternate way without NIO.

lfcnassif commented 1 year ago

A lonnng time ago we used the old File.listFiles() to walk the file tree. We changed to the new Files.walkFileTree() because it seems to give better error control, access to more file attributes and could be faster (although I don't remember if I compared the performance...).

patrickdalla commented 1 year ago

Unfortunately I could not find a way to read the files attributes without the NIO Path interface to the file, and the internal (unaccessible) implementation of it checks the existence of trailing spaces.

patrickdalla commented 1 year ago

We have the option to skip only these attributes (an test if Path NIO isn't needed in other IPED class). Or make the complete Windows File System NIO implementation Fork.

lfcnassif commented 1 year ago

I'm still not able to reproduce this... Have you changed default IPED JRE 11 bundled version?

patrickdalla commented 1 year ago

I'm realy not using the bundled jre. I will check exactly which version i am using tomorrow

Em qua., 20 de set. de 2023 16:27, Luis Filipe Nassif < @.***> escreveu:

I'm still not able to reproduce this... Have you changed default IPED JRE 11 bundled version?

— Reply to this email directly, view it on GitHub https://github.com/sepinf-inc/IPED/issues/1861#issuecomment-1728380461, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG247S26Q2T3PWRJYPI77Z3X3NGUTANCNFSM6AAAAAA4IEWJHE . You are receiving this because you were assigned.Message ID: @.***>

patrickdalla commented 1 year ago

Correcting, as you can see in screenshot before, I used the iped.exe command, so it uses the bundled JVM.

patrickdalla commented 1 year ago

I could repeat the apparent successful execution that made me think the folders with trailing name were correctly processed at first tests. What happened was that I had 2 subfolder in the main case folder, one with the name "teste" an the other "teste ", the last with trailing space in name. So, when the last folder it is passed to FileVisitor it is converted to a Path nio object. In file visitor, the no file name check is done, but the name is passed to some low level code that reads the attributes of the file (maybe windows api). This low level code check only if the file exists, but it seems that it removes trailing spaces. So, as a folder with same name but without the trailing space exists, the check pass, and it does not gives any exception. The final result is that the content of the folder "teste ", with trailing space name, was processed, although its specific metadata attributes were wrongly filled with the attributes of the folder "teste" without the trailing space. image

patrickdalla commented 1 year ago

So, the folder existence check passed, as there is another folder with same name without trailing spaces, but the attributes of the wrong folder is read. And if it not the case, the folder existence check does not pass when retrieving file attributes, throwing an exception and skipping the file.

lfcnassif commented 1 year ago

Hi @patrickdalla, are you still able to reproduce this abortion issue? I'm going to make changes to your PR and I would appreciate if you could test them, since I didn't manage to reproduce this.

patrickdalla commented 1 year ago

The error occurred in other point. It seems whenever we have to convert it back to Path NIO, it will enforce the name with trailing space restriction.

image

patrickdalla commented 1 year ago

The last error only happens when there is a folder in the same path of the trailing space named folder with the same name but without the trailing space, as FileVisitor pass the attributes read, thou reading them from the wrong folder.

When there is no other folder with same name but without the trailing space on the same path, the error happens in FileVisitor, which skips the folder and continues the processing of remainder items, logging it. image

lfcnassif commented 1 year ago

The last error only happens when there is a folder in the same path of the trailing space named folder with the same name but without the trailing space, as FileVisitor pass the attributes read, thou reading them from the wrong folder.

When there is no other folder with same name but without the trailing space on the same path, the error happens in FileVisitor, which skips the folder and continues the processing of remainder items, logging it.

Great! Now I was able to reproduce the error, thanks for the tip! I'll fix the commit I pushed.

lfcnassif commented 1 year ago

The error occurred in other point. It seems whenever we have to convert it back to Path NIO, it will enforce the name with trailing space restriction.

image

From what commit did you get this error? This specific one I didn't reproduce.

patrickdalla commented 1 year ago

From the last commit you pushed just before. What happened:

1)File visitor worked because low level FS functions to load file attributes checked the file existence removing the trailing space. As there was another folder that matched this check, it recovered the attributes from this other folder and continued to process its children. 2)As fileinputstream reader patch does use alternatively the File class if Path leads to exception, the error does not occurs there anymore. 3) But later, when calling tikainputstream.get in ParsingTask, it requires a NIO path instance, and name restriction is enforced in this later moment. The restriction will be enforced whenever a Path instance is to be required.

Em qui., 21 de set. de 2023 16:51, Luis Filipe Nassif < @.***> escreveu:

The error occurred in other point. It seems whenever we have to convert it back to Path NIO, it will enforce the name with trailing space restriction.

[image: image] https://user-images.githubusercontent.com/28692427/269736377-51d3b9af-a3e3-4581-a321-7e8e7b07b08a.png

From what commit did you get this error? This specific one I didn't reproduce.

— Reply to this email directly, view it on GitHub https://github.com/sepinf-inc/IPED/issues/1861#issuecomment-1730282258, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG247S5MSIO4BLEJVYVBGSLX3SSEHANCNFSM6AAAAAA4IEWJHE . You are receiving this because you were mentioned.Message ID: @.***>

lfcnassif commented 1 year ago

I improved my test data and was able to reproduce, thanks.

patrickdalla commented 1 year ago

The result for the case where there isn't another folder in same path with same name without the trailing space:

image

patrickdalla commented 1 year ago

The result for the case where there is another folder in same path with same name without the trailing space:

IPED didn't finished
Folder and all its files weren't processed.
Error:

image

patrickdalla commented 1 year ago

It seems it tries to copy the input stream of the folder. Maybe we have to skip the temp file creation if the file is a folder. I will test.

patrickdalla commented 1 year ago

I noticed you have put back previous API that uses Path NIO, so, it fail. I reverted it again, and put some other code:

It worked for case where there is a folder with same name.

For the case without a folder with same name, the implementation fo FileTreeWalker has a "flaw": if it cannot read the attributes, it returns ENTRY, even thou it is a directory and should return START_DIRECTORY. So, it simply ignores the folder and all its subdirectories. FileTreeWalker is dependent of a well implemented NIO fs. Subsequent tree nodes on tree structure becomes out of order.

image

In the first case, as it can read the attributes (although from the wrong folder), it continues as a Folder, and process its children. If the same name item is a file (without the trailing space), maybe it cannot process its children too.

lfcnassif commented 1 year ago

It seems it tries to copy the input stream of the folder. Maybe we have to skip the temp file creation if the file is a folder. I will test.

My fault, sorry.

I noticed you have put back previous API that uses Path NIO, so, it fail.

When an error occurs, since a new temp Path (File) without the invalid chars are being created, shouldn't it work?

  • In file visitor, when it fails because it understands that "file does not exist", I call visitFile with a empty BasicAttributes instance.

If it makes reading the file possible, great!

  • In the SearchApp, FileInputStreamFactory was failing because it did not understand the string that the app passed: "\?\C:\indice2..\teste\teste \proc.log". So I make a regex to remove all the ".." references.

I didn't test the UI yet (but planned to do it before merging), sorry again. So seems it was a good idea to put the special handling just if the InvalidPathException is throw, it would avoid regressions on files that used to be processed fine.

patrickdalla commented 1 year ago

I found 2 problems: 1) The java.io.File isDirectory method also misbehaves for folder named with trailing spaces, returning always false. 2) Files.walkFileTree method used in FolderTreeReader is depedent of many methods of different NIO classes.

I could overcome the 2 problems. For the second I had to create a set of NIO classes Wrappers to intercept the important methods Files.walkFileTree depends on, and overcome the problem.

It worked for both previous related cases, where there is a folder named with trailing space name, and, for one case, another folder with the same path and name without the trailing space, and for the other case, no such folder exists.

The only remaining problem is that I could not read the attributes of these trailing space named folders, letting them empty.

patrickdalla commented 1 year ago

It seems sleuthkit can be used to process live folders. Have you ever tested @lfcnassif ? I have tested tsk_load with a logical path, and it created a sleuth.db with all file and folder names (trailing space names included).

"Tools can be run on a live Windows or UNIX system during Incident Response. These tools will show files that have been "hidden" by rootkits and will not modify the A-Time of files that are viewed. (Sleuth Kit Informer #13)" taken from: https://www.sleuthkit.org/sleuthkit/desc.php

lfcnassif commented 1 year ago

It seems sleuthkit can be used to process live folders. Have you ever tested @lfcnassif ?

I think this is a very recent support (AFAIK from TSK-4.12) and I haven't tested it. If we can avoid adding an unneeded TSK dependency, I think it would be better. TSK FS transversal usually takes much longer than processing a mounted folder with our current code, not sure about its mounted folder transversal performance... It is important to be fast because many users use fastmode profile to triage data in crime scenes.

patrickdalla commented 1 year ago

As JVM implementations does not intend to be forensic, they depend and do make calls to Windows API methods, and the name limitations are on them (API). They were not implemented to support these kind of names, and these folder/files exists because of applications that do not comply with these SO rules (some maybe intentionally to hide some files).

patrickdalla commented 1 year ago

So, this two limitation still persists: 1) Files with trailing space names content aren't processed, though they are listed (without their NTFS attributes). 2) Folder with trailing space names are processed, although without their NTFS attributes.

It worked with these limitations, but some more test to check any regressions is welcome.