sepinf-inc / IPED

IPED Digital Forensic Tool. It is open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in corporate investigations by private examiners.

NullPointerException in LanguageDetectTask #1947

Closed: wladimirleite closed this issue 10 months ago

wladimirleite commented 10 months ago

This was reported by another user who was processing a folder with a lot of 7Z, ZIP, TAR and ISO files, using 4.1.5. By processing only the 7z file that caused the error, I was able to reproduce it (using the master branch).

2023-10-25 16:35:25     [ERROR] [app.processing.Main]                   Processing Error:
java.lang.Exception: Worker-11 Error while processing f2653796544.7z>>XXX/YYY.jpg (147456bytes)
        at iped.engine.core.Worker.process(Worker.java:186) ~[iped-engine-4.2-snapshot.jar:?]
        at iped.engine.core.Worker.run(Worker.java:265) ~[iped-engine-4.2-snapshot.jar:?]
Caused by: java.lang.NullPointerException
        at iped.engine.task.LanguageDetectTask.process(LanguageDetectTask.java:77) ~[iped-engine-4.2-snapshot.jar:?]

Line 77 of LanguageDetectTask is if (evidence.getMediaType().equals(MediaType.OCTET_STREAM)). Here evidence.getMediaType() returned null, so calling equals() on it threw the exception.

@lfcnassif, this seems similar to the issue I reported last night. Obviously we could check that evidence.getMediaType() is not null at this point of the code, or invert the equals order, but there are many other places in the code with the same pattern, evidence.getMediaType().someMethod(). So, was anything changed recently (4.1.5, maybe before) that may cause an item with null mediaType? Or is a null mediaType expected?
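For reference, the two local workarounds mentioned above (a null check, or inverting the equals order) look roughly like this. This is a minimal sketch that models media types as plain Strings to stay self-contained (IPED itself uses Tika's MediaType), and OCTET_STREAM here is a hypothetical stand-in constant. Either form only hides the null at this one call site; it does not explain why the item has no mediaType.

```java
import java.util.Objects;

public class NullSafeEqualsDemo {

    // Hypothetical stand-in for MediaType.OCTET_STREAM (a String, not Tika's type).
    static final String OCTET_STREAM = "application/octet-stream";

    // Original pattern: throws NullPointerException when mediaType is null.
    static boolean isOctetStreamUnsafe(String mediaType) {
        return mediaType.equals(OCTET_STREAM);
    }

    // Inverted equals order: the constant is never null,
    // so a null mediaType simply yields false.
    static boolean isOctetStreamInverted(String mediaType) {
        return OCTET_STREAM.equals(mediaType);
    }

    // Objects.equals tolerates null on either side.
    static boolean isOctetStreamObjects(String mediaType) {
        return Objects.equals(mediaType, OCTET_STREAM);
    }

    public static void main(String[] args) {
        System.out.println(isOctetStreamInverted(null)); // false
        System.out.println(isOctetStreamObjects(null));  // false
        try {
            isOctetStreamUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NPE, same failure mode as LanguageDetectTask line 77");
        }
    }
}
```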

Would it be a good idea to change Item.getMediaType() to return some default mediaType when mediaType is null? Or could it cause bad side effects?

lfcnassif commented 10 months ago

I don't think it is expected. After SignatureTask, all items are expected to have a mediaType.

lfcnassif commented 10 months ago

So, was anything changed recently (4.1.5, maybe before) that may cause an item with null mediaType?

AFAIK nothing related to this was changed recently.

Would it be a good idea to change Item.getMediaType() to return some default mediaType when mediaType is null? Or could it cause bad side effects?

I think it shouldn't return a default value: failing explicitly on bugs like this helps us fix them in the right place.
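One way to push this fail-fast idea further is to reject null where it is assigned rather than where it is read, so the stack trace points at the code that produced the bad value. The sketch below is only an illustration of that idea, not something proposed in this thread: the Item class is a hypothetical, heavily simplified stand-in (IPED's real Item differs), media types are modeled as Strings, and it assumes no existing code legitimately sets a null mediaType.

```java
import java.util.Objects;

public class FailFastItemDemo {

    // Hypothetical, simplified Item; not IPED's actual class.
    static class Item {
        private final String path;
        private String mediaType; // String stand-in for Tika's MediaType

        Item(String path) {
            this.path = path;
        }

        // Rejects null at assignment time; requireNonNull throws before the
        // field is overwritten, so the previous value is preserved.
        void setMediaType(String mediaType) {
            this.mediaType = Objects.requireNonNull(mediaType,
                    () -> "null mediaType set for item " + path);
        }

        // No default value is substituted here.
        String getMediaType() {
            return mediaType;
        }
    }

    public static void main(String[] args) {
        Item item = new Item("f2653796544.7z>>XXX/YYY.jpg");
        item.setMediaType("application/xhtml+xml");
        try {
            item.setMediaType(null); // the kind of assignment that caused this issue
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(item.getMediaType()); // application/xhtml+xml
    }
}
```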

lfcnassif commented 10 months ago

At SignatureTask.java:122 the mediaType should be set for all items...

wladimirleite commented 10 months ago

Thanks @lfcnassif! I will take a closer look and see if I can identify the root cause.

wladimirleite commented 10 months ago

At SignatureTask.java:122 the mediaType should be set for all items...

The 7z was partially overwritten, so although the file extension is ".jpg", the detected signature is "application/xhtml+xml", which is correctly set in the line you mentioned.

Later, in the following part of ParsingTask, parsedMediaType is just "application/", which makes MediaType.parse(parsedMediaType) return null:

        String prevMediaType = evidence.getMediaType().toString();
        String parsedMediaType = metadata.get(StandardParser.INDEXER_CONTENT_TYPE);
        if (!prevMediaType.equals(parsedMediaType)) {
            evidence.setMediaType(MediaType.parse(parsedMediaType));
        }

I am going to submit a PR with a simple check to avoid this situation.
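The actual check in the PR is not shown in this thread; a minimal sketch of one such check is below. It only replaces the previous media type when the parsed value is valid. Assumptions: media types are modeled as Strings, and parseMediaType is a hypothetical stand-in for Tika's MediaType.parse() that, like the behavior described above, returns null for malformed values such as "application/" (it only checks for a non-empty type and subtype and is not Tika's real parsing logic).

```java
public class ParsedMediaTypeGuardDemo {

    // Hypothetical stand-in for MediaType.parse(): returns null unless the
    // string has a non-empty type and a non-empty subtype around a '/'.
    static String parseMediaType(String s) {
        if (s == null) {
            return null;
        }
        int slash = s.indexOf('/');
        boolean hasType = slash > 0;
        boolean hasSubtype = slash >= 0 && slash < s.length() - 1;
        return (hasType && hasSubtype) ? s : null;
    }

    // Returns the media type the item should end up with: the parsed one if it
    // is present, different and valid; otherwise the previous one is kept.
    static String updatedMediaType(String prevMediaType, String parsedMediaType) {
        if (parsedMediaType != null && !prevMediaType.equals(parsedMediaType)) {
            String parsed = parseMediaType(parsedMediaType);
            if (parsed != null) {
                return parsed;
            }
        }
        return prevMediaType;
    }

    public static void main(String[] args) {
        // Malformed parser output: keep the signature-based type.
        System.out.println(updatedMediaType("application/xhtml+xml", "application/"));
        // Valid, different parser output: use it.
        System.out.println(updatedMediaType("application/xhtml+xml", "text/html"));
    }
}
```

Keeping the previous (signature-based) media type when the parser reports a malformed one seems the least surprising fallback, though other choices, such as falling back to application/octet-stream, would also avoid the null.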