@greg-michael Due to a Java API limitation, the scanner cannot handle a directory that contains millions of files.
As you can see, the [listFiles() API](https://docs.oracle.com/javase/7/docs/api/java/io/File.html#listFiles()) returns all files in a directory at once. Because this is a single function call, the scanner has no chance to free up memory while it runs.
Since I changed the minimum JDK version from 6 to 7, the Files.walkFileTree API might be an alternative solution. Until that patch is available, there is one workaround: write the full list of file paths into file_path_list.txt and pass it to the scanner with -f file_path_list.txt.
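For context, here is a minimal sketch (not the scanner's actual implementation, and the class name is illustrative) of how Files.walkFileTree visits entries one at a time through callbacks instead of materializing the whole directory listing the way File.listFiles() does:

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class WalkTreeSketch {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args[0]);
        // walkFileTree calls back once per entry, so a directory with millions
        // of files never has to be loaded into memory as a single array.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                System.out.println(file); // a real scanner would inspect the file here
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                // skip unreadable entries instead of aborting the whole walk
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

For the -f workaround, the path list can be produced with any tool that writes one path per line, for example something like `dir /s /b C:\target > file_path_list.txt` on Windows.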
Understood. We'll have to look into whether we just exclude that path or accept that the scanner cannot scan this system in its current version due to limitations within the Java code. Please let me know when you have an updated version, and I'll test it.
Thanks!
@greg-michael I will. Thank you for your understanding :D
@xeraph Any updates on this?
@greg-michael Would you try v2.9.0 release? https://github.com/logpresso/CVE-2021-44228-Scanner/releases/tag/v2.9.0
The new release seems to have helped for a number of systems. With the -Xmx option, what is the implication of setting a value such as -Xmx1000M? Would any files larger than 1GB be skipped? Would any directories/paths with enough files to exceed 1GB of heap space cause the scanner to abort? I'm trying to fine-tune the scan executions in my Windows environment so that a number of servers that are not finishing within a 5-hour window can complete their scans. FWIW, the only servers on which I'm still seeing occasional Out of Memory errors are our Outlook Web Access servers. Those have to be scanned with -Xmx1000M or lower to complete successfully.
Thanks!
@greg-michael Wow, that's good news. The Xmx switch is supported by the JVM or SubstrateVM, and it simply sets the maximum memory limit for the Java process. If the scanner cannot allocate memory beyond the specified limit, it fails with OOM as usual. Therefore, if the scanner completed the scan without any error using the Xmx1000M option, it means the scanner successfully scanned all files.
By the way, which one did you use, the JAR version or the native binary?
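As a quick sanity check (illustrative only, not part of the scanner), a trivial program can print the heap limit a JVM was started with; Runtime.maxMemory() corresponds roughly to the -Xmx value, so launching it with the same switch shows what limit the scanner process runs under:

```java
public class HeapLimitCheck {
    public static void main(String[] args) {
        // Runtime.maxMemory() returns the maximum heap the JVM will attempt to use,
        // which corresponds (approximately) to the -Xmx setting.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB%n", maxBytes / (1024 * 1024));
    }
}
```

Run it, for example, with `java -Xmx1000m HeapLimitCheck`.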
I am using the binary with the -Xmx switch. There are still a small number of servers that continue to crash with OOM errors, but fewer than before. In particular, Outlook Web Access servers.
How does the use of the -Xmx option affect scanned file sizes? Does it impact them at all? I do have some systems with very large archive files and directories of archives, and I want to ensure that they are still being scanned properly.
@greg-michael Then try the JAR version with the Xmx switch for the Outlook Web Access servers.
Xmx limits the heap size. In most cases, the scanner can decompress part of a large archive file within that limited heap. If the scanner exits normally, you can be sure the files were scanned properly. If you doubt it, test with a large ZIP file that embeds a vulnerable log4j file (scan the ZIP file with the --scan-zip option). You can specify a target file path instead of a directory path.
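If it helps, here is a minimal sketch of how such a test archive could be put together; the file names (log4j-core-2.14.1.jar, test-archive.zip) are hypothetical, and you would need a known-vulnerable log4j-core JAR on hand:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MakeTestZip {
    public static void main(String[] args) throws IOException {
        Path log4jJar = Paths.get("log4j-core-2.14.1.jar"); // hypothetical vulnerable JAR
        Path testZip = Paths.get("test-archive.zip");

        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(testZip))) {
            // embed the JAR so the archive resembles a real application bundle
            zos.putNextEntry(new ZipEntry("lib/" + log4jJar.getFileName()));
            Files.copy(log4jJar, zos);
            zos.closeEntry();
        }
    }
}
```

Scanning that single file with the --scan-zip option should then report the embedded log4j as vulnerable.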
Closing issue since the -Xmx switch seems to have helped for the majority of servers experiencing OOM issues during scans.
We are running log4j2-scan.exe v2.7.1 on our Windows servers using TrueSight Server Automation (TSSA) to deploy and execute the package. This is done through a mapped-user elevation to a local-administrator account and executed via command shell as that user.
On a particular server, we're seeing this error from the output of the execution of the scanner:
Logpresso CVE-2021-44228 Vulnerability Scanner 2.7.1 (2022-01-02) (Time in agent's deploy log:: 01/17/2022 13:50:37)
Scanning drives: C:\, M:\ (without P:, Z:) (Time in agent's deploy log:: 01/17/2022 13:50:38)
Scanned 3191 directories and 27196 files
Found 0 vulnerable files
Found 0 potentially vulnerable files
Found 0 mitigated files
Completed in 10403.38 seconds
Error: Garbage-collected heap size exceeded.
java.lang.OutOfMemoryError: Garbage-collected heap size exceeded.
"C:\temp\stage\b197902652953cc29ef9df4465ff0232\bldeploycmd-2.bat": Item 'Execute log4j2-scan.exe' returned exit code -1 (Time in agent's deploy log:: 01/17/2022 16:44:03)
"C:\temp\stage\b197902652953cc29ef9df4465ff0232\bldeploycmd-2.bat": Command returned non-zero exit code: -1 (Time in agent's deploy log:: 01/17/2022 16:44:03)
The scanner is executed using this command string. Note that %RPTFILE% is defined prior to the execution of the scanner.
log4j2-scan.exe --silent --scan-zip --scan-log4j1 --all-drives --report-path "%RPTFILE%" --report-dir "C:\Temp" --exclude "P:" --exclude "Z:" --exclude-fs afs,cifs,autofs,tmpfs,devtmpfs,fuse.sshfs,iso9660 2>&1
This scan seems to take an inordinately long time to run because it gets stuck trying to scan more than 6M machine key files from an application called PortalProtect by TrendMicro. The file names are lengthy GUID-type names, and each file is roughly 1 KB in size. Windows reports no specific file type other than the generic "System File". The file names follow no easily distinguishable pattern that would allow a simple exclusion filter.
When I ran the scan manually without the --silent option and with the --trace and --debug options included, it got as far as this directory on the C: drive, output a single status-update line after 10 seconds, and then essentially hung.
Edit: The process was slowly consuming all available memory. I had to kill it before it caused the server to run out of memory. It would appear that the scanner needs to regularly flush its accumulated results from memory to the log and then cycle through the next batch of X files and directories, especially when there are significant numbers of files to be processed (i.e., millions).
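For what it's worth, here is a minimal illustration (not the scanner's actual code) of the kind of incremental iteration being suggested: java.nio.file.DirectoryStream hands back entries lazily, so even a directory with millions of machine key files never has to be held in memory as one huge listing:

```java
import java.io.IOException;
import java.nio.file.*;

public class LazyDirectoryScan {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        long count = 0;
        // newDirectoryStream iterates the directory lazily, one entry at a time,
        // unlike File.listFiles(), which builds the entire array up front.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                count++; // a real scanner would process and then discard each entry here
            }
        }
        System.out.println("Visited " + count + " entries");
    }
}
```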