solita / clamav-java

Simple ClamAV client for streaming data to clamd server
GNU Lesser General Public License v2.1
106 stars 46 forks source link

Large files will fail #5

Closed drogin closed 8 years ago

drogin commented 8 years ago

Although there is a comment about large files in the code, the documentation and API / Javadoc expose very little information about this scenario.

For example, if the default StreamMaxLength for clamd is 25MB, and we try to scan 30MB worth of data, the library will throw a SocketException to the user(Which won't tell him much what went wrong). This is happening even if we set CHUNK_SIZE to be less than 25MB in the clamav-java lib.

I suggest changing the scan-function(and javadoc / api) to catch the server-reply when a file is too large, and wrap the SocketException in a custom FileTooBigException, and declare the API to throw that exception. Since this is something a user should definitely handle / be aware of, I also suggest making the exception a Checked exception.

lokori commented 8 years ago

If I understood correctly, the only thing the library could do is to detect the situation and throw an exception. The CHUNK_SIZE in the library just controls the transfer to clamd to limit the amount of memory used by the library, but the overall size must still remain below clamd's limit, which the library doesn't know.

Presumably the user of the library knows the limit in his/her environment, but the lib could notice the reply from the server and throw an exception.

drogin commented 8 years ago

Correct. I couldn't find any better solution either... like you say, to notice the reply from the server and throw an exception to the user.

lokori commented 8 years ago

I wrote the code to handle this. The remaining problem is having a test for it.

lokori commented 8 years ago

There. https://github.com/solita/clamav-java/pull/8

I didn't want checked exceptions. Originally I chose to not use exceptions for the scan result as it is a matter of context whether a virus in the file is "exception" or not.

This case is different and exceeding the size limit should not happen, so it's an exception condition, but using a checked exception would break all code using this library. Therefore a RuntimeException instead so that upgrading to a new version doesn't require changes in the user's code.

lokori commented 8 years ago

The code is still somewhat unsatisfactory as it continues to write even after the size limit has been exceeded during the scan. It could read the reply as it becomes available, but testing reveals this is ok with clamd and the code is simpler this way.

lokori commented 8 years ago

It appears I was wrong. The test doesn't always pass, but it happens to work almost always because network latency to localhost is pretty much non-existent.