sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

RCE vulnerability in libwebp dependency #1903

Closed: lfcnassif closed this 10 months ago

lfcnassif commented 11 months ago

libwebp is used by tesseract and imagemagick, so we should upgrade libwebp to version 1.3.2, as described here: https://nvd.nist.gov/vuln/detail/CVE-2023-4863

@tc-wleite, as you already compiled tesseract from source, would it be easy to compile it again with libwebp-1.3.2?

About imagemagick, I already reported the dependency issue to them. If they are not fast, we may think about compiling it from source...

For users, to mitigate the problem within IPED for now, it is enough to disable OCR and set enableExternalConv = false in conf/ImageThumbsConfig.txt
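A minimal sketch of applying that setting programmatically, assuming the config file uses plain `key = value` lines. The helper name `set_property` is hypothetical and not part of IPED; only the key `enableExternalConv` and the path `conf/ImageThumbsConfig.txt` come from the comment above.

```python
import re


def set_property(text, key, value):
    """Return text with `key = value` set, replacing an existing
    uncommented assignment or appending a new line if none exists."""
    line = f"{key} = {value}"
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement avoids backslash interpretation in `line`.
        return pattern.sub(lambda _match: line, text)
    separator = "" if text.endswith("\n") or not text else "\n"
    return f"{text}{separator}{line}\n"


# Usage (path relative to the IPED folder, as named in the comment):
#   from pathlib import Path
#   cfg = Path("conf/ImageThumbsConfig.txt")
#   cfg.write_text(set_property(cfg.read_text(), "enableExternalConv", "false"))
```

Commented-out assignments are left untouched, so a `#enableExternalConv = true` line would stay as it is and a new active line would be appended.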

wladimirleite commented 11 months ago

> @tc-wleite, as you already compiled tesseract from source, would it be easy to compile it again with libwebp-1.3.2?

A few months ago I compiled it again (tesseract version 5.3.0) to check if there was any improvement compared to the version we are using (5.0.0), but there wasn't. I can try to build it again, using libwebp-1.3.2.

lfcnassif commented 11 months ago

> I can try to build it again, using libwebp-1.3.2.

Great, thank you!

wladimirleite commented 11 months ago

Just built Tesseract 5.3.2-24-g3922 (latest version), but it uses libwebp-1.3.1. I will try to manually change to 1.3.2.

lfcnassif commented 11 months ago

> Just built Tesseract 5.3.2-24-g3922 (latest version), but it uses libwebp-1.3.1. I will try to manually change to 1.3.2.

There is no urgent need, @tc-wleite! I just got an answer from the ImageMagick project:

> Version 7.1.1-17 of ImageMagick uses libwebp-1.3.2

So we could redirect webp to imagemagick before tesseract, as we do for other non-standard formats.

wladimirleite commented 11 months ago

I finally managed to build tesseract 5.3.2 with libwebp-1.3.2. It was kind of painful, as I am using a procedure that relies on "sw", which has not been updated to libwebp-1.3.2 (currently https://software-network.org/org.sw.demo.webmproject.webp shows 1.3.1 as the most recent version).

Tests with a few samples are looking good. I will run a larger test overnight, to check the performance and recognition results. From what I remember from 5.3.0 (compared to 5.0.0), I expect only very minor differences.

lfcnassif commented 11 months ago

Thank you very much @tc-wleite! But as I said, don't hurry, we can use imagemagick as a workaround.

wladimirleite commented 11 months ago

Tesseract 5.3.2 compiled for Windows with libwebp 1.3.2: tesseract.zip

tesseract 5.3.2-24-g3922
 leptonica-1.83.1 (Sep 29 2023, 19:05:06) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 9e : libpng 1.6.40 : libtiff 4.5.1 : zlib 1.2.13 : libwebp 1.3.2 : libopenjp2 2.5.0
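The version banner above can be checked automatically. The following is a minimal sketch, assuming the banner lists its libraries as `name version` pairs exactly as shown; the function names and the `FIXED_LIBWEBP` constant are hypothetical, and 1.3.2 is the fixed version named earlier in this thread.

```python
import re

# First libwebp release without CVE-2023-4863, per the issue description.
FIXED_LIBWEBP = (1, 3, 2)


def libwebp_version(version_output):
    """Extract the libwebp version from `tesseract --version` style
    output as a tuple of ints, or None if libwebp is not listed."""
    match = re.search(r"libwebp\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_fixed_libwebp(version_output):
    """True only if libwebp is listed and is at least FIXED_LIBWEBP."""
    version = libwebp_version(version_output)
    return version is not None and version >= FIXED_LIBWEBP
```

To check a real binary, the text could be captured with `subprocess.run(["tesseract", "--version"], capture_output=True, text=True)` and both `stdout` and `stderr` passed in, since which stream carries the banner may vary between builds. A build that does not list libwebp at all returns `False` here, which is a conservative choice rather than a statement that such a build is vulnerable.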

I processed a large set of images and PDFs (around 20K files in total) with both the new version and the one we currently use (5.0.0). Performance was slightly better with the newer version (ParsingTask total time was reduced by ~7%).

The results (extracted text) are similar, but there are small (and in a few cases not so small) differences. I wrote a quick program to compare the OCR results of each item by calculating the Levenshtein distance (simplified to deal with longer strings). Then I visually inspected some of the images/PDFs with the highest distances. In most of them the recognized text is similar, but the way the layout was handled (e.g. two columns instead of one) changed. In general, the newer version seems slightly better.
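A minimal sketch of that kind of comparison follows. The original program was not posted, so everything here is an assumption: the function names are hypothetical, and the "simplification" for long strings is implemented as whitespace normalization plus truncation to `max_len` characters, which may differ from what was actually done.

```python
def levenshtein(a, b):
    """Classic edit distance using two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def ocr_distance(old_text, new_text, max_len=2000):
    """Normalized distance in [0, 1]. Assumption: whitespace runs are
    collapsed and both texts are cut to max_len to bound the cost."""
    a = " ".join(old_text.split())[:max_len]
    b = " ".join(new_text.split())[:max_len]
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def most_different(old_texts, new_texts, top_n=10):
    """Rank items present in both runs by OCR distance, largest first.
    Both arguments map an item id (e.g. a hash) to its extracted text."""
    shared = old_texts.keys() & new_texts.keys()
    scored = [(ocr_distance(old_texts[k], new_texts[k]), k) for k in shared]
    return sorted(scored, reverse=True)[:top_n]
```

Truncation keeps the quadratic cost bounded but ignores differences beyond the first `max_len` characters, so the highest-ranked items are still worth inspecting visually, as described above.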

lfcnassif commented 11 months ago

Awesome! Thank you @tc-wleite! I'll update tesseract and imagemagick, cherry-pick other important fixes (like #1879) and try to release 4.1.5 early next week.

lfcnassif commented 10 months ago

Just started an ImageMagick regression test on 300K samples of non-standard image formats collected from 220 different cases. I'll probably post the results tomorrow.

lfcnassif commented 10 months ago

Images with thumbnails generated by the current ImageMagick version: [image]

Images with thumbnails generated by ImageMagick version 7.1.1-18: [image]

So the upgrade resulted in more rendered EMF, TIFF & XBM images. I'll proceed with the upgrade.

PS: I didn't compare the quality or correctness of the rendered images, only whether a thumbnail was generated or not.

lfcnassif commented 10 months ago

Hi @tc-wleite, I'm thinking of using a dynamically linked ImageMagick instead of the statically linked one (maybe it runs faster), what do you think?

lfcnassif commented 10 months ago

> Hi @tc-wleite, I'm thinking of using a dynamically linked ImageMagick instead of the statically linked one (maybe it runs faster), what do you think?

I started a performance test. Unless there is an important difference, I'll keep the statically linked version, since all official IM portable versions are statically linked.

wladimirleite commented 10 months ago

I usually prefer statically linked libraries. My intuition is that performance should be very similar in the case of ImageMagick, but it is better to test!

By the way, if you want to compare the thumbnails generated in your test by the newer IM version and by the one currently used: not sure if you remember, but I wrote a small program that points out the hashes of the N images with the most different thumbnails. You can then filter by those hashes in both cases (if you still have them) and visually compare just a small subset instead of thousands of images.

lfcnassif commented 10 months ago

Yes I remember, that would be great! What's the input, the IPED cases or the thumbs databases?

wladimirleite commented 10 months ago

Thumbs database.

wladimirleite commented 10 months ago

It needs the SQLite JDBC jar. The case paths and the number of top differences to be printed are hard-coded in main(). The comparison between two images is very simple (just the difference between RGB values). It only reports hashes present in the first case but not in the second, so it is better to use the case with more thumbs as the first, or to run the comparison twice, inverting the order.

EDIT: Code was too long, it is better to attach it: CompareThumbs.zip
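The attached program is not reproduced here. As a rough illustration only, the comparison described above could look like the sketch below, which assumes the thumbnails have already been read from each thumbs database and decoded into lists of `(r, g, b)` tuples keyed by hash. The function names, the mean-absolute-difference score, and the treatment of size mismatches are all assumptions, not the behavior of CompareThumbs.

```python
import heapq


def rgb_difference(pixels_a, pixels_b):
    """Mean absolute per-channel difference (0.0 to 255.0).
    Assumption: thumbnails with different pixel counts are treated
    as maximally different instead of being resized."""
    if len(pixels_a) != len(pixels_b):
        return 255.0
    if not pixels_a:
        return 0.0
    total = sum(abs(ca - cb)
                for pa, pb in zip(pixels_a, pixels_b)
                for ca, cb in zip(pa, pb))
    return total / (3 * len(pixels_a))


def compare_thumbs(first, second, top_n):
    """Return (top_n most different hashes, hashes missing from second).
    Only hashes of the first case are examined, which is why running the
    comparison a second time with the arguments swapped can be useful."""
    missing = sorted(h for h in first if h not in second)
    scored = ((rgb_difference(first[h], second[h]), h)
              for h in first if h in second)
    return heapq.nlargest(top_n, scored), missing
```

`heapq.nlargest` keeps only `top_n` candidates in memory at a time, which matters when each case holds hundreds of thousands of thumbnails.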

lfcnassif commented 10 months ago

> It needs the SQLite JDBC jar. The case paths and the number of top differences to be printed are hard-coded in main(). The comparison between two images is very simple (just the difference between RGB values). It only reports hashes present in the first case but not in the second, so it is better to use the case with more thumbs as the first, or to run the comparison twice, inverting the order.
>
> EDIT: Code was too long, it is better to attach it: CompareThumbs.zip

Thank you @tc-wleite! I just did the comparison and the differences are very minor: just one JP2 was rendered with different colors/brightness, but I think it is fine. Looking into the difference in the EMF counts, it is due to timeouts; the old ImageMagick is also able to render them in ImageViewer.

So I think we are fine and I will proceed with both upgrades.

lfcnassif commented 10 months ago

@tc-wleite, just realized mplayer may link to libwebp too... Do you know if it does? At least, we don't process webp using mplayer, just animated heic, heif, gif & png, right?

wladimirleite commented 10 months ago

> @tc-wleite, just realized mplayer may link to libwebp too... Do you know if it does? At least, we don't process webp using mplayer, just animated heic, heif, gif & png, right?

I believe that FFmpeg (used by MPlayer) uses libwebp, but only to encode, not to decode. Decoding would be useful for us to process animated WEBPs, but currently there is no support (https://trac.ffmpeg.org/ticket/4907). So we definitely do not use anything related to WEBPs in MPlayer.

From time to time, I check new MPlayer versions for Windows. Unfortunately, one of the websites that used to publish these Windows builds has not been updated since 2019, and the other was usually updated very often (like once per month or more), but the last update was in December, 2022. So, I am not sure if it would be easy to find a recent build (that includes the most recent version of libwebp).

lfcnassif commented 10 months ago

Thank you @tc-wleite for your research! Since the issue is triggered by decoding a malicious webp and FFmpeg/MPlayer doesn't support that, I think we are safe.