google / oss-fuzz

OSS-Fuzz - continuous fuzzing for open source software.
https://google.github.io/oss-fuzz
Apache License 2.0

Getc/getc_unlocked are claimed to return uninitialized data from setvbuf private buffer #7071

Open bobfriesenhahn opened 2 years ago

bobfriesenhahn commented 2 years ago

Recently there has been a rash of uninitialized data oss-fuzz reports pertaining to GraphicsMagick which appear to be attributed to glibc stdio's privately allocated vbuf (allocated via setvbuf()). The origin of the uninitialized data is:

  #0 0x4d7f9d in __interceptor_malloc /src/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:911:3
  #1 0x7fdb77306e83 in _IO_file_doallocate /build/glibc-eX1tMB/glibc-2.31/libio/filedoalloc.c:101:7

Typical examples of this issue are 39108, 39114, 41265, 42319, 42813, and 43018.

While there may be stdio calls which produce this issue, one common origin is getc or getc_unlocked, which by definition should return EOF or an initialized value.

Testing under Ubuntu 20.04 has yet to detect a problem (using ASAN or valgrind).
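For anyone trying to reproduce this outside oss-fuzz, the pattern in question can be sketched roughly as below. This is only an illustration, not code from GraphicsMagick: the function name `count_nonzero_bytes` is made up, and the sketch lets stdio allocate its own buffer rather than calling setvbuf(). Every byte returned by getc() is used in a branch, which is where MSan would check it.

```c
#include <stdio.h>

/*
 * Hypothetical probe (not GraphicsMagick code): read a file one byte at a
 * time with getc() and branch on each value. A branch on an uninitialized
 * value is what MSan reports, so this is the smallest use of getc() that
 * could trigger a report like the ones described above.
 *
 * Returns the number of non-zero bytes, or -1 if the file cannot be opened.
 */
long count_nonzero_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long nonzero = 0;
    int c;
    /* stdio allocates its private buffer on the first read */
    while ((c = getc(fp)) != EOF) {
        if (c != 0)        /* value from getc() is used in a condition */
            nonzero++;
    }

    fclose(fp);
    return nonzero;
}
```

Calling this from a small main() and building with `clang -g -fsanitize=memory -fsanitize-memory-track-origins` should show whether the report appears locally. I have not confirmed that this exact probe triggers it.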

The code in question has been exercised by oss-fuzz for years without a problem but suddenly this issue has emerged. This seems like either a new issue with the glibc implementation or oss-fuzz itself since I have been unable to find any other cause.

I could use some assistance to resolve this.

Fojtik commented 2 years ago

No binary blob available to reproduce a problem.

bobfriesenhahn commented 2 years ago

It seems that there are perhaps vacancies in the oss-fuzz team which need to be filled since bugs remain apparently unread.

If there is something wrong with the libc which is built independently of our project builds, then someone should investigate it and fix it.

The only way that data from stdio's private vbuf allocation might possibly be used is if there is a bug in libc, there is an issue with the compilation of libc code, or (extremely unlikely) the application somehow reached out repeatedly and happened to read bytes from this small heap buffer without using the libc APIs.

Having unreproducible issues hanging over the head of the project that I have put much of 25 years into is frustrating.

DavidKorczynski commented 2 years ago

Apologies for the delay @bobfriesenhahn

I am unsure about this issue. Msan has had issues on some projects since the upgrade to Ubuntu 20.04 AFAIK, @jonathanmetzman could you confirm this?

Does this issue only happen to a particular fuzz engine? The fuzz engines instrument the code under analysis, and I wonder if there could be something in the instrumentation that causes false positives for Msan. There have been false positives with other projects due to this.

I would probably lean to the first option, of there being issues with Msan, but am quite uncertain in this case.

oliverchang commented 2 years ago

Sorry for the delay @bobfriesenhahn!! Our replies were slow due to the holiday period and we dropped the ball on this one.

@morehouse @hctim Does this look like we need to add an MSan interceptor for getc/fgetc?

morehouse commented 2 years ago

Yes, looks like we're missing an interceptor for getc.
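
Until an interceptor lands, a project-side workaround might look roughly like the sketch below. It is an assumption on my part, not a tested fix for these reports: the wrapper name `getc_initialized` is made up, and the only MSan call used is `__msan_unpoison` from `<sanitizer/msan_interface.h>`. The idea is to mark the value returned by getc() as initialized, since by definition it is either EOF or a byte that libc read from the file.

```c
#include <stdio.h>

/* Detect MSan builds; on other compilers or builds this is a plain getc(). */
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define HAVE_MSAN 1
#  endif
#endif

/*
 * Hypothetical wrapper: returns the same value as getc(), but under MSan
 * tells the sanitizer that the returned int is initialized. This roughly
 * mimics what an interceptor would do for the return value.
 */
int getc_initialized(FILE *fp)
{
    int c = getc(fp);
#ifdef HAVE_MSAN
    __msan_unpoison(&c, sizeof c);   /* clear shadow for the returned value */
#endif
    return c;
}
```

This only hides the report at call sites that use the wrapper, so it is a stopgap rather than a replacement for the interceptor.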