Potential Buffer-Overrun bugs in usage of MultiByteToWideChar and WideCharToMultiByte

GoogleCodeExporter commented 9 years ago

In HEAD, it looks like there are 28 usages of MultiByteToWideChar and 
WideCharToMultiByte. None of the ones I looked at checked that the destination 
buffer is null-terminated after the function runs. This can indirectly lead to 
buffer overruns.

Here is why:

It is true that these functions themselves guarantee that they will not overrun 
destination buffer. However, if what they try to write is greater than or equal 
to the length of the destination buffer, then there will be no null-terminator 
at the end of the destination buffer.

For most string-related functions, one can prevent this from happening by 
simply allocate a destination buffer of size srcBufferLen+1. However, when 
using functions that change the character encoding, the sizes (in bytes) of 
strings can change in difficult to predict ways. For example, there exists a 
UTF16 string which is 6 characters long but becomes 18 characters long when 
encoded in UTF8. Thus, the technique of allocating a buffer one larger than the 
source string length will fail for certain strings. See 
http://stackoverflow.com/a/216879 for more details about this behavior.

Non-null terminated strings (that are expected to be null-terminated) are 
problematic because they can result in buffer-overruns in later function calls.

Many usages of MultiByteToWideChar and WideCharToMultiByte in this project's 
code appear to have this issue. For example, take lines 1465-1471 of 
demuxer/LAVSplitter/LAVSplitter.cpp in the current revision (revision id 
703b7dce84a52a9f6064769bafe5d67d0bb26607):

// Convert to multi-byte ascii
size_t bufSize = sizeof(WCHAR) * (m_settings.subtitleAdvanced.length() + 1);
char *buffer = (char *)CoTaskMemAlloc(bufSize);
if (!buffer)
  return selectorList;
ZeroMemory(buffer, bufSize);
WideCharToMultiByte(CP_UTF8, 0, m_settings.subtitleAdvanced.c_str(), -1, 
buffer, (int)bufSize, nullptr, nullptr);

Although the size of the destination buffer is set to twice the number of 
source characters plus two, this is not always sufficient to hold the entirety 
of the text. As such, you may wind up with buffer lacking a null terminator.

As another example, look at lines 449-455 in 
demuxer/Demuxers/LAVFStreamInfo.cpp in the current revision (revision id 
04c9a2c101efeadf57ae5d81b0e07b4ba65b96cb):

if (AVDictionaryEntry *dictEntry = av_dict_get(avstream->metadata, "title", 
nullptr, 0))
{
  // read metadata
  char *title = dictEntry->value;
  // convert to wchar
  MultiByteToWideChar(CP_UTF8, 0, title, -1, subInfo->TrackName, 256);
}

As far as I'm aware, there's nothing stopping a subtitle track from having a 
title longer than 256 bytes. (FFmpeg's source code comments and documentation 
don't make any such promises.) As such, I believe this code can result in the 
title string lacking a null terminator.

There are several ways one can go about fixing this. I would recommend writing 
a wrapper function to use that ensures the buffer is null terminated. For 
example, you might want to do something like buf[sizeof(buf)-1]=0.

See 
http://blogs.msdn.com/b/esiu/archive/2008/11/06/in-security-of-multibytetowidech
ar-and-widechartomultibyte-part-1.aspx, 
http://blogs.msdn.com/b/esiu/archive/2008/11/14/in-security-of-multibytetowidech
ar-and-widechartomultibyte-part-2.aspx, and 
http://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx 
for more information about these and other issues with MultiByteToWideChar and 
WideCharToMultiByte.

Original issue reported on code.google.com by johnp...@gmail.com on 10 May 2014 at 11:36

GoogleCodeExporter commented 9 years ago

This issue was closed by revision c9b42a87ec01.

Original comment by h.lepp...@gmail.com on 12 May 2014 at 2:31

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

I added a few wrappers that automatically calculate the required memory and/or 
ensure there is a null terminator.

Note that one part of your description is not accurate, it will *only* not 
write a NULL if the string length matches the buffer size exactly. If the 
length is longer, it'll error instead. So the chance of this happening is not 
as dramatic - unless someone targets it specifically, of course.

Original comment by h.lepp...@gmail.com on 12 May 2014 at 2:33

GoogleCodeExporter commented 9 years ago

I looked at the safeXX functions you added, and I think that they can still 
yield non-null terminated strings. In particular, the following example code 
gives a non-null terminated string even when using the SafeXX functions you 
created.

char* str = "asajsfdfja9dsfja9sdfa0s9fdjas0fdajds0fjlkjdsaf";
wchar_t wstr[4];
SafeMultiByteToWideChar(CP_UTF8, 0, str, -1, wstr, 2);
printf("%u %u %u %u", wstr[0], wstr[1], wstr[2], wstr[3]);
// prints 97 115 52428 52428, note that the last character is not 0

Also, in response to your comment:

> Note that one part of your description is not accurate, it will *only* not 
write a
> NULL if the string length matches the buffer size exactly. If the length is 
longer,
> it'll error instead. So the chance of this happening is not as dramatic - 
unless
> someone targets it specifically, of course.

I believe that what I said was actually correct. In particular, if it were the 
case that having a source much longer than the destination results in a 
null-terminated string, then the example code above would print 97 115 52428 0 
instead of 97 115 52428 52428.

Why not just replace the usages of these functions with the standard C++ string 
conversion utilities? They're safe, and a bit cleaner too.

Original comment by johnp...@gmail.com on 13 May 2014 at 7:15

GoogleCodeExporter commented 9 years ago

Well, technically we're both correct. If the input string is far too long, 
it'll not write a null terminator, but it'll also return an error code. An 
alternative would be that it didn't modify the provided buffer at all (like 
many other functions would in an error case), and you still end up with random 
non-terminated data (your uninitialized buffer).

In short, one should check the return value for errors! ;)

I've added an additional check in the function which should ensure null 
termination now.

PS:
I don't like the C++ functions, they look ugly and clunky.

Original comment by h.lepp...@gmail.com on 17 May 2014 at 1:09

jamaal81 / lavfilters

Potential Buffer-Overrun bugs in usage of MultiByteToWideChar and WideCharToMultiByte #452