jacklicn / leptonica

Automatically exported from code.google.com/p/leptonica
0 stars 0 forks source link

getFilenamesInDirectory() in sarray.c doesn't correctly handle dirname with wrong directory sepchar under Windows #18

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Try running psioseg_reg using Windows. It calls
convertSegmentedPagesToPS("/tmp/junkimagedir", "/tmp/junkmaskdir", 2.0,
0.15, 190, 0, 0, "junkfile.ps") which will eventually fail.

What is the expected output? What do you see instead?
I expect to be get all the filenames in a directory.
Instead no filenames are returned (and this causes a crash later on).

What version of the product are you using? On what operating system?
leptonlib-1.63.
Microsoft Visual Studio 2008 SP1 with latest updates also applied.
Windows XP Pro SP3.

Please provide any additional information below.
Change the beginning of getFilenamesInDirectory() to:

getFilenamesInDirectory(const char  *dirname)
{
SARRAY           *safiles;
WIN32_FIND_DATAA  ffd;
size_t            length_of_path;
CHAR              szDir[MAX_PATH];  /* MAX_PATH is defined in stdlib.h */
HANDLE            hFind = INVALID_HANDLE_VALUE;
CHAR         *tmpDirname;

    PROCNAME("getFilenamesInDirectory");

    if (!dirname)
        return (SARRAY *)ERROR_PTR("dirname not defined", procName, NULL);

    length_of_path = strlen(dirname);
    if (length_of_path > (MAX_PATH - 2))
        return (SARRAY *)ERROR_PTR("dirname is to long", procName, NULL);

    if (stringFindSubstr(dirname, "/", NULL) > 0) {
        tmpDirname = stringReplaceEachSubstr(dirname, "/", "\\", NULL);
    strncpy(szDir, tmpDirname, MAX_PATH);
    FREE(tmpDirname);
    } else {
    strncpy(szDir, dirname, MAX_PATH);
    }
    szDir[MAX_PATH - 1] = '\0';

Original issue reported on code.google.com by tomp2...@gmail.com on 1 Dec 2009 at 8:19

GoogleCodeExporter commented 9 years ago
Thanks.  The changes are made and the file is attached.

I have a couple of questions.  First, we're using a MAX_PATH here,
with a comment that it's defined in stdlib.h.  But it's not in
the unix stdlib.h.  And furthermore, in writefile.c (also attached) we
have set it to 260 because otherwise it brings in a lot of cruft.

Also, I don't understand why the buffer seems to be padded out to the
end with \*.  Does that even make sense?

Original comment by dan.bloo...@gmail.com on 7 Dec 2009 at 2:35

Attachments:

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
adding sarray.c

Original comment by dan.bloo...@gmail.com on 7 Dec 2009 at 2:37

Attachments:

GoogleCodeExporter commented 9 years ago
My original comment said:

//#include <WinDef.h>       // for MAX_PATH (YUCKS, includes all sorts of other 
files)
#define MAX_PATH          260

I don't know where the reference to stdlib.h came from. I thought I just copied 
that
from YOU somewhere :)

Which buffer are you talking about? strncpy is supposed to pad out szDir with 
null up
to MAX_PATH.

Original comment by tomp2...@gmail.com on 7 Dec 2009 at 4:05

GoogleCodeExporter commented 9 years ago
Yes, a summer intern put in that windows patch, and I never looked closely at it
until today.

So I take it we should '#define MAX_PATH  260' here in sarray.c as well.  (Your 
patch
was in writefile.c).

Please ignore the comment about "padding".

Original comment by dan.bloo...@gmail.com on 7 Dec 2009 at 5:23

GoogleCodeExporter commented 9 years ago
Oh, aha, you mean:

   strncat(szDir, TEXT("\\*"), MAX_PATH - strlen(szDir));

This concatenates "\*" onto the end of szDir, so we get something like:

  dirname\*

Which FindFirstFileA can interpret as all the files in dirname.

In looking this change over, seems like I just took what you had before and 
quickly
made the minimum changes. That's where the erroneous comment about where 
MAX_PATH is
defined came from.

Since you need to use things like:

 HANDLE
 INVALID_HANDLE_VALUE (you can't change this to invalidHandle :P )
 WIN32_FIND_DATAA

you have to include <windows.h> which includes everything. Eventually MAX_PATH 
will
be defined when WinDef.h is included.

Since you don't use TCHAR, you don't need to include <tchar.h>

Given what Microsoft has to say about strncpy & strncat

 http://msdn.microsoft.com/en-us/library/xdsywd25(VS.80).aspx
 http://msdn.microsoft.com/en-us/library/tbyd7s1y(VS.80).aspx

We should use strncpy_s & strncat_s instead:

 http://msdn.microsoft.com/en-us/library/5dae5d43(VS.80).aspx
 http://msdn.microsoft.com/en-us/library/w6w3kbaf(VS.80).aspx

(This kind of stuff is why I like programming in Python or C# much better :P )

Here's what the function should look like 

#else  /* COMPILER_MSVC */
/* "Listing the Files in a Directory"
   http://msdn2.microsoft.com/en-us/library/aa365200(VS.85).aspx */
#include <windows.h>
SARRAY *
getFilenamesInDirectory(const char  *dirname)
{
    char              szDir[MAX_PATH];
    char             *tempname;
    l_int32           dirlen;
    HANDLE            hFind = INVALID_HANDLE_VALUE;
    SARRAY           *safiles;
    WIN32_FIND_DATAA  ffd;

    PROCNAME("getFilenamesInDirectory");

    if (!dirname)
        return (SARRAY *)ERROR_PTR("dirname not defined", procName, NULL);

    dirlen = strlen(dirname);
    if (dirlen > (MAX_PATH - 2))
        return (SARRAY *)ERROR_PTR("dirname is too long", procName, NULL);

    if (stringFindSubstr(dirname, "/", NULL) > 0) {
        tempname = stringReplaceEachSubstr(dirname, "/", "\\", NULL);
        strncpy_s(szDir, sizeof(szDir), tempname, _TRUNCATE);
        FREE(tempname);
    }
    else
        strncpy_s(szDir, sizeof(szDir), dirname, _TRUNCATE);
    strncat_s(szDir, sizeof(szDir), TEXT("\\*"), MAX_PATH - strlen(szDir)-1);

    if ((safiles = sarrayCreate(0)) == NULL)
        return (SARRAY *)ERROR_PTR("safiles not made", procName, NULL);
    hFind = FindFirstFileA(szDir, &ffd);
    if (INVALID_HANDLE_VALUE == hFind) {
        sarrayDestroy(&safiles);
        return (SARRAY *)ERROR_PTR("hFind not opened", procName, NULL);
    }

    while (FindNextFileA(hFind, &ffd) != 0) {
        if (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)  /* skip dirs */
            continue;
        sarrayAddString(safiles, ffd.cFileName, L_COPY);
    }

    FindClose(hFind);
    return safiles;
}

#endif  /* COMPILER_MSVC */

Original comment by tomp2...@gmail.com on 7 Dec 2009 at 5:55

GoogleCodeExporter commented 9 years ago
Bleh. In looking that over I bet:

 if (dirlen > (MAX_PATH - 2))

should be:

 if (dirlen > (MAX_PATH - 3))

Since we are about to add two characters (and still need room for the trailing 
null).

Again, c programming gives me a headache :P

Original comment by tomp2...@gmail.com on 7 Dec 2009 at 6:02

GoogleCodeExporter commented 9 years ago
A real headache!

OK, I believe it's correct.  File attached in case you want to try compiling ...

Original comment by dan.bloo...@gmail.com on 7 Dec 2009 at 6:29

Attachments:

GoogleCodeExporter commented 9 years ago
fixed in 1.64 (AFAIK)

Original comment by dan.bloo...@gmail.com on 3 Jan 2010 at 11:40