jaeandersson / swig

SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages.
http://www.swig.org
Other
23 stars 19 forks source link

Dynamically allocate char buffers when converting from Matlab strings #89

Open VladimirIvan opened 7 years ago

VladimirIvan commented 7 years ago

This PR fixes issue #88

Alzathar commented 7 years ago

@VladimirIvan @jaeandersson What do you think to have a test that check this improvement? A new test in swig/Examples/test-suite/matlab/ for example?

VladimirIvan commented 7 years ago

I'm not familiar with this test setup but I added code which should (to my best knowledge) test this feature.

jaeandersson commented 7 years ago

There are several issues with this pull request:

Alzathar commented 7 years ago

@jaeandersson To answer to your concerns:

The code you wrote is C++. The MATLAB module should be valid C and valid C++

You are right, the function malloc/free should be used instead

Adding dynamic memory allocation like that can slow down the code a lot.

That's true too. We could have a static array (as in the current code) and in case the string is greater than 255 characters, we could use a dynamic array.

It looks like a memory leak to me, you dynamically allocate arrays of char, but when are they deallocated?

The lines 238 and 245 in the new code (i.e. delete[] buf;) manage the deletion of the allocated memory. I do not see a possible memory leak in the code.

VladimirIvan commented 7 years ago

The code you wrote is C++. The MATLAB module should be valid C and valid C++

You are right, the function malloc/free should be used instead

std::string is a c++ class. I don't see how it can be part of a valid C module if it has to depend on a c++ code in the first place. Malloc could be used for consistency but it would be violating the c++ guidelines.

Adding dynamic memory allocation like that can slow down the code a lot.

That's true too. We could have a static array (as in the current code) and in case the string is greater than 255 characters, we could use a dynamic array.

That can be done as a separate pull request to optimize this behaviour if you believe it will improve performance.

I wouldn't suggest it without profiling though. Adding a condition may also slow down the evaluation but with compiler optimizations on, chances are none of these would matter anyway. Besides, Matlab code calling any function wrapped like this will most probably be the bottleneck, not the memory allocation of a string. If this makes any difference to anyone's code they probably shouldn't be writing their code in Matlab in the first place.

Alzathar commented 7 years ago

std::string is a c++ class. I don't see how it can be part of a valid C module if it has to depend on a c++ code in the first place. Malloc could be used for consistency but it would be violating the c++ guidelines.

I do not see the use of the std::string in your PR. What I meant in my comment is that we should use some code like the following snippet to be usable in C and C++ code.

char* buf = (char*)malloc(sizeof(char) * len)
// Copy from Matlab mxArray structure.
free(buf)
VladimirIvan commented 7 years ago

My bad. I didn't fully read the context in which the function is implemented. I have pushed the changes you suggested.

jaeandersson commented 7 years ago

There is still a memory leak. If "alloc" is false, where will the buffer be deallocated?

Alzathar commented 7 years ago

I see the problem. In case alloc is NULL and not cptr, then we have a memory leak because there is no way to tell to SWIG that a new buffer was created. However, this case might not be of interest for the Matlab bindings (and might never happen). I looked for the usages of SWIG_AsCharPtrAndSize() in the sources and it is only used in two ways:

In the Python 3 bindings, the conversion will fail in case alloc is NULL and not cptr.

I propose to redesign the code as following:

/* WARNING not tested */
size_t len=mxGetM(pm) * mxGetN(pm) + 1;
if (cptr) {
  if (alloc) {
    *cptr = %new_array(char,len);
    int flag = mxGetString(pm,*cptr,len);
    if (flag) {
      %delete_array(*cptr);
      return SWIG_TypeError;
    }
    *alloc = SWIG_NEWOBJ;
  } else {
    /* We can't allow converting without allocation, since we cannot access directly to the internal representation of the data in Matlab. */
    return SWIG_RuntimeError;
  }
}
if (psize) *psize = len;
return SWIG_OK;

This code also also reduces the number of memory allocation needed compared to the current PR.