villano-lab / nrCascadeSim

calculating the NR spectrum resulting from neutron-capture cascades.
MIT License
0 stars 1 forks source link

[JOSS Review]: Unsafe memory access #73

Closed altavir closed 2 years ago

altavir commented 2 years ago

There are several places, where unsafe memory access (malloc/free) is used. At least in one of these places there is an obvious memory leak. This allocated memory is never being freed:

https://github.com/villano-lab/nrCascadeSim/blob/6b11bb3601109b8b0177e4ad8e40b69ef613a933/src/lindhard.c#L54

It would be good to have a some kind of justification of using unsafe memory access or at least a code sanitizer report.

villaa commented 2 years ago

@altavir do you mean it's not being freed inside the function? That's by design. This API function is returning an array (that's the allocated memory) and it's the responsibility of the user to free that memory. It is known how big the array is to the user.

Do you consider that bad practice?--I will look through some standards. But many C codes that return arrays work like that one way or another.

The only other option I see is to use STL vectors but that adds a bit of overhead and would need to be consistently done through the whole code base.

altavir commented 2 years ago

I specifically checked. It is not freed outside as well in this case.

If you mean a performance overhead, you should supplement your conciderations with some performance measurements. I am pretty sure, that you can't see it. Especially when they are used in configurations, not in computations.

Memory leaks is one of the main problems in C++ ecosystem. Direct memory allocation should be avoided at all costs. Vector is a good solution in your case. In other cases smart pointers could help. I am not sure which C++ standard are you requiring, but all those things should be available since C++11.

In pure C you do not have a lot of freedom in that, but your code is C++, not C.

altavir commented 2 years ago

Just in case, here is a quote from the basic C++ reference manual:

This function does not call constructors or initialize memory in any way. There are no ready-to-use smart pointers that could guarantee that the matching deallocation function is called. The preferred method of memory allocation in C++ is using RAII-ready functions std::make_unique, std::make_shared, container constructors, etc, and, in low-level library code, new-expression.

altavir commented 2 years ago

My collegue suggested to use structs for that. They are stack-allocated so no problems with memory leaks. And you get additional type safety.

villaa commented 2 years ago

My collegue suggested to use structs for that. They are stack-allocated so no problems with memory leaks. And you get additional type safety.

Just to be clear--do you mean create/return a struct instead of an array?

altavir commented 2 years ago

Yes. It should work. I personally would use a vector. But if you are concerned about performance, it would be even faster than malloc and would save you a memory leak.

Though in general any statements about performance are meaningless without actual measurements (I've been doing a lot of high performance computations, so I am talking from experience).

Shimuuar commented 2 years ago

I'm colleague @altavir mentioned. I think programmer should first of all care about meaning of code. It seems that elements of array has different meaning and putting them into plain array whether double[] or vector<double> hides that. Programmer must know what lies in first and second element of array and it looks like any other array. Putting it into struct allows to document meaning of fields and for reader to look up that documentation.

Speaking of performance. I measured overhead of returning 16B struct struct Foo {double fld1, fld2}; by value and allocating 2-element array on heap. Difference is drastic: returning by value is just a few instructions: it's difficult to see because of PAPI overhead, while allocation of new buffer costs about 220 instructions and freeing another 100. For such small structures it makes sense to use plain structs. It also saves you pain of memory management.

Also working with raw pointer is just asking for troubles. RAII is one of few nice feature of C++. When used properly it allows to make compiler do most of memory management for you. And compiler doesn't forget to call free

As a aside. No need to create temporary buffers. You could just take address of variable and call lindhard_ar_k(&E0,&k) (Or just switch function to call by value): https://github.com/villano-lab/nrCascadeSim/blob/182e8a56406a8ae4e5cfc8df489d9779d25d93da/src/lindhard.c#L151-L158

villaa commented 2 years ago

@gerudo7 as part of this issue let's also change the extensions .c to .cpp and correspondingly in the Makefiles. As pointed out this is really a C++ code so this will make that explicit.

see this confusing post for the reasons behind that.

basically we're not using a Windows compiler (only in a linux env through WSL) so it doesn't really matter for us, but .cpp makes it clear that we're coding in C++.

Also, we may have said C/C++ in documentation, we can just change that to C++.

altavir commented 2 years ago

You also need to clarify what C++ standard you are using. #64 would help with that as well.

villaa commented 2 years ago

You also need to clarify what C++ standard you are using. #64 would help with that as well.

@altavir this information is in the documentation, should it be somewhere else? Also we are consistent with a number of standards starting at C++11 and I am not aware of any deprecated features used through at least C++17.

altavir commented 2 years ago

Ah, I missed it. And from the rest of documentation I infer, you use only gcc. Different versions of C++ compiler are not fully compatible. It is one of major problem of using C++ (and why I advocate not to use it at all without very compelling reasoning). CMake have directives to check a compiler so it should not be a problem with proper build.