serge1 / ELFIO

ELFIO - ELF (Executable and Linkable Format) reader and producer implemented as a header only C++ library
http://serge1.github.io/ELFIO
MIT License
720 stars 155 forks source link

Note description contains garbage characters #78

Closed galjs closed 2 years ago

galjs commented 2 years ago

In get_note() in elfio_note.hpp a mathematical operation is done to calculate the location of the description buffer, but the resulting buffer contains only garbage characters (while readelf's output (-n flag`) provides sane output on the same file. This might indicate a problem with the calculation.

Also, since get_note() receives a void*& desc (instead of void*) to point to the note's description, the use of reinterpret_cast<void*>(some_string.data()) is prevented. A better API might be to just get desc as std::string& and encapsulate the use of void* inside get_note().

Last thing - the note's description is not used in dump::notes, which is a shame since it is already parsed from the note anyway.

serge1 commented 2 years ago

Hi, I believe that you are reffering GNU specific entries in .note.ABI-tag section. Those entries are indeed binary (as opposite to textual) and readelf implements special logic for interpreting them. I didn't find relyable documentation for implementing similar logic in the dumper.

serge1 commented 2 years ago

the note's description is not used in dump::notes, which is a shame since it is already parsed from the note anyway

As said above, the description may contain binary data. Is there a benefit to dump it?

A better API might be to just get desc as std::string& and encapsulate the use of void* inside get_note()

The library usually returns just a pointer to binary data and does not convert it to std::string object.

galjs commented 2 years ago

Hi, I believe that you are reffering GNU specific entries in .note.ABI-tag section. Those entries are indeed binary (as opposite to textual) and readelf implements special logic for interpreting them. I didn't find relyable documentation for implementing similar logic in the dumper.

I've found the relevant part of the documentation for parsing this binary data in the man pages of elf elf(5) under the paragraph titled Notes (Nhdr). Near the end of the paragraph there's an explanation of every description depending on the type of the note. This should be enough for implementing a translation mechanism.

The library usually returns just a pointer to binary data and does not convert it to std::string object.

Since this is binary data and not textual (like the note's name for example), I can understand why the use of std::string is not fitting here. But the use of void*& instead of void* prevents me from using std's containers when calling get_note(). Maybe the change should just be to void*

serge1 commented 2 years ago

This should be enough for implementing a translation mechanism.

I added it to my task list. Thank you.

But the use of void& instead of void prevents me from using std's containers when calling get_note()

Could you please explain this? How it is different from obtaining the description size Elf_Word& descSize?

galjs commented 2 years ago

Could you please explain this? How it is different from obtaining the description size Elf_Word& descSize?

Since you don't fill desc with data but rather just point it to an existing buffer (be cautious from lifetime issues), the use of the STL is not possible unless you change from void*& to std::string& or something similar. If you prefer to keep it general, a void* will be just as good. The & is redundant.

Be advised that this method is unorthodox since usually (see WINAPI and the linux api) its the user's responsibility to allocate a buffer and pass a pointer to it the the API. The API's job is to fill this buffer. That's what coders are used to and a best practice.

It's very easy to misuse get_note() since a regular user (that hasn't read the source code of get_note() will see a buffer is requested and will allocate one, freeing it at the end of the use. Since you re-point desc to a new location, no pointer points to the allocated buffer of the user, so it will never get freed.

Once you change your API to behave like I advised, the use of std::strings will be possible as shown in my original post.

Elf_Word& descSize is different since it's not a pointer, but a reference, so the location it points to cannot be changed. Then you assign value to descSize it copies the value into descSize buffer in the outer scope.

serge1 commented 2 years ago

I kind of disagree with you regarding the APIs. Just a quick search returns a lot of examples that return pointers to internal buffer. In opposite - it is usual technique to provide information without passing an ownership or copying data.

To name two examples - NetGetJoinInformation() in WinAPI ('lpNameBuffer' parameter) and gethostent_r() in socket API ('result' parameter).

If I understand you correctly, you propose to use either std::string& or void**. The last one is due to the pointer is a result value. As the convention is to use referencies for result values, there is no big difference between void** and void*&.

If you prefer to keep it general, a void* will be just as good. The & is redundant.

This statement is problematic! The parameter is a result value of type void*

galjs commented 2 years ago

In the end it's just a matter of consistency. Notice that in the same function you have std::string& name whose buffer is filled with the contents of the internal buffer, and void*& desc which is pointed to the internal buffer.

In my opinion, the first one is the better solution since it exports easy-to-use types (in this case std::string) instead of more lowlevel types such as void*, which depends more on the user not making a mistake and using the API inappropriately (allocating a buffer when it's not actually necessary, resulting in a buffer that's never freed).

The cost of the copy here is small and acceptable compared to the risks and the other benefits I outlined

serge1 commented 2 years ago

In the end it's just a matter of consistency.

I agree. The library uses std::string type for textual entities and void* for unspecified data. While it is possible to use typestd::string in the last case too, this is the convention.

galjs commented 2 years ago

I agree. The library uses std::string type for textual entities and void* for unspecified data. While it is possible to use typestd::string in the last case too, this is the convention.

Well, in that case I'd like to suggest a change in favor of STL's containers. The convention in the projects I take part in is to use std::vector<uint8_t> for unspecified binary data (for example when reading from a file). This allows easy access to the data after its acquisition and also ensures its safe release at the end.

This is a big refactor that breaks the API, but might be good in the long run. Maybe a good start will be to faze-out raw pointers in internal code in favor of STL's containers (strings, vectors and unique_ptrs)

serge1 commented 2 years ago

Sure, I can consider such change for the next major version of the library.

galjs commented 2 years ago

Sure, I can consider such change for the next major version of the library.

Looking forward to it.

So this leaves only the translation mechanism for the description of the notes and the addition of the description to the dump class once this translation is implemented.

serge1 commented 2 years ago

More detailed 'note' section description has been added in commit e451ea21. Please reopen the issue in case any problem is observed