Library: patch possible misuse of find_library()

On Linux, the Library.load_default() function fails to properly resolve the JLink shared library if not installed in /opt/SEGGER/JLink, regardless of the value of LD_LIBRARY_PATH.

This is possibly caused by an improper use of the ctypes find_library() API, which:

on all platforms expects as parameter the library name without any prefix like lib, suffix like .so, .dylib or version number
returns the full file path on MacOS and Windows, but only the file name (so name) on Linux

Thus:

On Windows: find_library('JLinkARM') or find_library('JLink_x64') should indeed succeed, and actually return the expected full file path
On MacOS: although find_library() would also return the full file path, calling find_library('libjlinkarm') will fail
On Linux, find_library('libjlinkarm') will also fail, and would not have returned the full file path anyway
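Here is a minimal sketch of the naming rule. library_name_from_filename() is a hypothetical helper, not part of pylink or ctypes. It strips the lib prefix, the .so/.dylib/.dll suffix and any version number from a file name, which yields the bare name find_library() expects. It assumes the usual lib<name>.so[.N], lib<name>[.N].dylib and <name>.dll naming conventions.

```python
import re

# Matches an optional version, the shared-library suffix, and trailing versions,
# e.g. ".so.7", ".so.7.94.0", ".6.dylib", ".dll" (case-insensitive).
_SUFFIX_RE = re.compile(r"(\.\d+)*\.(so|dylib|dll)(\.\d+)*$", re.IGNORECASE)


def library_name_from_filename(filename):
    """Hypothetical helper: file name -> bare name for ctypes.util.find_library()."""
    name = _SUFFIX_RE.sub("", filename)  # drop suffix and version numbers
    if name.startswith("lib"):
        name = name[len("lib"):]  # drop the Unix "lib" prefix
    return name
```

For example, the Linux file libjlinkarm.so.7 maps to 'jlinkarm', which is the parameter that should reach find_library(). The Windows names JLinkARM.dll and JLink_x64.dll keep their base names unchanged.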

Changes introduced by this patch:

On Windows, find_library(self._sdk) already succeeds and returns the full file path: this patch does not introduce any change here
On MacOS, find_library() will use 'jlinkarm' (and not 'libjlinkarm') as parameter, which should succeed and return the full file path
On Linux, find_library('jlinkarm') will return the resolved soname, e.g. 'libjlinkarm.so.7', and we'll use the native dlinfo() API to retrieve the full file path
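For reference, here is a minimal sketch of that dlinfo() step. It assumes Linux with GNU libc, where RTLD_DI_LINKMAP is 2 in <dlfcn.h>. The helper name soname_to_path() is hypothetical. LinkMap only declares the two leading members of glibc's struct link_map, which is enough to read l_name through a pointer. Like the PR, the sketch looks up dl via find_library('dl'). As an added assumption, it falls back to the process's global symbols (ctypes.CDLL(None)), since glibc 2.34 and later also export dlinfo() from libc.

```python
import ctypes
import ctypes.util

RTLD_DI_LINKMAP = 2  # request value from glibc's <dlfcn.h>


class LinkMap(ctypes.Structure):
    # Leading members of glibc's struct link_map; the remaining members are
    # omitted since we only read l_name through a pointer.
    _fields_ = [
        ("l_addr", ctypes.c_void_p),
        ("l_name", ctypes.c_char_p),
    ]


def soname_to_path(soname):
    """Return the path the dynamic linker picked for soname, or None."""
    lib = ctypes.CDLL(soname)  # dlopen() semantics; raises OSError on failure
    dl_soname = ctypes.util.find_library("dl")
    # Assumption: on glibc >= 2.34 dlinfo() is also reachable via CDLL(None).
    dl = ctypes.CDLL(dl_soname) if dl_soname else ctypes.CDLL(None)
    try:
        dlinfo = dl.dlinfo
    except AttributeError:
        return None  # no dlinfo() symbol found (it is a GNU extension)
    dlinfo.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p]
    dlinfo.restype = ctypes.c_int
    linkmap = ctypes.c_void_p()
    if dlinfo(lib._handle, RTLD_DI_LINKMAP, ctypes.byref(linkmap)) != 0:
        return None
    entry = ctypes.cast(linkmap, ctypes.POINTER(LinkMap)).contents
    return entry.l_name.decode()
```

On a glibc host, soname_to_path('libc.so.6') should return an absolute path to the C library. With a J-Link install listed in LD_LIBRARY_PATH, soname_to_path('libjlinkarm.so.7') would be expected to return the file under that directory.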

If I'm correct, this should not break anything on Windows, and improve things on both Linux and MacOS.

All committers have signed the CLA.

I'm not opposed to this change, but I'm a bit confused on what the DLLInfo class is solving? Is it the name being prefixed by lib? In my testing, I think I've found that ctypes knows how to handle the leading lib, but I may be mistaken. Outside of that, it seems like we could address the Linux-specific dl handling through the .load() function on the Library class, which has a special case for handling Windows already.

That would also solve the unit tests failing because of expectations changed, though you may have other failures as well post-changes for Linux. A unit test for the specific case would also be appreciated.

@hkpeprah : Thanks for your reply.

I'm not opposed to this change, but I'm a bit confused on what the DLLInfo class is solving? Is it the name being prefixed by lib?

There actually are two distinct literals involved:

the J-Link library name, to use as a parameter for the ctypes.util.find_library() function: according to the documentation, this is "the library name without any prefix like lib, suffix like .so, .dylib or version number"
a file name pattern, used by the pylink.Library.find_library_{windows,linux,darwin}() functions when scanning the directories where the JLink Software pack is expected to be installed (/opt/SEGGER/JLink et cetera)

On Windows, the return value of Library.get_appropriate_windows_sdk_name(), 'JLinkARM' or 'JLink_x64', is used for both tasks, which works as expected: these values are valid parameters for the search by library name in ctypes.util.find_library(), and appropriate for the search by file name in Library.find_library_windows().

However, on Linux and MacOS:

jlinkarm is the literal appropriate for the search by library name
libjlinkarm is the literal appropriate for the search by file name
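Here is a rough sketch of why the two literals have to coexist. It assumes a flat directory scan and is not pylink's actual find_library_{linux,darwin}(), which does more. JLINK_SDK_NAME is what find_library() wants. During a scan, only the JLINK_SDK_STARTS_WITH prefix matches real file names. scan_for_prefix() is a hypothetical helper.

```python
import os

JLINK_SDK_NAME = "jlinkarm"            # library name: parameter for find_library()
JLINK_SDK_STARTS_WITH = "libjlinkarm"  # file-name prefix: used when scanning dirs


def scan_for_prefix(directory, prefix):
    """Search by file name: return paths of directory entries starting with prefix."""
    return [
        os.path.join(directory, entry)
        for entry in sorted(os.listdir(directory))
        if entry.startswith(prefix)
    ]
```

Scanning an install directory with JLINK_SDK_NAME as the prefix finds nothing. This is the breakage that reusing the library name for the search by file name would cause.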

Additionally, on Linux, ctypes.util.find_library() will not return the library's full file path, but only its file name, e.g. libjlinkarm.so.7: the JLinkDllInfo class mostly implements how to retrieve the full file path in this latter case (please see below).

Note that my initial patch missed that JLINK_SDK_NAME was also used by the pylink.Library.find_library_{linux,darwin}() functions (the search by file name on Linux/MacOS): I've updated the PR to fix this.

In my testing, I think I've found that ctypes knows how to handle the leading lib, but I may be mistaken.

AFAICT, on Linux ctypes.util.find_library('libjlinkarm') consistently fails (returns None), whereas ctypes.util.find_library('jlinkarm') successfully resolves a J-Link library installed in a directory listed in LD_LIBRARY_PATH.

I can't test on MacOS, and it's a non-issue on Windows.

Outside of that, it seems like we could address the Linux-specific dl handling through the .load() function on the Library class,

It's indeed done in the Library.load_default() function, though somewhat indirectly, which incidentally breaks some unit test expectations.

which has a special case for handling Windows already.

The Library class even implements three special cases, find_library_{windows,linux,darwin}(), but these are for the search by file name, i.e. when scanning the C:\Program Files\SEGGER, /opt/SEGGER, or /Applications/SEGGER directories.

There are no distinct code paths to handle the different semantics of the return value of ctypes.util.find_library() when searching by library name: the full file path on Windows and MacOS, but only the file name (or so name) on Linux.

The JLinkDllInfo class is mostly here to retrieve the full file path in this latter case (again, please see below).

That would also solve the unit tests failing because of expectations changed, though you may have other failures as well post-changes for Linux.

If I understand correctly, some tests fail because now the function find_library() is not called from the Library class, but from JLinkDllInfo, which is not mocked, and thus does not account for the mock_find_library.assert_called_once_with() thing.

I did create a separated class thinking it was a better design:

to implement the search by library name consistently for all platforms through a single API (the JLinkDllInfo.path property)
to not pollute the main Library code with the Linux specific (and a bit ugly) dlinfo() dance required to retrieve the full library file path based on its so name

But I could try a refactoring to move the whole search by library name implementation to the Library class, and see if that fixes at least some of the "Expected 'find_library' to be called once. Called 0 times." failures. That would also imply replacing jlinkarm (JLINK_SDK_NAME) with libjlinkarm (JLINK_SDK_STARTS_WITH, see commit ff640fa) where appropriate in test_library.py. [Edit: No, this is obviously wrong, find_library() is consistently and rightly called with JLINK_SDK_NAME]

There are also tests that fail with "No such file or directory: 'C:\\'": can I ignore (or skip) these on Linux, or should some mock_directories() magic make them run just fine?

A unit test for the specific case would also be appreciated.

I fully understand, and will try to come up with something.

I may ask for some help, though: e.g. I do not (yet) understand whether the testing platform should actually host the various versions of the J-Link library the unit tests seem to look for, or whether the whole thing is mocked. If the latter is true: how do you mock, say on Windows, what the Linux or MacOS dynamic linker would have done via ctypes.util.find_library() to resolve a library name?
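On the mocking question, one common answer is that the test never asks the dynamic linker anything. unittest.mock replaces ctypes.util.find_library() with a stub that returns whatever the Linux linker would have answered, so the test behaves the same on any host. Here is a minimal sketch, where search_by_library_name() is a hypothetical stand-in for the code under test:

```python
import ctypes.util
import sys
import unittest
from unittest import mock


def search_by_library_name():
    # Hypothetical stand-in for the code under test.
    if sys.platform.startswith("linux"):
        return ctypes.util.find_library("jlinkarm")
    return None


class SearchByLibraryNameTest(unittest.TestCase):
    @mock.patch("sys.platform", new="linux")
    @mock.patch("ctypes.util.find_library")
    def test_linux_returns_soname(self, mock_find_library):
        # Pretend the Linux dynamic linker resolved the library name.
        mock_find_library.return_value = "libjlinkarm.so.7"
        self.assertEqual(search_by_library_name(), "libjlinkarm.so.7")
        mock_find_library.assert_called_once_with("jlinkarm")
```

A caveat: the patch target must be the name the code under test actually looks up at call time. If a module did `from ctypes.util import find_library`, the patch would have to target that module's binding instead.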

As I've said, though I'm willing to help, I haven't written a line of Python in years (back when Python 2 was still widely used), and I have to read up on the unittest.mock library (luckily I'm familiar with mocks): it'd be great if you could either give me a few hints or point me to a good resource, something that would teach me what I need without too much overhead.

[Edit: Politeness may require that I add two lines of background. Other software used to fail loading the J-Link library from LD_LIBRARY_PATH (e.g. Nordic's nrfjprog command line tool did), but it has generally been fixed by now, and pyOCD, which relies on pylink, is the last tool I need daily that still fails: that's why I may appear a bit sensitive about this issue; please forgive me if I sounded rude in any way. (*) Once more today pyocd was failing, until I realized I was using the upstream pylink library pulled with the Zephyr SDK ;-)
]

[Edit2: the class consistently referred to above as JLinkDllInfo is actually named JLinkarmDllInfo. It has since been renamed to JLinkarmDlInfo, see 45c8398.]

Thanks.

-- chris

The latest patch seems to help: AFAICT this PR would pass all unit tests (python3 setup.py test), as if I even had a C:\\ directory on my Linux box now, and a dozen versions of the J-Link library.

The functional tests (python3 setup.py bddtest) still fail, though: if I understand correctly, I need some additional setup to run these.

Regarding the "unit test for the specific case", I can at least describe it.

The initial state to mock would be:

the OS is Linux
the J-Link Software pack is not installed somewhere into /opt/SEGGER, let's say it's installed into $JLINK_INSTALL_DIR
the directory /opt/SEGGER may not even exist, I don't think it's relevant
the J-Link library version does not seem relevant to me either
the value of the environment variable LD_LIBRARY_PATH includes $JLINK_INSTALL_DIR, that is the relevant part

Then, we would expect that invoking the Library constructor (and in turn Library.load_default()) will successfully resolve (find and load) the JLink library, and as usual that ctypes.util.find_library() has been called once from the Library class.

We could also expect that ctypes.util.find_library() has been called once from the JLinkDllInfo class (to resolve the dl library), which would prove that it's actually how we find the library in $LD_LIBRARY_PATH, but that may sound a bit pedantic.
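Here is a sketch of that initial state with unittest.mock, assuming a hypothetical JLINK_INSTALL_DIR. sys.platform is forced to 'linux' and LD_LIBRARY_PATH is set with mock.patch.dict(). find_library() is replaced by a side_effect that only resolves 'jlinkarm' when that directory is listed. The direct find_library() call stands in for the Library constructor. Because the linker is only emulated, this cannot catch a regression in ctypes itself.

```python
import ctypes.util
import os
import sys
from unittest import mock

JLINK_INSTALL_DIR = "/home/user/opt/JLink"  # hypothetical $JLINK_INSTALL_DIR


def fake_find_library(name):
    """Emulate the dynamic linker honoring LD_LIBRARY_PATH (no real lookup)."""
    dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if name == "jlinkarm" and JLINK_INSTALL_DIR in dirs:
        return "libjlinkarm.so.7"
    return None


def run_scenario(ld_library_path):
    """Resolve 'jlinkarm' under a mocked Linux host and LD_LIBRARY_PATH."""
    with mock.patch.object(sys, "platform", "linux"), \
            mock.patch.dict(os.environ, {"LD_LIBRARY_PATH": ld_library_path}), \
            mock.patch("ctypes.util.find_library",
                       side_effect=fake_find_library) as find:
        # Stand-in for the Library constructor / Library.load_default().
        soname = ctypes.util.find_library("jlinkarm")
        find.assert_called_once_with("jlinkarm")
    return soname
```

mock.patch.dict() restores os.environ on exit, so the scenarios do not leak into each other.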

I'll try to implement something. [Edit: follow-up.

I'm rather stuck:

Assuming I'm able to properly mock.patch.dict() the os.environ environment to include an ad-hoc value for LD_LIBRARY_PATH, find_library() (see ctypes.util._findLib_ld()) will still fail if the library file does not actually exist (rightly relying on the native linker ld, this will ignore the mocked file system)
Without this file, we'll also fail to test the dlinfo() dance's implementation since that needs to first load the library (using the so name returned by find_library()) to then get its absolute file path (the so file)

I'm afraid a test case relying on mock_directories() to provide the J-Link library file would eventually prove itself useless, since it won't address:

does the ctypes.util.find_library() API we rely on properly honor LD_LIBRARY_PATH? I know it does today, but how do we test for regressions (see e.g. ctypes find_library should search LD_LIBRARY_PATH on Linux #54207)?
do we call ctypes.util.find_library() with a valid parameter (the libjlinkarm vs jlinkarm thing), and get the expected so name ?
the dlinfo() dance's implementation itself (JLinkarmDlInfo)

Thus, to summarize the status as of now:

Without an actual library file, I can't think of any test case that would actually test the case this patch intends to address: relying on the search by library name logic (i.e. the system's dynamic linker) to properly load the J-Link library from LD_LIBRARY_PATH; am I missing something obvious?
More generally, is there actually a case in test_library.py that covers the search by library name logic? IMHO, they all rely on the search by file name logic: am I again missing something obvious?
Please, could someone confirm whether commit 7302f0a fixes at least some of the unit tests that were failing with "Expected 'find_library' to be called once. Called 0 times."?

]

Thanks.

-- chris

Is there anything else I can do that would help fix this?

This thread, and the initial issue (#131), are quite old, and worth a brief summary:

Demonstrable problem: on Linux, pylink (and downstream like pyOCD or Zephyr-SDK) will fail to resolve the JLink DLL when it's not installed somewhere under /opt/SEGGER, though appropriately included in LD_LIBRARY_PATH
The root cause: when first trying to resolve the JLink DLL by library name, an invalid parameter is provided to ctypes.util.find_library() ('libjlinkarm' where it should be 'jlinkarm', API reference) which will always fail (return None)
AFAICT, the issue may also apply to Darwin when the JLink library is not installed under /Applications/SEGGER, regardless of DYLD_LIBRARY_PATH

Fixing this is not as immediate as setting JLINK_SDK_NAME to jlinkarm:

On Linux, ctypes.util.find_library('jlinkarm') will not return the library's absolute file path, but its so name, e.g. libjlinkarm.so.6: retrieving the actual absolute path requires a little dlinfo() dance, which is implemented in the new class JLinkarmDlInfo
The JLINK_SDK_NAME's value is now a valid parameter for ctypes.util.find_library(), but will break the search by file name implementation in pylink.library.Library.find_library_{linux,darwin}(), which expects a startswith('libjlinkarm') pattern: the new constant Library.JLINK_SDK_STARTS_WITH is introduced to hold this semantic

The original PR had a couple of issues:

it missed we still also need the 'libjlinkarm' literal (the JLINK_SDK_STARTS_WITH semantic)
the implementation's code path did break a bunch of unit test expectations by moving the call to ctypes.util.find_library() from the Library class to the new JLinkarmDlInfo class

These issues were addressed by the commits below:

ff640fa: Library: do not use JLINK_SDK_NAME in find_library_{linux,darwin}()
7302f0a: Library: refactor code path to preserve unit tests

Regarding a new unit test that would cover the case addressed by this PR:

The use case: the ability for a (Linux) user to rely on LD_LIBRARY_PATH to install the JLink library without the (root) permissions required to write under the /opt directory
What would we want to test/verify: the ctypes.util.find_library() implementation itself (ctypes find_library should search LD_LIBRARY_PATH on Linux), and above all our dlinfo() dance integration
What would we need: to provide a test environment where ctypes and the involved underlying native tool (dyld, ldconfig, ld, objdump, gcc, etc, depending on the operating system and what's available) would behave naturally

And I can't figure out how to implement such a test case without introducing non-trivial requirements (simply mocking a file system is not enough here):

running the test on a (possibly emulated, containerized, whatever) Linux system
actually providing a valid libjlinkarm.so file that we can include in LD_LIBRARY_PATH, and that the system's ldconfig, ld, objdump, etc. will be happy with (e.g. on Linux, once the ldconfig cache and gcc lookups have failed, ctypes.util.find_library() tries something like ld -t -L $LD_LIBRARY_PATH -o /dev/null -ljlinkarm)
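One way to keep such a test without burdening every contributor is an opt-in integration test that is skipped unless the environment provides the real library. In this sketch, JLINK_TEST_LIB_DIR is a hypothetical variable naming a directory that holds a genuine libjlinkarm.so and is already listed in LD_LIBRARY_PATH. On glibc, the dynamic linker reads LD_LIBRARY_PATH when the process starts, so the variable must be exported before launching the test runner rather than patched in os.environ.

```python
import ctypes
import ctypes.util
import os
import sys
import unittest

# Hypothetical opt-in variable naming a directory with a real libjlinkarm.so,
# which must already be listed in LD_LIBRARY_PATH when Python starts.
JLINK_DIR = os.environ.get("JLINK_TEST_LIB_DIR")


@unittest.skipUnless(sys.platform.startswith("linux") and JLINK_DIR,
                     "needs Linux and a real J-Link library (JLINK_TEST_LIB_DIR)")
class RealLinkerTest(unittest.TestCase):
    def test_search_by_library_name(self):
        ld_dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
        self.assertIn(JLINK_DIR, ld_dirs)
        soname = ctypes.util.find_library("jlinkarm")  # real lookup, no mocks
        self.assertIsNotNone(soname)
        self.assertTrue(soname.startswith("libjlinkarm"))
        ctypes.CDLL(soname)  # raises OSError if the linker cannot load it
```

On hosts without the opt-in variable, the test is reported as skipped and the suite still passes.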

@hkpeprah, are you still missing something, beside time, to:

re-run the unit tests (IIUC they ran only once, on 05/26), and hopefully confirm they all pass (they all do here)
try a first review of the current implementation, and let me know what would make this PR more acceptable to you (from coding style to plain errors, I willingly acknowledge I'm not a Python guru)
comment on my doubts about the requirements for a sensible test case: also notice that IMHO all existing unit tests cover variants of the search by file name, and none the search by library name code path (the call to ctypes.util.find_library())

Thanks.

-- chris

Hello @dottspina, I will take a look today. From a high level, I think we need unit tests that validate the new behaviour you're adding. You don't need to simulate the OS, but using the mock library to simulate expected return values / failure scenarios would be a baseline IMO. So this would be:

Failing to find dl (should never happen, but we should test it)
Finding dl and failing to load the DLL
Finding dl and succeeding to load the DLL

In the failure cases, we should expect the default behaviour to take over.
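Here is a sketch of those three scenarios under one possible reading: dl not found, dl found but dlinfo() failing, and dl found with dlinfo() succeeding. dlinfo_path() is a hypothetical, simplified resolver modeled on the snippet discussed in this thread, not pylink's code. In the success case, the mocked dlinfo() writes a fake link-map address through the byref() argument. This relies on the CPython detail that byref() objects expose the wrapped instance as _obj.

```python
import ctypes
import ctypes.util
import unittest
from unittest import mock

RTLD_DI_LINKMAP = 2  # glibc <dlfcn.h>


class LinkMap(ctypes.Structure):
    _fields_ = [("l_addr", ctypes.c_void_p), ("l_name", ctypes.c_char_p)]


def dlinfo_path(soname):
    """Hypothetical resolver: absolute path of soname, or None to fall back."""
    dl_soname = ctypes.util.find_library("dl")
    if dl_soname is None:
        return None
    jlink = ctypes.cdll.LoadLibrary(soname)
    dlinfo = ctypes.cdll.LoadLibrary(dl_soname).dlinfo
    dlinfo.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p]
    dlinfo.restype = ctypes.c_int
    linkmap = ctypes.c_void_p()
    if dlinfo(jlink._handle, RTLD_DI_LINKMAP, ctypes.byref(linkmap)) != 0:
        return None
    return ctypes.cast(linkmap, ctypes.POINTER(LinkMap)).contents.l_name.decode()


class DlinfoPathTest(unittest.TestCase):
    @mock.patch("ctypes.cdll.LoadLibrary")
    @mock.patch("ctypes.util.find_library", return_value=None)
    def test_dl_not_found(self, mock_find, mock_load):
        self.assertIsNone(dlinfo_path("libjlinkarm.so.7"))
        mock_load.assert_not_called()

    @mock.patch("ctypes.cdll.LoadLibrary")
    @mock.patch("ctypes.util.find_library", return_value="libdl.so.2")
    def test_dlinfo_fails(self, mock_find, mock_load):
        mock_load.return_value.dlinfo.return_value = -1
        self.assertIsNone(dlinfo_path("libjlinkarm.so.7"))

    @mock.patch("ctypes.cdll.LoadLibrary")
    @mock.patch("ctypes.util.find_library", return_value="libdl.so.2")
    def test_dlinfo_succeeds(self, mock_find, mock_load):
        fake = LinkMap(None, b"/home/user/JLink/libjlinkarm.so.7")

        def fake_dlinfo(handle, request, ref):
            # Write the fake link_map address into the caller's c_void_p.
            ref._obj.value = ctypes.addressof(fake)
            return 0

        mock_load.return_value.dlinfo.side_effect = fake_dlinfo
        self.assertEqual(dlinfo_path("libjlinkarm.so.7"),
                         "/home/user/JLink/libjlinkarm.so.7")
```

In both None cases, the caller would then continue with the search by file name.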

To add the aforementioned tests, the current Linux find library tests are passing because this condition happens to be false since we mock ctypes.cdll.LoadLibrary:

            if dlinfo(tmp_cdll_jlink._handle, JLinkarmDlInfo.RTLD_DI_LINKMAP, ctypes.byref(linkmap)) == 0:

So the result is probably some mock object that leads us to the else case. We should be explicitly testing the true and false cases, so the Linux tests that currently use the find_library() need to be updated to explicitly return a non-zero entry for this, and the new test would return zero and handle testing DLINFO.
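The point above about the else case is easy to check in isolation. If the condition is reached with a plain MagicMock standing in for dl, the dlinfo() call returns another MagicMock, and that never equals 0. Here is a minimal sketch:

```python
from unittest import mock

dl = mock.MagicMock()  # roughly what a mocked LoadLibrary() returns for dl

# Unconfigured: the call returns another MagicMock, which is never == 0,
# so an `if dlinfo(...) == 0:` branch would be skipped without any error.
unconfigured_hits_branch = dl.dlinfo(None, 2, None) == 0

# Configured: an explicit return value selects the branch under test.
dl.dlinfo.return_value = 0
configured_hits_branch = dl.dlinfo(None, 2, None) == 0
```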

@hkpeprah, thank you for taking the time to work on this and your constructive comments.

To add the aforementioned tests, the current Linux find library tests are passing because this condition happens to be false since we mock ctypes.cdll.LoadLibrary:
            if dlinfo(tmp_cdll_jlink._handle, JLinkarmDlInfo.RTLD_DI_LINKMAP, ctypes.byref(linkmap)) == 0:
So the result is probably some mock object that leads us to the else case. We should be explicitly testing the true and false cases, so the Linux tests that currently use the find_library() need to be updated to explicitly return a non-zero entry for this, and the new test would return zero and handle testing DLINFO.

I think the Linux tests (in tests/unit/test_library.py) currently don't even reach this condition, precisely because find_library() is mocked to return None.

While I fully agree that testing this code path (the JLinkarmDlInfo class) will require the find_library() mock to at least return something, I'm not sure I understand why we would update the existing Linux tests: to me they have always been designed to cover the search by file name code path, i.e. the Library.find_library_linux() function, which runs when ctypes.util.find_library() has failed. IMHO their semantics (use cases, coverage, expected behaviors) are orthogonal to this PR, and we should not touch their implementation (in the context of this PR).

OTOH, I've started adding a couple of unit tests, mostly to cover error conditions or confirm a few code path, as you've previously suggested. I hope I'll soon be able to push something we can build upon.

Thanks.

-- chris

* Failing to find `dl` (should never happen, but we should test it)

Agreed, if a system presents itself as POSIX (Linux), it should provide the dl library, but nonetheless we have to know how the code would behave. I've added a unit test checking this behavior.

Note that while dlopen() and friends are POSIX, dlinfo() is a GNU extension, defined only on systems using the GNU libc. I've updated the PR to not assume all Linux hosts use glibc, and added a unit test to confirm the code behavior.
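Here is one possible way to express that guard. It assumes platform.libc_ver() is an acceptable glibc probe; the PR's actual check isn't reproduced here. has_gnu_dlinfo() is a hypothetical name.

```python
import platform
import sys


def has_gnu_dlinfo():
    """Guess whether dlinfo() (a GNU extension) can be expected on this host."""
    if not sys.platform.startswith("linux"):
        return False
    libc, _version = platform.libc_ver()  # e.g. ("glibc", "2.36")
    return libc == "glibc"
```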

* Finding `dl` and failing to load the DLL

I assume here DLL refers to the dl library (not the "JLink DLL").

I can add a test where LoadLibrary()'s mock raises an OSError when called to load dl: according to our previous discussion, the test will have to confirm the exception actually propagates from the ctypes API to the call site.
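Here is a minimal sketch of such a test. A hypothetical load_dl() call site stands in for the code that loads dl. The mock raises OSError, and the test asserts that the exception reaches the caller.

```python
import ctypes
import unittest
from unittest import mock


def load_dl(dl_soname):
    # Hypothetical call site: no try/except, so an OSError from ctypes
    # reaches the caller instead of triggering the file-name fallback.
    return ctypes.cdll.LoadLibrary(dl_soname)


class LoadDlTest(unittest.TestCase):
    @mock.patch("ctypes.cdll.LoadLibrary",
                side_effect=OSError("libdl.so.2: cannot open shared object file"))
    def test_oserror_propagates(self, mock_load):
        with self.assertRaises(OSError):
            load_dl("libdl.so.2")
        mock_load.assert_called_once_with("libdl.so.2")
```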

* Finding `dl` and succeeding to load the DLL

I again assume DLL refers to the dl library.

IIUC, LoadLibrary()'s mock would return a mock dl library, tmp_cdll_dl in the code below:

        dl_soname = ctypes_util.find_library('dl')
        if dl_soname is not None:
            tmp_cdll_dl = ctypes.cdll.LoadLibrary(dl_soname)
            dlinfo = tmp_cdll_dl.dlinfo
            dlinfo.argtypes = ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p
            dlinfo.restype = ctypes.c_int

            linkmap = ctypes.c_void_p()
            if dlinfo(tmp_cdll_jlink._handle, JLinkarmDlInfo.RTLD_DI_LINKMAP, ctypes.byref(linkmap)) == 0:
                linkmap = ctypes.cast(linkmap, ctypes.POINTER(JLinkarmDlInfo.LinkMap))
                self._dll_path = linkmap.contents.l_name.decode(sys.getdefaultencoding())

And we could indeed write a test to confirm we actually execute the last two lines only when dlinfo() returns 0, but I confess that sounds a bit far-fetched to me: the if condition is on the line just above and contains a single (atomic wrt Python) function call; with so much locality, IMHO there is no code path left to test.

In the failure cases, we should expect the default behaviour to take over.

I assume "default behaviour" refers to falling back to the search by file name code path (Library.find_library_linux() on Linux) after the search by library name has failed (ctypes find_library() returns None).

Whether we actually execute this fallback depends on how the search by library name fails:

if the initial find_library('jlinkarm') call fails (returns None), we directly proceed with the fallback (as we do for all platforms)
if find_library('jlinkarm') succeeds (returns a soname), but the host system is not Linux with GNU libc, we also directly proceed with the fallback
if find_library('jlinkarm') succeeds, the host system presents itself as Linux with GNU libc, but find_library('dl') returns None, we skip the dlinfo() dance and proceed with the fallback (though I'm dubious about our chances to eventually dlopen() anything without dl)
if find_library('jlinkarm') succeeds, we successfully load dl, but the call to dlinfo() fails (does not return 0), we drop the found soname and continue to the fallback
if LoadLibrary() raises an OSError when loading the jlinkarm or dl shared object, we let the exception propagate and the fallback won't execute
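The decision list above can be outlined as follows. resolve_jlink() and its injected callables are hypothetical, chosen so that each branch can be exercised with plain stubs. Treating an absolute find_library() result as final is an assumption that stands in for the Windows/macOS behavior; the PR may test for that case differently.

```python
import os


def resolve_jlink(name, find_library, soname_to_path, search_by_file_name,
                  linux_glibc):
    """Hypothetical outline of the search order (not pylink's actual code).

    find_library:        stand-in for ctypes.util.find_library
    soname_to_path:      the dlinfo() step, returns a path or None
    search_by_file_name: stand-in for Library.find_library_{linux,darwin}()
    linux_glibc:         whether the host is Linux with GNU libc
    """
    found = find_library(name)
    if found is not None:
        if os.path.isabs(found):
            return found  # assumption: a full path needs no further work
        if linux_glibc:
            # An OSError raised here is deliberately not caught.
            path = soname_to_path(found)
            if path is not None:
                return path
    # Search by library name failed: fall back to the search by file name.
    return search_by_file_name()
```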

According to our previous discussion, I think it's fine not to execute the fallback in this latter case.

Adding to our thoughts about updating or not the pre-existing Linux unit tests in test_library.py: note that they all assume the initial call to find_library() fails, while the tests I add all assume find_library() succeeds, hence I feel they actually complement each other (cover distinct code paths).

Thanks.

-- chris

Adding to our thoughts about updating or not the pre-existing Linux unit tests in test_library.py: note that they all assume the initial call to find_library() fails, while the tests I add all assume find_library() succeeds, hence I feel they actually complement each other (cover distinct code paths).

I looked again, and you are correct.

@hkpeprah , thanks again for taking the time to review this PR.

Thanks for going through with the tests. Sorry for all the back and forth on this.

No hassle. Understanding the existing unit tests, why they pass or fail, and adding a small handful of new ones, is actually what gives me some confidence that this PR should not introduce obvious regressions. Taking care of this naturally also falls on the shoulders of the submitter.

At the end of the day, that back and forth makes for a cleaner patch.

This looks good to me to merge and un-block Linux platforms.

Nice, this should also fix pylink's downstream software (pyOCD, Zephyr-RTOS, Nordic pynrfjprog, etc.) on Linux. Additionally, this may improve the situation on macOS: would you mind sharing any such feedback?

Before I can commit the change, you will need to sign the Square CLA though. It is available here: https://spreadsheets.google.com/spreadsheet/viewform?formkey=dDViT2xzUHAwRkI3X3k5Z0lQM091OGc6MQ&ndplr=1

Done.

I can then merge this change and make a new release later today.

Great.

To thank you for your work on a somewhat niche issue, I'd like to let you know about a confusing use case this release will fix. On Linux, when pyOCD fails to load the JLink library, it does so silently, which is right since it does not require this library (nor does it expect J-Link hardware), but pyocd --list will then simply report "No available debug probes are connected": you, as a user, will then dig into hardware or firmware issues, missing or incorrect udev rules, etc., whereas the software has just not honored LD_LIBRARY_PATH.

Thanks.

-- chris

Changes should be available in v0.14.0. Thanks for the patch.

I was plainly wrong, we should fix this here, not downstream (see #138 and #139).

square / pylink

Library: patch possible misuse of find_library() #132