square / pylink

Python Library for device debugging/programming via J-Link
https://pylink.readthedocs.io/en/latest/
Other
351 stars 128 forks source link

Library: patch possible misuse of find_library() #132

Closed dottspina closed 2 years ago

dottspina commented 2 years ago

On Linux, the Library.load_default() function fails to properly resolve the JLink shared library if not installed in /opt/SEGGER/JLink, regardless of the value of LD_LIBRARY_PATH.

This is possibly caused by an improper use of the ctypes find_library() API, which:

Thus:

Changes introduced by this patch:

If I'm correct, this should not break anything on Windows, and improve things on both Linux and MacOS.

CLAassistant commented 2 years ago

CLA assistant check
All committers have signed the CLA.

hkpeprah commented 2 years ago

I'm not opposed to this change, but I'm a bit confused on what the DLLInfo class is solving? Is it the name being prefixed by lib? In my testing, I think I've found that ctypes knows how to handle the leading lib, but I may be mistaken. Outside of that, it seems like we could address the Linux-specific dl handling through the .load() function on the Library class, which has a special case for handling Windows already.

That would also solve the unit tests failing because of expectations changed, though you may have other failures as well post-changes for Linux. A unit test for the specific case would also be appreciated.

dottspina commented 2 years ago

@hkpeprah : Thanks for your reply.

I'm not opposed to this change, but I'm a bit confused on what the DLLInfo class is solving? Is it the name being prefixed by lib?

There actually are two distinct literals involved:

On Windows, the return value of Library.get_appropriate_windows_sdk_name(), 'JLinkARM' or 'JLink_x64, is used for both tasks, which works as expected: these values are both valid parameters for the search by library name in ctypes.util.find_library(), and appropriate for the search by file name in Library.find_library_windows().

However, on Linux and MacOS:

Additionally, on Linux, ctypes.util.find_library() will not return the library full file path, but only the file name, for e.g. libjlinkarm.so.7: the class JLinkDllInfo mostly implements how to retrieve the full file path in this later case (please see bellow).

Note that my initial patch missed that JLINK_SDK_NAME was also used by the pylink.Library.find_library_{linux,darwin}() functions (the search by file on Linux/MacOS): I've updated the PR to fix this.

In my testing, I think I've found that ctypes knows how to handle the leading lib, but I may be mistaken.

AFAICT, on Linux ctypes.util.find_library('libjlinkarm') consistently fails (returns 'None), where ctypes.util.find_library('jlinkarm') will successfully resolve a J-Link library installed in LD_LIBRARY_PATH.

I can't test for MacOS, and it's a non issue on Windows.

Outside of that, it seems like we could address the Linux-specific dl handling through the .load() function on the Library class,

It's indeed done in the Library.load_default() function, though somewhat indirectly, incidentally breaking unit tests expectations.

which has a special case for handling Windows already.

The Library class even implements three specials cases, find_library_{windows,linux,darwin}(), but these are for the search by file name, i.e. when scanning the c:\\Program Files\SEGGER, /opt/SEGGER, or /Applications/SEGGER directories.

There are no distinct code paths to handle the different semantics of the ctypes.util.find_library() function's return value when searching by library name: the full file path on Window and MacOS, or only the file name (or so name) on Linux.

The JLinkDllInfo class is mostly here to retrieve the full file path in this later case (again, please see bellow).

That would also solve the unit tests failing because of expectations changed, though you may have other failures as well post-changes for Linux.

If I understand correctly, some tests fail because now the function find_library() is not called from the Library class, but from JLinkDllInfo, which is not mocked, and thus does not account for the mock_find_library.assert_called_once_with() thing.

I did create a separated class thinking it was a better design:

But I could try a refactoring to move the whole search by library name implementation to the Library class, and see if that solves at least some of the Expected 'find_library' to be called once. Called 0 times.. That would also imply to replace jlinkarm (JLINK_SDK_NAME) by libjlinkarm (JLINK_SDK_STARTS_WITH, see commit ff640fa) where appropriate in test_library.py. [Edit: No, this is obviously wrong, find_library() is consistently and rightly called with JLINK_SDK_NAME]

There are also tests that fail with No such file or directory: 'C:\\': can I ignore (or skip) these on Linux, or should some magic mock_directories() make them run just fine ?

A unit test for the specific case would also be appreciated.

I fully understand, and will try to come with something.

I may ask for some help, though: for e.g. I do not (yet) understand if the testing platform should actually host the various versions of the J-Link library the unit tests seem to look for, or if the whole thing is mocked. If the later is true: how do you mock, say on Windows, what the Linux or MacOS dynamic linker would have done via ctypes.util.find_library() to resolve a library name ?

As I've said, though I'm willing to help, I haven't written a line of Python for years, Python2 was still widely use, and have to read a bit more about this unittest.mock libray (luckily I'm familiar with mocks): it'd be great if you could either give me a few hints, or direct me to a good pointer, something that would teach me what I need without too much overhead.

[Edit: Politeness may deserve I add two lines of background. Other software used to fail loading the J-Link library from LD_LIBRARY_PATH, for e.g. Nordic nrfjprog command line tool did, but they've generally been fixed by now, and pyOCD, which relies on pylink, is the last tool I daily need that still fails: that's why I may appear a bit sensitive about this issue, please forgive me if I sounded rude by any mean. (*) Once more today pyocd was failing, until I understand I was using the upstream pylink library pulled with the Zephyr SDK ;-)
]

[Edit2: the JLinkDllInfo class was consistently misspelled JLinkarmDllInfo This class has since been renamed to JLinkarmDlInfo, see 45c8398.]

Thanks.

-- chris

dottspina commented 2 years ago

The latest patch seems to help: AFAICT this PR would pass all unit tests (python3 setup.py test), as if I even had a C:\\ directory on my Linux box now, and a dozen versions of the J-Link library.

The functional tests (python3 setup.py bddtest) still fail short though: if I understand correctly I need some additional setup to run these.

Regarding the "unit test for the specific case", I can at least describe it.

The initial state to mock would be:

Then, we would expect that invoking the Library constructor (and in turn Library.load_default()) will successfully resolve (find and load) the JLink library, and as usual that ctypes.util.find_library() has been called once from the Library class.

We could also expect that ctypes.util.find_library() has been called once from the JLinkDllInfo class (to resolve the dl library), which would prove that it's actually how we find the library in $LD_LIBRARY_PATH, but that may sound a bit pedantic.

I'll try too implement something. [Edit: follow-up.

I'm rather stuck:

I'm afraid a test case relying on mock_directories() to provide the J-Link library file would eventually prove itself useless, since it won't address:

Thus, to summarize the status as of now:

]

Thanks.

-- chris

dottspina commented 2 years ago

Is there anything else I can do that would help fixing this ?

This thread, and the initial issue (#131), are quite old, and worth a brief summary:

Fixing this is not as immediate as setting JLINK_SDK_NAME to jlinkarm:

The original PR had a couple of issues:

These issues were addressed by the commits bellow:

Regarding a new unit test that would cover the case addressed by this PR:

And I can't figure out how to implement such a test case without introducing not trivial requirements (simply mocking a file system is not enough here):

@hkpeprah, are you still missing something, beside time, to:

Thanks.

-- chris

hkpeprah commented 2 years ago

Hello @dottspina, I will take a look today. From a high level, I think we need unit tests that validate the new behaviour you're adding. You don't need to simulate the OS, but using the mock library to simulate expected return values / failure scenarios would be a baseline IMO. So this would be:

In the failure cases, we should expect the default behaviour to take over.

hkpeprah commented 2 years ago

To add the aforementioned tests, the current Linux find library tests are passing because this condition happens to be false since we mock ctypes.cdll.LoadLibrary:

            if dlinfo(tmp_cdll_jlink._handle, JLinkarmDlInfo.RTLD_DI_LINKMAP, ctypes.byref(linkmap)) == 0:

So the result is probably some mock object that leads us to the else case. We should be explicitly testing the true and false cases, so the Linux tests that currently use the find_library() need to be updated to explicitly return a non-zero entry for this, and the new test would return zero and handle testing DLINFO.

dottspina commented 2 years ago

@hkpeprah, thank you for taking the time to work on this and your constructive comments.

To add the aforementioned tests, the current Linux find library tests are passing because this condition happens to be false since we mock ctypes.cdll.LoadLibrary:

            if dlinfo(tmp_cdll_jlink._handle, JLinkarmDlInfo.RTLD_DI_LINKMAP, ctypes.byref(linkmap)) == 0:

So the result is probably some mock object that leads us to the else case. We should be explicitly testing the true and false cases, so the Linux tests that currently use the find_library() need to be updated to explicitly return a non-zero entry for this, and the new test would return zero and handle testing DLINFO.

I think the Linux tests (in tests/unit/test_library.py) currently don't even reach this condition, precisely because find_library() is mocked to return None.

While I fully agree testing this code path (the JLinkarmDlInfo class) will require the find_library() mock to at least return something, I'm not sure to understand why we would update the existing Linux tests: to me they have always been designed to cover the search by file name code path, i.e. the Library.find_library_linux() function, which happens when ctypes.util.find_library() has failed. IMHO their semantic (use cases, coverage, expected behaviors) is orthogonal to this PR, and we should not touch their implementation (in the context of this PR).

OTOH, I've started adding a couple of unit tests, mostly to cover error conditions or confirm a few code path, as you've previously suggested. I hope I'll soon be able to push something we can build upon.

Thanks.

-- chris

dottspina commented 2 years ago
* Failing to find `dl` (should never happen, but we should test it)

Agreed, if a system presents itself as POSIX (Linux), it should provide the dl library, but nonetheless we have to know how the code would behave. I've added a unit test checking this behavior.

Note that while dlopen() and friends are POSIX, dlinfo() is a GNU extension, defined only on systems using the GNU libc. I've updated the PR to not assume all Linux hosts use glibc, and added a unit test to confirm the code behavior.

* Finding `dl` and failing to load the DLL

I assume here DLL refers to the dl library (not the "JLink DLL").

I can add a test where LoadLibrary()'s mock raises an OSError when called to load dl: according to our previous discussion, the test will have to confirm the exception actually propagates from the ctypes API to the call site.

* Finding `dl` and succeeding to load the DLL

I again assume DLL refers to the dl library.

IIUC, LoadLibrary()'s mock would answer a mock dl library, tmp_cdll_dl in the code bellow:

       dl_soname = ctypes_util.find_library('dl')
        if dl_soname is not None:
            tmp_cdll_dl = ctypes.cdll.LoadLibrary(dl_soname)
            dlinfo = tmp_cdll_dl.dlinfo
            dlinfo.argtypes = ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p
            dlinfo.restype = ctypes.c_int

            linkmap = ctypes.c_void_p()
            if dlinfo(tmp_cdll_jlink._handle, JLinkarmDlInfo.RTLD_DI_LINKMAP, ctypes.byref(linkmap)) == 0:
                linkmap = ctypes.cast(linkmap, ctypes.POINTER(JLinkarmDlInfo.LinkMap))
                self._dll_path = linkmap.contents.l_name.decode(sys.getdefaultencoding())

And we could indeed write a test to confirm we actually execute the last two lines only when dlinfo() returns 0, but I confess that sounds a bit far-fetched to me: the if condition happens at the line just above, that contains an atomic (wrt Python) function call, there's so much locality that IMHO there remains no code path to test.

In the failure cases, we should expect the default behaviour to take over.

I assume "default behaviour" refers to falling back to the search by file name code path (Library.find_library_linux() on Linux) after the search by library name has failed (ctypes find_library() returns None).

Whether we actually execute this fallback depends on how the search by library name fails:

According to our previous discussion, I think it's fine to not execute the fallback in this later case.

Adding to our thoughts about updating or not the pre-existing Linux unit tests in test_library.py: note that they all assume the initial call to find_library() fails, while the tests I add all assume find_library() does succeed, hence I feel they actually complement each others (cover distinct code paths).

Thanks.

-- chris

hkpeprah commented 2 years ago

Adding to our thoughts about updating or not the pre-existing Linux unit tests in test_library.py: note that they all assume the initial call to find_library() fails, while the tests I add all assume find_library() does succeed, hence I feel they actually complement each others (cover distinct code paths).

I looked again, and you are correct.

dottspina commented 2 years ago

@hkpeprah , thanks again for taking the time to review this PR.

Thanks for going through with the tests. Sorry for all the back and forth on this.

No hassle. Understanding the existing unit tests, why they pass or fail, and adding a small handful of new ones, is actually what gives me some confidence that this PR should not introduce obvious regressions. Taking care of this naturally also falls on the shoulders of the submitter.

At the end of the day, that back and forth makes for a cleaner patch.

This looks good to me to merge and un-block Linux platforms.

Nice, this should also fix pylink's downstream software (pyOCD, Zephyr-RTOS, Nordic pynrfjprog, etc) on Linux. Additionally, this may improve the situation on macOS: would you mind sharing such feedback, if any ?

Before I can commit the change, you will need to sign the Square CLA though. It is available here: https://spreadsheets.google.com/spreadsheet/viewform?formkey=dDViT2xzUHAwRkI3X3k5Z0lQM091OGc6MQ&ndplr=1

Done.

I can then merge this change and make a new release later today.

Great.

To thank you for your work on a somewhat niche issue, I'd like to let you know about a confusing use case this release will fix. On Linux, when pyOCD fails to load the JLink library, it does so silently, which is right since it does not require this library (neither does it expect a JLink hardware), but pyocd --list will then simply report "No available debug probes are connected": you, as a user, will then dig into hardware or firmware issues, missing or incorrect udev rules, etc, whereas the software has just not honored LD_LIBRARY_PATH.

Thanks.

-- chris

hkpeprah commented 2 years ago

Changes should be available in v0.14.0. Thanks for the patch.

dottspina commented 2 years ago

I was plainly wrong, we should fix this here, not downstream (see #138 and #139).