payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0

Populating LD_LIBRARY_PATH can cause library conflicts when used in conda #221

Closed aidanheerdegen closed 4 years ago

aidanheerdegen commented 4 years ago

NCI users on gadi experienced a sudden, unexplained error with models that had previously been working:

yatm_1bb8904.exe: symbol lookup error: yatm_1bb8904.exe: undefined symbol: netcdf_mp_nf90_open_

The issue occurred with payu in the conda/analysis3-20.01 environment but was not present when conda/analysis3-19.10 was used.

This is because another package (netcdf-fortran) was upgraded in the 20.01 environment. Its library changed from libnetcdff.so.6 to libnetcdff.so.7, the same library name that the yatm executable links against.

The conflict arises because payu populates LD_LIBRARY_PATH with the python library directory (https://github.com/payu-org/payu/blame/73ba06b47ed25b2801ebcd2eb9d00c3ebc79b72b/payu/cli.py#L87-L88), so libraries in that directory are picked up preferentially.

When python runs from a conda environment, the python executable has RPATH set, so it is not necessary to add its library directory to LD_LIBRARY_PATH. Doing so within a conda environment can lead to bugs like this one.

Proposed solution is to detect if the python interpreter is in a conda environment and branch past this code if so.
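A minimal sketch of what such a check might look like, not the change that was actually made in payu. It assumes a conda environment can be recognised by a conda-meta directory under sys.prefix, which is a common heuristic rather than an official API. The function names in_conda_env and library_dirs_to_add are hypothetical.

```python
import os
import sys


def in_conda_env(prefix=None):
    """Heuristic check: does this interpreter appear to live in a conda env?

    Assumption: conda keeps its package records in <prefix>/conda-meta,
    so the presence of that directory is treated as the signal.
    """
    prefix = sys.prefix if prefix is None else prefix
    return os.path.isdir(os.path.join(prefix, 'conda-meta'))


def library_dirs_to_add(lib_dirs, prefix=None):
    """Return the directories to add to LD_LIBRARY_PATH (hypothetical helper).

    Inside a conda environment nothing is added, because the interpreter
    already locates its own libraries through RPATH.
    """
    if in_conda_env(prefix):
        return []
    return list(lib_dirs)
```

With a helper like this, the code that populates LD_LIBRARY_PATH would only run when library_dirs_to_add returns a non-empty list.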

aekiss commented 4 years ago

Slack discussion, for reference: https://arccss.slack.com/archives/C6PP0GU9Y/p1580816901006300

marshallward commented 4 years ago

I was going to suggest appending the path, rather than prepending it, as a possible solution, but I'm glad it's been sorted out.

It feels like there may be a conda-agnostic solution to this, such as a more aggressive validation of the dynamic libraries, so I will keep this in mind.
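For illustration, the appending idea mentioned above could be sketched as follows. This is an assumption about how it might be written, not code from payu, and the helper name append_library_path is hypothetical. The dynamic linker searches LD_LIBRARY_PATH entries from left to right, so an appended directory is consulted only after the entries that were already there.

```python
def append_library_path(new_dir, environ):
    """Return an LD_LIBRARY_PATH value with new_dir appended.

    Existing entries keep their priority because the linker searches
    the list left to right. Empty entries are dropped, and new_dir is
    not added a second time if it is already present.
    """
    current = environ.get('LD_LIBRARY_PATH', '')
    entries = [path for path in current.split(':') if path]
    if new_dir not in entries:
        entries.append(new_dir)
    return ':'.join(entries)
```

This only helps if the directory holding the model's own library already appears earlier in LD_LIBRARY_PATH. If it is found some other way, such as through the system defaults, an appended directory would still be searched first.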