yast / yast-yast2

YaST module yast2
http://en.opensuse.org/Portal:YaST
GNU General Public License v2.0
54 stars 44 forks source link

Revert LD_PRELOAD Change #1248

Closed shundhammer closed 2 years ago

shundhammer commented 2 years ago

Bugzilla

https://bugzilla.suse.com/show_bug.cgi?id=1196326

Trello

https://trello.com/c/YlPWQ3YP/

Problem

PR #1236 added a workaround for a long-standing problem on the aarch64 platform: An error Can't load libsuseconnect.so due to an error cannot allocate memory in static TLS block.

A well-known workaround in the aarch64 Linux world is to force-load the affected library with LD_PRELOAD. That is what that PR did.

But since that is inherited by child processes, it also affects them, considerably adding to their memory footprint since now it force-links (!) libsuseconnect.so to every binary started that way, and YaST starts a lot of external binaries; for example for storage probing.

Fix

Don't use LD_PRELOAD; revert that first workaround.

Alternative Fix

Use LD_PRELOAD, but sanitize it immediately after starting the YaST process (after libsuseconnect.so is already loaded), so child processes don't do that as well:

Fetch the LD_PRELOAD environment variable, split it up into its components (it may contain several paths, just like LD_CONFIG or PATH, remove the part with libsuseconnect, put it back together and write it to the process environment, so child processes inherit a sanitized version.

Why not Use the Alternative Fix?

See https://bugzilla.suse.com/show_bug.cgi?id=1194996#c11:

The workaround for this was to set LD_PRELOAD in the YaST start script if libsuseconnect.so was found, but that causes new problems: This is inherited by child processes, and now every binary started from YaST also gets this LD_PRELOAD, so they are now also force-linked to libsuseconnect.so.

YaST uses external programs extensively; among lots of other things, for storage probing. That means that in many cases, storage probing now fails because of this; mostly due to out of memory because libsuseconnect.so (and the Go runtime libs that it uses?) consumes a lot more RAM.

Not only are we always on the verge of OOM anyway (and we are at the end of the food chain after the kernel, kernel modules, kernel firmware, glibc, libQt and dozens of others), but who knows what other problems this causes. This needs to stop.

It has been suggested that we should clean up LD_PRELOAD again after the YaST start. But after an internal discussion, we decided against that: It would be adding a hack on top of a hack; a workaround for a workaround. And that might create even more problems.

The original problem appears in a low-memory scenario only on aarch64; and now the first hack spread the problem to other architectures as well.

This clearly needs a real solution, not adding hacks on top of hacks. The original problem has been known since early 2019; it even made it to Stack Overflow, where they suggest to set LD_PRELOAD as a workaround.

After almost 3 years (maybe longer), there must be a real fix for this. It really can't be an acceptable solution to spread those workarounds all over all kinds of software; other projects will have the same problem as well.

LD_PRELOAD is a debugging tool for desperate situations, a last-ditch effort; it can't be the regular way to go. But recommendations on the web seem to indicate just that for the aarch64 platform.

It doesn't help the Open Source world if we keep fighting the symptoms and disguise the real problem; this basically forces everyone who ever uses any Golang-based lib to do the same hacky approach, and most of them won't even be aware of follow-up problems for child processes.

That is why we are going to revert the LD_PRELOAD change. This needs a true fix.

What will this Break?

SLE products installation with registration on aarch64 in low-RAM setups.

What will this Fix?

SLE products installation on all other platforms in low-RAM setups.

It was never a problem with enough RAM or in non-SLE installations (because there is no libsuseconnect.so).

coveralls commented 2 years ago

Coverage Status

Coverage remained the same at 41.638% when pulling 3a75efce238296bb46b41542d7b9d0d9e6d1d867 on huha-revert-ld-preload into 75641b25c72c92a9a3ef11fd6f104264c8103f37 on master.

skazi0 commented 2 years ago

FTR: the Alternative Fix was already implemented in libsuseconnect. It will be reverted by https://github.com/SUSE/connect-ng/pull/124 once this PR is merged.

yast-bot commented 2 years ago

:heavy_check_mark: Public Jenkins job #361 successfully finished :heavy_check_mark: Created OBS submit request #960249

yast-bot commented 2 years ago

:heavy_check_mark: Internal Jenkins job #195 successfully finished :heavy_check_mark: Created IBS submit request #266990