sass / libsass

A C/C++ implementation of a Sass compiler
https://sass-lang.com/libsass
Other
4.33k stars 462 forks source link

random_device: rdrand failed - Crashes host application #3151

Closed benbucksch closed 3 years ago

benbucksch commented 3 years ago

When using an application that uses libsass, the application crashes (segfaults) with random_device: rdrand failed.

Reproduction

  1. Start an application that uses libsass, e.g. simply run hugo.

Actual results

terminate called after throwing an instance of 'std::runtime_error'
  what():  random_device: rdrand failed
Cancelled (Segfault)

Expected result

Hugo works

Version

This happens on the latest version of libsass1 in Ubuntu, currently libsass version 3.6.3. However, the problem also exists on current libsass master, because the relevant code has not changed.

Cause

  1. Some AMD CPUs seems to return a non-random number, but still claim success. See e.g. reports on Twitter.
  2. std:random_device throws an exception.
  3. libsass is unable to cope, throws the exception up into the caller.
  4. The calling application cannot possibly handle this error and fails.

The underlying root cause is that libsass is using cryptographically secure random numbers. Why? I don't see why CSS would need that. I would think that pseudo-random is sufficient.

Fix

Simply use pseudo-random numbers, from rand(), instead of https://github.com/sass/libsass/blob/288610acfaa797b44c1a58a18b2cbd2e03937498/src/fn_numbers.cpp#L44-L45

I've tested that, and that fixes the bug. PR will follow.

mgreter commented 3 years ago

If I remember correctly LibSass only gets a cryptographically secure random number to seed the pseudo random number generator (PRNG) and from than on only uses PRNG. This is IMO best practice and I don't see LibSass doing anything not allowed by the C++11 APIs. So to me this clearly indicates some problem with the underlying library. Are you sure it also errors if you compile LibSass and the implementor on the same Machine? The error in node-sass kinda sounds to me like a linking issue, but that is really just a blatant assumption. Anyway, will try to research a bit on my own to see if we really need to abandon this way of initializing the PRNG. Btw. if you make a patch, please make it opt-in via a compile define or even better catch the error by the C++ call and then have a compile define to either error out or continue without true PRNG initialization. Also rand function is exposed to sass code, so we don't really know what people do with it ;)

mgreter commented 3 years ago

So after a short research I'm pretty sure this is the issue we're talking about (sorry it's in german): https://www.heise.de/newsticker/meldung/Ryzen-3000-Bekannter-Fehler-bringt-viele-Linux-Distributionen-zu-Fall-4465220.html

It seems systemd fixed this issue via https://github.com/systemd/systemd/pull/12536/commits/1c53d4a070edbec8ad2d384ba0014d0eb6bae077. Problem for LibSass is that the call to rdrand is done by C++. So best case is probably still to catch this error if possible and do the same workaround as systemd.

This is not something we should have to do, but well, reality. This is something for AMD, the firmware, the kernel or the C++ libs to fix/work-around. I'll try to find if any of them already include something in this regard in a recent version.

benbucksch commented 3 years ago

to me this clearly indicates some problem with the underlying library

random_device is throwing a runtime exception. This is correct and normal, if it cannot get random numbers.

libsass doesn't handle the error, and throws it up to the caller, instead of handling the error and doing a fallback. That's the bug. The caller has no way to handle this error, causing the segfault.

Random number sources can fail for any number of reasons, and are notoriously unreliable, so errors needs to be handled in any case, independent from the AMD bug.

I don't see LibSass doing anything not allowed by the C++11 APIs

Sure, nothing forbids that. But that doesn't answer my question:

What does a SASS need cryptographically secure random numbers for? Why are pseudo-random numbers not sufficient?

if you make a patch, please make it opt-in via a compile define or even better catch the error by the C++ call and then have a compile define to either error out or continue without true PRNG initialization.

I have made a patch that is verified to fix the problem, and I took the time to describe the problem and attach the patch, and I'm trying to help you out in fixing this serious issue, but you'll have to take it from here on.

If you insist on the strong seeding, you need a try/catch around random_device in any case. In case of errors from the secure random source, you can fall back to rand(). That would fix this bug and would be a simple and staight-forward change on top of my patch. Feel free to do that.

I hope I have provided the key information of: what is going wrong, why, and a patch that is verified to fix the problem.

benbucksch commented 3 years ago

FWIW, this is a AMD Ryzen 5 5600X, stepping 0. And everything else on my system seems to function normally, as far as I can see. Only hugo fails, in libsass.

I think the issue here is very simple: You don't need strong cryptographically secure random numbers, so you can just use rand(), and that's it.

Alternatively, you need to handle the fact that random number sources can fail, for any number of reasons, and have a fallback.

mgreter commented 3 years ago

Are you running your latest BIOS for your MB? It seems that's how AMD is intending to fix the root cause. Also std::random_device isn't really cryptographically secure, it is only more likely to fetch the random number from a dedicated hardware device. IMHO it's the correct thing to do from a code perspective, although we certainly need to handle the error case. Seems I've missed that it actually can throw (examples seem to always ignore it).

They way you patched it you can also just return a static number instead of calling rand, since we never seeded srand, it will always generate the same number (first call after RNG, seeded with 1). This would mean that sass code calling random will always get the same number sequence, which is certainly not what I would expect. So I'd rather would use time or pid to seed the PRNG instead of rand(). I will comment on your PR when I come up with a solution, shouldn't be to difficult.

mgreter commented 3 years ago

I'd have a few more questions here @benbucksch, hope you're willing to help me out here.

If you are on windows, I would love to know how the mingw implementation would work on other compilers (I think we added this case explicitly to avoid an issue with older MinGW). You could simply replace #ifdef __MINGW32__ with #ifdef _WIN32 in fn_numbers.cpp to enable that implementation. I wonder if this one works or also exposes some weird behavior.

benbucksch commented 3 years ago

Is this randomness needed to generate CSS class names and similar code?

So I'd rather would use time or pid to seed the PRNG instead of rand()

Unixtime in ms makes sense as seed, yes, depending on what you use it for.

Can you reproduce the issue easily and reliably

Yes. It's 100% reproducible for me.

(would love to test two or three things)

The problem is that I am extremely limited in time. This issue has already cost me far more time than I could afford, even until now. So, unfortunately, I'm not able to test patches.

Good news is that the problem is clearly identified, and the symptions recorded here, and the solution does not depend on me or my machine anymore. All you need to know is that random_device().rd() throws a runtime_exception, so you need to handle that. This should be done in any case, even if the AMD bug is fixed, so at this point, it's simply a question of handling the exception, i.e. a classic coding problem.

Are you on windows or on linux (my open points are mainly in regard to windows)

I'm on Ubuntu 20.04 LTS (see initial description).

I appreciate that you understood that I'm very limited in terms of time.

I hope you can find a good solution for this that you're happy with!

mgreter commented 3 years ago

Sure, I can understand, just means I have no clue what would happen in your case if you'd compile with MinGW ¯\(ツ)

benbucksch commented 3 years ago

I have no clue what would happen in your case if you'd compile with MinGW

Ah, I see. I see the #ifdef __MINGW32 using Win32 APIs and wincrypt.h. Yeah, I'm definitely not set up to compile on Windows. Good question, though.

Thanks a lot for having taken this seriously, having improved my patch and merged it! Appreciated.

Enjoy your other work on libsass! :-)

benbucksch commented 3 years ago

All I can do is look up the MS docs for CryptGenRandom, which say: "If the function fails, the return value is zero", which suggests the fix would be very similar to what you do on Unix.

mgreter commented 3 years ago

I've already refactors the code for upcoming new major release in that regard. The "mingw" part will no longer be used by default on mingw and needs to be enabled explicitly if users know they have a (now very) old mingw version with that issue. So it's ok for me ATM.

Btw. can you confirm that my patch solves your problem, just to be sure ;)

benbucksch commented 3 years ago

I've applied your PR #3153 that was merged. Yes, I can confirm that it fixes the bug for me.

# apt remove libsass
# apt install hugo
$ hugo
terminate called after throwing an instance of 'std::runtime_error'
  what():  random_device: rdrand failed
Abgebrochen (Speicherabzug geschrieben)
# dpkg -i libsass1_3.6.3-1ubuntu1_amd64.deb (built with your patch from #3153 )
$ hugo
works

Thank you!

mgreter commented 3 years ago

Danke fürs testen!