hpc / charliecloud

Now hosted on GitLab.
https://gitlab.com/charliecloud/main
Apache License 2.0
312 stars 60 forks source link

FIPS error in lark #1917

Open mcuma opened 2 months ago

mcuma commented 2 months ago

After setting FIPS on in our HIPAA cluster we're seeing the same error as in #1367. Looking at Lark repo, https://github.com/lark-parser/lark/blob/master/lark/load_grammar.py, they have changed from md5 to sha256. Would it be possible to include newer lark into Charliecloud to enable the FIPS compliance?

I have to admit that I don't quite get how is the lark included in the Charliecloud build, from your GitHub repo it seems like it's packaged at the build time, but, I just built the latest Spack supplied Charliecloud version 0.37 and there the lark still uses md5. Based on https://github.com/lark-parser/lark/releases?q=sha256&expanded=true, the sha256 was added about a year ago.

Thanks, Martin

reidpr commented 2 months ago

Hello Martin, thank you for the excellent bug report.

Lark gets bundled by autogen.sh, right before configure, so if you build from a Git clone, it does happen during build, while if you build from a tarball, the tarball will have it already. I believe Spack builds from Git.

I am also confused because as of 0.36 we are bundling Lark 1.1.9, which is beyond 1.1.6 where the sha256 change happened. I wonder if it is somehow catching Spack’s install of Lark, which is only 1.1.2? But the bundled one should be prioritized.

@j-ogas, any ideas?

We could stop bundling Lark in the Spack package.

mcuma commented 2 months ago

Hi Reid,

I am not seeing any indication of lark coming from spack - spack find does not show it as a dependency.

Looking into /uufs/chpc.utah.edu/sys/spack/v019/linux-rocky8-nehalem/gcc-8.5.0/charliecloud-0.37-tg6pqsbipr6sksyey5yfada3kvskfom2/lib/charliecloud, there is this:

ls lark* lark: ast_utils.py grammar.py init.py load_grammar.py parse_tree_builder.py tools tree_templates.py common.py grammars lark.py parser_frontends.py py.typed tree_matcher.py utils.py exceptions.py indenter.py lexer.py parsers reconstruct.py tree.py visitors.py

lark-1.1.9.dist-info: entry_points.txt INSTALLER LICENSE METADATA RECORD top_level.txt WHEEL

So, we're good there. Now, for a second I thought I stupidly looked at the older 0.35 build which has the older lark (which I used to start this PR as I "fixed" the 0.37 last night so it's not giving this error anymore).

Then, in the 0.37, I had to grep for "md5" to remember what I did last night, I modified a DIFFERENT file, which is lib/charliecloud/build_cache.py

This file is referencing the md5 twice: build_cache.py:511: h = hashlib.md5(usedforsecurity=False) buildcache.py:623: h = hashlib.md5(psid.id,usedforsecurity=False)

Notice that I added the "usedforsecurity=False" which makes Charliecloud to work with FIPS, I am guessing it's OK since it looks like you're using this to just generate a hash for a cache item, but, regardless, this seems like Charliecloud piece of code so it'd be great to modify that to use the sha256 or something that conforms with the FIPS.

Thanks, and sorry about the initial confusion. Martin

reidpr commented 2 months ago

I am not seeing any indication of lark coming from spack - spack find does not show it as a dependency.

That’s what I saw as well. I was speculating that something else had pulled in Spack’s Lark and we were importing that instead of the bundled version.

Now, for a second I thought I stupidly looked at the older 0.35 build which has the older lark (which I used to start this PR as I "fixed" the 0.37 last night so it's not giving this error anymore).

To clarify, you’re seeing the warning with both 0.35 and (unmodified) 0.37, both installed with Spack?

Notice that I added the "usedforsecurity=False" which makes Charliecloud to work with FIPS, I am guessing it's OK since it looks like you're using this to just generate a hash for a cache item, but, regardless, this seems like Charliecloud piece of code so it'd be great to modify that to use the sha256 or something that conforms with the FIPS.

Interestingly, use of a crypto-inappropriate hash is deliberate, because we’ve never thought through the attack surface of the build cache and I didn’t want to give the impression that it was more reliable than it is in that regard. This is documented somewhere I’m pretty sure, but I forget where (and obviously not in the code lol).

We do support Python 3.6 still, and looks like usedforsecurity appeared in 3.9 (and possibly earlier versions on Red Hat IIRC).

We’d welcome a PR that uses the argument if available. Maybe a simple wrapper function for hashlib.md5()?

If you want lots more gory details on the build cache: https://dl.acm.org/doi/abs/10.1145/3624062.3624585

Thanks, and sorry about the initial confusion.

No apology needed at all! Appreciate the feedback.