projectglow / glow

An open-source toolkit for large-scale genomic analysis
https://projectglow.io
Apache License 2.0
266 stars, 111 forks

Vulnerable shared library might make glow-spark3 vulnerable. Can you help upgrade to patch versions? #508

Closed HelenParr closed 6 months ago

HelenParr commented 2 years ago

Hi @karenfeng, @henrydavidge, I'd like to report a vulnerable dependency in io.projectglow:glow-spark3_2.12:1.1.2.

Issue Description

I noticed that io.projectglow:glow-spark3_2.12:1.1.2 directly depends on org.apache.spark:spark-core_2.12:3.1.2 in the pom. However, as shown in the following dependency graph, org.apache.spark:spark-core_2.12:3.1.2 suffers from a vulnerability exposed by the C library zstd (version 1.4.8): CVE-2021-24032.

Dependency Graph between Java and Shared Libraries

[Image: dependency graph between Java and shared libraries]

Suggested Vulnerability Patch Versions

org.apache.spark:spark-core_2.12:3.2.0 (>=3.2.0) has upgraded this vulnerable C library zstd to the patch version 1.5.0.
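
For anyone auditing their own builds, the version threshold above can be turned into a quick check. This is only a minimal sketch: `parse_version` and `is_spark_core_patched` are hypothetical helper names, it takes the 3.2.0 cutoff from this report as given, and it assumes plain `major.minor.patch` version strings (no suffixes such as `-SNAPSHOT`).

```python
import re

# Assumption: 3.2.0 is the first spark-core release that bundles the patched
# zstd, as stated in this report. Adjust if that turns out to be different.
FIRST_PATCHED_SPARK = (3, 2, 0)


def parse_version(version: str) -> tuple:
    """Turn a plain 'major.minor.patch' string into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_spark_core_patched(version: str) -> bool:
    """Return True if the spark-core version is at or above the cutoff."""
    return parse_version(version) >= FIRST_PATCHED_SPARK


if __name__ == "__main__":
    # Versions mentioned in this thread.
    for v in ("3.1.2", "3.2.0", "3.2.1"):
        status = "patched" if is_spark_core_patched(v) else "vulnerable"
        print(f"spark-core {v}: {status}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "3.10.0" as older than "3.2.0".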

Java build tools cannot report vulnerable C libraries, which may introduce security issues into many downstream Java projects. Could you please upgrade this vulnerable dependency?

Thanks for your help~ Best regards, Helen Parr

williambrandler commented 2 years ago

Thanks @HelenParr, we are in the process of releasing Glow on Spark 3.2.1:

https://github.com/projectglow/glow/pull/509

The release should be out later this week. Will this resolve your issue? I will update once the artifacts are in Maven Central and the Docker images have been published to Docker Hub.

williambrandler commented 2 years ago

Glow on Spark 3.2.1 is now available @HelenParr

https://github.com/projectglow/glow/releases/tag/v1.2.1

williambrandler commented 2 years ago

Note, though, that Glow does depend on Hail, which is on Spark 3.1.2. This will need to be updated as soon as EMR and Dataproc go to Spark 3.2.1.

https://github.com/hail-is/hail/issues/11707

alartin commented 2 years ago

Glow DOES depend on Hail? And why did one blog post show 10x better performance than Hail? As far as I know, Hail does not work well with non-human species, which is a design problem. Does Glow have the same issue?

henrydavidge commented 6 months ago

This is resolved now.

@alartin Glow formerly depended on Hail only for interoperation between the two libraries -- converting between DataFrames and Hail's representation. We've since removed this functionality. None of the other functionality ever used Hail. We designed Glow to be very flexible with respect to the input data, and it is used with non-human species.