scala / scala-lang

sources for the Scala language website
https://scala-lang.org

Build process acceptable to Ubuntu, Debian & Co #295

Closed kno10 closed 9 years ago

kno10 commented 9 years ago

The current build/installation process of Scala appears to be unacceptable to major Linux distributions such as Debian and Ubuntu. These continue to ship Scala 2.9.2.

Current scala "debian packages" provided on the scala-lang web site:

  1. include unnecessary binary code (native code hidden in jline-2.12.1.jar) which only supports i386 and amd64 architectures. Instead, the package should depend on the libjline2-java package.
  2. the package includes akka-actor, which seems to be unused.
  3. documentation should be in a separate package, so you do not need to install documentation on your servers, only on your development machines.
  4. the package should be split into compiler, documentation and runtime/library packages.
  5. download is unsigned, and no metadata is provided. A proper repository would allow users to install the package using apt-get, and automatically install updates.
  6. the current build process only works online, AFAICT, making it unsuitable for secure builds without access to the internet.

Scala should provide alternate build methods acceptable to Linux distributions, so it can be included out-of-the-box on operating systems that have such high standards like reproducible builds... As is, scala seems to be largely ignorant of Linux packaging best practises, so this is a bug in Scala; not just with the Linux distributions being "too lazy" to package the current version. Instead, it appears as if nobody successfully managed to build clean, acceptable packages for years...

This is "highly embarassing" [sic] for Scala, that apparently nobody else manages to provide installation packages of this open-source software; but they all come from a single, unsigned website, which even avoids https. What if the NSA injected a backdoor into scala? There is no protection in place against malicious modifications to the scala downloads.

gourlaysama commented 9 years ago

I agree with your point that the scala packages need some love, but I think you are conflating two different kinds of packaging here: the packaging done by distributions and the one done by upstream. Usually:

And distributions can do the above just fine; see Fedora, for example:

"just fine" is probably a bit optimistic, but if one distribution managed to do it, I am hoping it can help others do the same.

Overall I think it is more a problem of finding people that have both the "knows the intricacies of the scala build" and "knows the intricacies of debian packaging" traits and have the time to work on this. I don't think distributions are "too lazy" to do it, it is just a complicated problem.

I don't know much about debian packaging (hint: Fedora user), but I would be willing to help whoever is going down that path :-)

kno10 commented 9 years ago

Well, usually upstream provides a build script that allows easy building on various platforms. JDK probably is an exception, indeed... The usual approach is e.g. a ./configure script; generated via automake so it usually even works on ancient platforms like AIX none of the upstream authors ever touched...

For some distributions it may well be acceptable to do an online-only build using external dependencies. Some distributions will allow you to bootstrap building by uploading e.g. manually built sbt packages to build future sbt packages. In fact, Fedora had the very same problem with scala: http://chapeau.freevariable.com/2013/08/making-fedora-a-better-place-for-scala.html And as far as I can tell, it's exactly this Catch-22 problem which prevents current sbt and scala from becoming more widely available. Building sbt requires you to get an existing sbt.jar to use for building. Ouch.

Here's the Fedora ticket for a bootstrap exception: https://fedorahosted.org/fpc/ticket/389

I guess Fedora would also be happy to not need this exception...

jsuereth commented 9 years ago
@kno10 Unfortunately, given our history, using ./configure + make just isn't an option. E.g. why is it that the "make" binary is acceptable to assume but sbt is not? It's the same kind of dogfooding/bootstrapping issue.

Right now scala requires some tools to build. These tools are not "approved" by a lot of Linux distros, or require contortions to use. E.g. we did have a PR for a make-build for sbt, which was obviously rejected (why would sbt not use itself to build?).

Now, to address each point:

1. include unnecessary binary code (native code hidden in jline-2.12.1.jar) which only supports i386 and amd64 architectures. Instead, the package should depend on the libjline2-java package.

This wouldn't be a hard change to make, I think, except I'd love to know that libjline2-java package is generally available. At the time scala packaging was made, the whole java ecosystem was still a mess in linux distros (they were figuring out how to deal with maven).

In any case, I actually had forgotten there was native code in jline. That's actually a common thing for java libraries, ensuring that a single JAR is sufficient for many platforms. IIRC, I think the only native code comes from the JANSI library, which itself only has native code for dealing with windows.

2. the package includes akka-actor, which seems to be unused.

It's meant to be included on the classpath. Actually, all of the scala-modules (scala-parser-combinators, scala-actors-migration, scala-xml, etc.) are meant to be included as an "extended standard library". It's not used by scala specifically, but is intended to be there (i.e. it replaces the deprecated scala-actors).

3. documentation should be in a separate package, so you do not need to install documentation on your servers, only on your development machines.

Documentation = man pages? If so, that's not a hard fix.

4. the package should be split into compiler, documentation and runtime/library packages.

If someone wants to make those changes, they can. I think it does add a lot of complexity in our own deployment process.

5. download is unsigned, and no metadata is provided. A proper repository would allow users to install the package using apt-get, and automatically install updates.

The unsigned bit I think is a flop/mistake in the hand-off from me to @adriaanm. It should be signed. Additionally, would something like "bintray" serve as a good host for apt-get? We're using that in sbt right now, and it's been working pretty well.

6. the current build process only works online, AFAICT, making it unsuitable for secure builds without access to the internet.

Yes, scala cannot be built from scratch without the internet. This is true of a lot of java projects. How is linux dealing with the maven ecosystem? You could take the same approach with Scala.

In any case, as a debian user (and the author of the initial debian packaging) I'd love to make what we have better for folks. I do not plan to fight the politics of getting included in debian itself, nor do I find that battle worth it, for my own purposes. I'd be more than happy to help someone else who would like to do it. However, I don't think ./configure + ./make is a suitable technical solution to the scala build.

If you'd like to chat over alternatives and solutions, let me know.

kno10 commented 9 years ago

(1). The bootstrapping problem seems to be very real - that is why there are no more up-to-date packages available...

Yes, the problem exists in Linux distributions in other cases. For example, C compilers are usually written in C, too; so they need to be bootstrapped. But there is a reason why people try to keep this minimal; at the cost of having not all the latest features available in their compiler source code.

For scala, it appears to be so bad that people who attempt to make new scala packages all give up because of the build process...

(2). that is what dependencies are for. If you want libjline2-java to be present, you specify a dependency on it (instead of including a copy).

The reason why this is superior is security updates. Assume there is a security issue with jline. For example, with the native code in it (it also includes .so files for Linux, btw.). People forget about where they put copies of that, and suddenly your scala application is vulnerable to some clever ANSI injection... Linux (and other operating systems) have gone through such problems way too often. For example, zlib/libpng, which had security issues in 2002/2005, required over 500 applications to perform security updates, including Windows. I wouldn't be surprised if there are still quite a lot of vulnerable applications around because they contain a statically linked copy of the library somewhere. For this reason, policy was created that strongly discourages duplicate code and binaries: https://www.debian.org/doc/debian-policy/ch-source.html#s-embeddedfiles With the key takeaway: share libraries wherever possible, to only have to fix bugs once.
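To make the "embedded native code" point concrete: a JAR is a zip archive, so its entries can be listed and checked for files that look like native libraries. The following is only a minimal sketch. The suffix list and the demo entry names are assumptions made for illustration, not the actual contents of jline-2.12.1.jar.

```python
import io
import zipfile

# Suffixes commonly used for native libraries (an assumption; extend as needed).
NATIVE_SUFFIXES = (".so", ".dll", ".jnilib", ".dylib")


def find_native_entries(jar):
    """Return the sorted names of JAR entries that look like native libraries.

    `jar` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as zf:
        return sorted(
            name for name in zf.namelist()
            if name.lower().endswith(NATIVE_SUFFIXES)
        )


def make_demo_jar():
    """Build a small in-memory JAR with made-up entries, for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        zf.writestr("org/example/Demo.class", b"\xca\xfe\xba\xbe")
        zf.writestr("META-INF/native/linux64/libdemo.so", b"\x7fELF")
        zf.writestr("META-INF/native/windows32/demo.dll", b"MZ")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for entry in find_native_entries(make_demo_jar()):
        print(entry)
```

Passing a real path, e.g. `find_native_entries("some.jar")`, works the same way. A name-based check like this only finds candidates; it says nothing about which architectures a binary supports.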

(3. + 4). Would be easy to solve with a proper packaging stage - doesn't require much more than specifying which files/folders go into which package. See e.g. https://github.com/satabin/scala-debian/blob/master/debian/scala-doc.install
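The idea of "specifying which files/folders go into which package" can be sketched as a small pattern table. This is a simplified illustration, not how debhelper actually processes `.install` files: the package names, the paths, and the first-match-wins rule are all assumptions made for the example.

```python
from fnmatch import fnmatch

# Hypothetical package names and path patterns, loosely modelled on the
# per-package file lists in debian/*.install. The first matching rule wins.
PACKAGE_PATTERNS = [
    ("scala-doc", ["usr/share/doc/*", "usr/share/man/*"]),
    ("scala-library", ["usr/share/java/scala-library*.jar"]),
    ("scala", ["*"]),  # catch-all for everything else (compiler, scripts)
]


def assign_packages(paths, rules=PACKAGE_PATTERNS):
    """Assign each installed path to the first package whose patterns match it."""
    assignment = {name: [] for name, _ in rules}
    for path in paths:
        for name, patterns in rules:
            if any(fnmatch(path, pattern) for pattern in patterns):
                assignment[name].append(path)
                break
    return assignment


if __name__ == "__main__":
    demo = [
        "usr/share/doc/scala/README",
        "usr/share/java/scala-library.jar",
        "usr/share/java/scala-compiler.jar",
        "usr/bin/scalac",
    ]
    for package, files in assign_packages(demo).items():
        print(package, files)
```

Note that `fnmatch` lets `*` match across `/`, which is what makes the directory patterns and the catch-all rule work here.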

Oh, the packages should also have proper conflicts set. In particular, unless they are split into proper scala-library etc. package as the default version on Debian and Ubuntu, they need to conflict with these packages to allow proper installation. Otherwise, the scala-lang.org package will fail to install unless the user has first removed all traces from the distribution package...

Much of this, of course, is to blame on limitations of the ugly hack that is SbtNativePackager.

(5). No, I wouldn't touch bintray with a stick. It's essentially a CDN - it doesn't solve build quality problems of your data, only of your delivery speed.

Instead of handing your GPG key over to them (yes, they do suggest you do this), you should set up a secure signing environment, and push the resulting repository to any CDN of your liking (preferably even one that has directory browsing enabled...)
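For context on why a signed repository can be served from any CDN: in an apt repository, the signed Release file lists the SHA256 digest and size of each index file, so a client that trusts the signature can check everything it downloads afterwards. The sketch below covers only that second step, matching downloaded bytes against a Release entry. It does not verify the GPG signature itself (that is left to gpg or apt), and the file names and contents in the demo are made up.

```python
import hashlib


def parse_sha256_section(release_text):
    """Map path -> (hex digest, size) from the SHA256 section of a Release file."""
    entries = {}
    in_section = False
    for line in release_text.splitlines():
        if line.startswith("SHA256:"):
            in_section = True
        elif in_section and line.startswith(" "):
            parts = line.split()
            if len(parts) == 3:
                digest, size, path = parts
                entries[path] = (digest, int(size))
        else:
            in_section = False
    return entries


def matches_release(path, data, entries):
    """Check that `data` has the size and SHA256 digest listed for `path`."""
    expected = entries.get(path)
    if expected is None:
        return False
    digest, size = expected
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest


if __name__ == "__main__":
    # Made-up index file and Release text, for demonstration only.
    packages = b"Package: demo\nVersion: 1.0\n"
    release = (
        "Origin: example\n"
        "SHA256:\n"
        f" {hashlib.sha256(packages).hexdigest()} {len(packages)} main/binary-all/Packages\n"
    )
    entries = parse_sha256_section(release)
    print(matches_release("main/binary-all/Packages", packages, entries))
    print(matches_release("main/binary-all/Packages", packages + b"x", entries))
```

A checksum match is only meaningful if the Release file itself was verified against a trusted key first; without that, an attacker who can replace the download can replace the checksum too.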

(6). You are right - a lot of Maven projects suffer from this, too. They cannot reliably be rebuilt, because of circular dependencies in their toolchain. A lot of Java stuff these days is outdated on Linux distributions, because no sustainable build process has yet been found. It is a general flaw in the community right now, to not care about sustainable deployments. Fire-and-forget deployments à la "just download my VM here, it has everything you need" are too popular. A lot of Maven-based building also seems to lack a security infrastructure, with signed code etc. - I'm surprised there haven't been major hacks yet due to companies systematically loading artifacts from untrusted locations into their development environments... Reminds me a lot of the 1990s shareware, until people got hit by viruses and worms every week...

For example, there are efforts to have every package installed register in the maven repository in /usr/share/maven-repo - this can allow building some packages in offline mode using --offline -Dmaven.repo.local=/usr/share/maven-repo. I haven't used this yet, though.

SethTisue commented 9 years ago

this is a valuable discussion, but the scala-lang repo isn't the right place for the issue, so I'm closing it. (scala-lang is the repo for the scala-lang.org website itself)

the ticket on this is scala/bug#9299 on GitHub (formerly https://issues.scala-lang.org/browse/SI-9299 in JIRA). I've added a link from there to this discussion, for future reference.