samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Keep apache ant functionality #660

Closed mmokrejs closed 8 years ago

mmokrejs commented 8 years ago

Hi, we recently added [a package for cramtools to Gentoo Linux](https://github.com/gentoo-science/sci/tree/e894fffa75d809914ef009a06bfe3ce4052a91c3/dev-java/htsjdk) and I am unhappy that you decided to move away from apache ant.

Basically, relying on a binary/jar/java included inside the cramtools-*.tar.gz source tarball is not nice, but moreover, this may even prevent users on other architectures from running gradle. Even worse, gradle does not run inside the sandbox (maybe one can work around it by fiddling with $HOME).

$ ebuild htsjdk-2.5.1.ebuild compile
 * htsjdk-2.5.1.tar.gz SHA256 SHA512 WHIRLPOOL size ;-) ... [ ok ]
 * Using: oracle-jdk-bin-1.8
>>> Unpacking source...
>>> Unpacking htsjdk-2.5.1.tar.gz to /scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/work
>>> Source unpacked in /scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/work
>>> Preparing source in /scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/work/htsjdk-2.5.1 ...
>>> Source prepared.
>>> Configuring source in /scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/work/htsjdk-2.5.1 ...
Rewriting attributes
Rewriting ./build.xml
>>> Source configured.
>>> Compiling source in /scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/work/htsjdk-2.5.1 ...
 * ACCESS DENIED:  mkdir:        /home/mmokrejs/.gradle
 * ACCESS DENIED:  mkdir:        /home/mmokrejs/.gradle
Exception in thread "main" java.io.FileNotFoundException: /home/mmokrejs/.gradle/wrapper/dists/gradle-2.13-bin/4xsgxlfjcxvrea7akf941nvc7/gradle-2.13-bin.zip.lck (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at org.gradle.wrapper.ExclusiveFileAccessManager.access(ExclusiveFileAccessManager.java:49)
        at org.gradle.wrapper.Install.createDist(Install.java:48)
        at org.gradle.wrapper.WrapperExecutor.execute(WrapperExecutor.java:128)
        at org.gradle.wrapper.GradleWrapperMain.main(GradleWrapperMain.java:61)
 * ERROR: dev-java/htsjdk-2.5.1::science failed (compile phase):
 *   (no error message)
 * 
 * Call stack:
 *     ebuild.sh, line 133:  Called src_compile
 *   environment, line 4066:  Called die
 * The specific snippet of code:
 *       ./gradlew build || die
 * 
 * If you need support, post the output of `emerge --info '=dev-java/htsjdk-2.5.1::science'`,
 * the complete build log and the output of `emerge -pqv '=dev-java/htsjdk-2.5.1::science'`.
!!! When you file a bug report, please include the following information:
GENTOO_VM=oracle-jdk-bin-1.8  CLASSPATH="" JAVA_HOME="/opt/oracle-jdk-bin-1.8.0.92"
JAVACFLAGS="-source 1.8 -target 1.8" COMPILER=""
and of course, the output of emerge --info =htsjdk-2.5.1
 * The complete build log is located at '/scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/temp/build.log'.
 * The ebuild environment file is located at '/scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/temp/environment'.
 * Working directory: '/scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/work/htsjdk-2.5.1'
 * S: '/scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/work/htsjdk-2.5.1'
 * --------------------------- ACCESS VIOLATION SUMMARY ---------------------------
 * LOG FILE: "/var/log/sandbox/sandbox-12273.log"
 * 
VERSION 1.0
FORMAT: F - Function called
FORMAT: S - Access Status
FORMAT: P - Path as passed to function
FORMAT: A - Absolute Path (not canonical)
FORMAT: R - Canonical Path
FORMAT: C - Command Line

F: mkdir
S: deny
P: /home/mmokrejs/.gradle
A: /home/mmokrejs/.gradle
R: /home/mmokrejs/.gradle
C: /opt/oracle-jdk-bin-1.8.0.92/bin/java -Dorg.gradle.appname=gradlew -classpath /scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/work/htsjdk-2.5.1/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain build 

F: mkdir
S: deny
P: /home/mmokrejs/.gradle
A: /home/mmokrejs/.gradle
R: /home/mmokrejs/.gradle
C: /opt/oracle-jdk-bin-1.8.0.92/bin/java -Dorg.gradle.appname=gradlew -classpath /scratch/var/tmp/portage/dev-java/htsjdk-2.5.1/work/htsjdk-2.5.1/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain build 
 * --------------------------------------------------------------------------------

Could anything be done in this regard? Thanks!

droazen commented 8 years ago

You don't have to use the included wrapper script (gradlew) to run gradle -- you're free to install it manually and run gradle directly to build htsjdk. Recommend that you try this and see if it resolves your issue.

Note that gradlew is not a binary -- it is a script that downloads the right version of gradle for your system.

You can read about the reasons we switched from ant to gradle in this thread: https://github.com/samtools/htsjdk/issues/377.
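
A minimal sketch of that suggestion, assuming a POSIX shell: prefer a `gradle` found on PATH and fall back to the bundled `./gradlew` only when none is installed. The function name `pick_gradle` is hypothetical and not part of htsjdk; the sketch only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: print the Gradle launcher to use.
# Prefers a system-wide `gradle` on PATH; otherwise falls back to the wrapper.
pick_gradle() {
    if command -v gradle >/dev/null 2>&1; then
        echo "gradle"
    else
        echo "./gradlew"
    fi
}

GRADLE_CMD=$(pick_gradle)
echo "Would run: ${GRADLE_CMD} build"
```

A locally installed gradle may be a different version from the one the wrapper would download, so the build is not guaranteed to behave identically.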

lbergelson commented 8 years ago

If gradle writing to the ~/.gradle directory is problematic for you, you can use any of the options described here to point it to a directory you're allowed to write to.
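
For illustration, one such option is the `GRADLE_USER_HOME` environment variable, which relocates the directory Gradle would otherwise create at `~/.gradle`. The sketch below assumes a POSIX shell; `BUILD_ROOT` and the `gradle-home` subdirectory name are hypothetical stand-ins for whatever writable location the sandbox permits.

```shell
#!/bin/sh
# BUILD_ROOT is a hypothetical writable location (for example, a sandbox work dir).
# If it is not set, a fresh temporary directory is used instead.
BUILD_ROOT="${BUILD_ROOT:-$(mktemp -d)}"

# When GRADLE_USER_HOME is set, Gradle keeps its caches and wrapper
# downloads there instead of under ~/.gradle.
GRADLE_USER_HOME="${BUILD_ROOT}/gradle-home"
export GRADLE_USER_HOME
mkdir -p "${GRADLE_USER_HOME}"

echo "Gradle state will be kept in: ${GRADLE_USER_HOME}"
# The build itself would then be started with, e.g.:
#   ./gradlew build
```

This only moves where Gradle writes; it does not stop the wrapper or the build from downloading files.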

mmokrejs commented 8 years ago

@lbergelson: Thank you for the tip. I called GRADLE_USER_HOME="${WORKDIR}" ./gradlew build, which got expanded as expected, but I ended up with plenty of jar files fetched from the network:

${WORKDIR} $ du -sh *
16M    caches
38M    htsjdk-2.5.1
68K     native
97M    wrapper
$

From the perspective of Gentoo package development, it is not acceptable for a compile step to fetch any file (likewise if, e.g., a Makefile called wget under its compile target). The files needed prior to compilation should be clearly enumerated somewhere, fetched only once, and stored in a local cache, mirror, etc. Then repeated attempts to "compile" a package will not result in unnecessary network traffic. I understand my case is a bit unusual, because had I installed gradle system-wide, this probably would not have happened.

Further, can you tell me why the gradle thing fetched maven stuff as well? Contents of the three subdirectories under the fake $HOME are listed below: caches.txt native.txt wrapper.txt

Anyway, here is a full log of the automated build process. It died because it looked for .git directory somewhere. Probably looked for $HOME/.git/ but my $HOME was unset in the sandbox. ;-) build.txt

Sure one can work around all of these, but before even thinking of that let me mention that:

Gentoo Linux compiles all packages from scratch. Plenty of packages are in java and virtually all are using Ant. Moreover, Maven seems hardly usable: https://bugs.gentoo.org/show_bug.cgi?id=63285 https://bugs.gentoo.org/show_bug.cgi?id=175034 https://bugs.gentoo.org/show_bug.cgi?id=237539 https://github.com/charite/jannovar/issues/218

Although sometimes it suffices to just convert the build.xml intended for maven into a build.xml passed to ant, other times it does not work. Finally, there is no package for gradle in Gentoo (yet). I think it is suspicious that there has been no need for it so far. And as I see it, it will take a lot of effort to split the thing into its components (maven, gradle, whatever else). You can see above why not even Maven is supported in Gentoo, and more issues like that are in Gentoo's bugzilla.

I appreciate you tried to bring your devel tree into better shape by switching to gradle but IMHO, you should have kept ant as long as possible.

mmokrejs commented 8 years ago

Actually, there is some discussion ongoing and a package definition exists in testing repositories: https://bugs.gentoo.org/show_bug.cgi?id=339574

lbergelson commented 8 years ago

@mmokrejs I'm sorry that the change is causing problems. I suspect it will be possible to resolve these issues though. The things that get downloaded automatically are:

1) the appropriate version of gradle to build htsjdk. This is currently 2.13, but it may change in future versions since gradle is being rapidly updated and because we assume that people are mostly building using the provided wrapper. You can see what version we're using by looking for gradleVersion in the build.gradle. You can invoke a local installation of gradle with the same build commands as the wrapper, so it's possible to avoid downloading that automatically.

2) gradle plugins. htsjdk currently uses a number of gradle plugins during the build process. These are fetched automatically and should only need to be fetched once. The plugin list is in the plugins {} section of the build.gradle.

3) java dependencies for htsjdk. These are downloaded from maven central which is a java dependency repository. They're enumerated in the dependencies {} section. Only the ones labelled as compile are actually needed to build the jar, testCompile is only needed if you're going to run the tests.

All of these can be cached (running ./gradlew jar once will do so). If they're available in the gradle cache (relative to the GRADLE_USER_HOME you specified), then subsequent builds can be performed using the --offline option for gradle, which will prevent it from reaching out to the network. The cached files may need updating when htsjdk versions update.
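
The two-phase flow described above can be sketched as follows, assuming a POSIX shell. The helper `gradle_command` and its phase names `fetch` and `build` are hypothetical; it only prints the command line for each phase rather than running Gradle.

```shell
#!/bin/sh
# Hypothetical helper: print the Gradle command line for one of two phases.
#   fetch -> network allowed; running the jar task once fills the cache
#   build -> adds --offline so Gradle does not reach out to the network
gradle_command() {
    phase=$1
    case "$phase" in
        fetch) echo "./gradlew jar" ;;
        build) echo "./gradlew --offline jar" ;;
        *)     echo "unknown phase: $phase" >&2; return 1 ;;
    esac
}

gradle_command fetch
gradle_command build
```

Both phases need to see the same GRADLE_USER_HOME, otherwise the offline build will not find the files cached by the first run.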

Failing to build if the .git directory isn't found is a known issue ( #636 ). We thought it was very low priority, but it's an easy fix so if it's causing problems we can definitely address that.

I have trouble imagining that this is the first project that uses remote dependency resolution. How does gentoo generally deal with java/scala/groovy dependencies? From what I know, most large java projects use maven as their build system (although gradle is becoming increasingly popular). Even ant projects tend to use an ivy repository to do dependency resolution.

mmokrejs commented 8 years ago

Thank you for the detailed answers. In principle you are right that the downloads would happen "once", but first, that applies only to a single "build" machine, and second, the sandboxed directory tree of course gets wiped soon after the package is merged. ;-) Basically, that is why the sandbox is used in the first place.

I am not a java coder, not even an official Gentoo developer (at least not yet), so my answers are not only unofficial but also those of a naive user. So I cannot really comment more on why Gentoo does not use Maven or an "ivy repository", whatever that is. I tried to include URLs to real open issues, and I hope it is very clear that Gentoo in general does not accept bundled third-party stuff in any package, be it samtools sources including htslib sources, or TransDecoder/Trinity containing cd-hit, parafly, ffindex (even with a different LICENSING scheme but clearly served to users over the network under the TransDecoder LICENSE). Likewise, for java packages Gentoo devs typically unbundle all the third-party libs, create a specific package for each, and then make a package for the initially intended application that depends on all the sub-packages which were just unbundled. See what has been unbundled from e.g. trinity: https://gitweb.gentoo.org/proj/sci.git/tree/sci-biology/trinityrnaseq

Package source tarballs are placed on Gentoo mirrors (if licensing permits redistribution). Users can pre-download source files and compile on a cluster of hundreds of Gentoo instances, and each of the instances will just use the files from the /usr/portage/distfiles/ directory. In the worst case, only the first compile host would have fetched the files into place. Recently, I installed Gentoo on an "old" RedHat system on a supercomputer using the Gentoo RAP approach (https://wiki.gentoo.org/wiki/Prefix/libc), and the build process htsjdk currently has would not work there: the nodes on a cluster have no http/ftp/rsync access to the world. It is not a problem if one can pre-fetch the files and place them into /usr/portage/distfiles/.
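
As a rough sketch of that pre-fetch requirement, assuming a POSIX shell: a build on a node without network access can only proceed if every source file is already in the distfiles directory. The helper `have_distfiles` is hypothetical and is not how Portage itself performs this check; `DISTDIR` defaults to the path named above.

```shell
#!/bin/sh
# DISTDIR defaults to the directory named in the text; override it for testing.
DISTDIR="${DISTDIR:-/usr/portage/distfiles}"

# Hypothetical helper: succeed only if every named file is already in DISTDIR,
# so that a build on a node without network access can proceed.
have_distfiles() {
    for f in "$@"; do
        if [ ! -f "${DISTDIR}/${f}" ]; then
            echo "missing: ${f}" >&2
            return 1
        fi
    done
    return 0
}

if have_distfiles htsjdk-2.5.1.tar.gz; then
    echo "all sources present, offline build possible"
else
    echo "pre-fetch the missing files into ${DISTDIR} first"
fi
```

For a gradle-based build, the list passed to such a check would have to include every plugin and dependency jar, which is exactly the enumeration the current build does not provide up front.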

The Java team is IMHO very tired of java-based apps aimed at biology/bioinformatics. They are typically a bundle of many jar and tar.gz files; it is a mess. One of the packages which will never make it into Gentoo is Cytoscape. It simply has too many deps; several people worked on it over many years, and the set of packages rotted (due to forced upgrades and java versions deprecated in the meantime) sooner than it was completed. For the IGV package we also unbundled the needed components, see https://gitweb.gentoo.org/proj/sci.git/tree/sci-biology/igv or https://gitweb.gentoo.org/proj/sci.git/tree/sci-biology/picard for example. Tablet is in good shape now: https://gitweb.gentoo.org/proj/sci.git/tree/sci-biology/tablet while in the past, when only the binary was available, with a built-in install4j installer asking questions interactively during the compile/install process, it was a pain: https://gitweb.gentoo.org/proj/sci.git/tree/sci-biology/tablet-bin

Back to the htsjdk and gradle issue: if you can fix issue #636, then I could probably finish the experimental package, but NOT let it go live, NOT even in the science overlay (technically, going live would mean $KEYWORDS being non-empty in the *.ebuild file). I am sure I would immediately get a slap from other devs if the compile/testing hosts started downloads from under travis: https://wiki.gentoo.org/wiki/Project:Science/Overlay . Given that gradle is so huge and needs maven anyway, and not even maven has been disassembled into its components yet, I just cannot proceed. I won't even try to write the many packages myself, as I have already read enough on the websites above. I can only repeat what I included above: https://bugs.gentoo.org/show_bug.cgi?id=339574 https://wiki.gentoo.org/wiki/Gentoo_Java_Packing_Policy https://github.com/charite/jannovar/issues/218

Finally, I should probably direct you to the java team; contacts are at https://wiki.gentoo.org/wiki/Java#External_resources . They can surely answer your questions more specifically, and they are aware of the maven/gradle bug reports opened on the gentoo and github sites.

So, why can't you just restore Ant's build.xml? I don't think people are that annoyed by two build systems being supported concurrently, although that was claimed in either issue #377 or #383.

droazen commented 8 years ago

Closing this one. I'm sorry, but we can't change our build system to help you with your Gentoo packaging issues. Pulling down project dependencies from maven central is standard behavior for modern Java-based projects -- I recommend that you ask around on the Gentoo forums for suggestions on how to handle this when building packages.

mmokrejs commented 8 years ago

Nobody is working on supporting even maven under Gentoo, and gradle is even more complex and depends on maven anyway. I posted a number of URLs to show there are almost no other maven-based packages, not to speak of gradle, and that these build systems are unsupported for a reason: an unclean build system.

Thank you for your kind answers. As the subject states, I was asking for the restoration of functionality that existed until recently. Gentoo is a perfect choice for bioinformatics because it offers binaries tuned to the user's wishes. All htsjdk apps will be blocked by this, simply never appearing in the tree. I couldn't do more for you. Good luck.