tweag / sparkle

Haskell on Apache Spark.
BSD 3-Clause "New" or "Revised" License

Ability to use sparkle as a library #33

Closed alanz closed 8 years ago

alanz commented 8 years ago

I understand that this is still proof of concept code, and there are lots of moving parts to line up.

But at some future date it would be good to be able to use it as a library, to run custom apps.

alpmestan commented 8 years ago

Just to clarify: when you say "use it as a library", you mean just pulling sparkle in as a dep in a completely separate project, without having to add that project to the apps/ directory, directly or as a submodule or in some other way, right?

alanz commented 8 years ago

I understand sparkle to be two different but related things:

  1. A set of functions you can call to interact with the Spark environment while building a Spark application.
  2. A build system to construct a fully-resolved jar that can be submitted as a Spark job.

So as an external user I would like to be able to build my app against the first one (which already exists as a library), and then easily package it via the second.

So it comes down to how to package an app as a jar, I think. I am still getting to grips with all this stuff.
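
For concreteness, a minimal program against the first part might look something like the sketch below. The function names (newSparkConf, newSparkContext, parallelize, collect) are assumptions about the current bindings, not a verbatim copy of the API:

    {-# LANGUAGE OverloadedStrings #-}
    module Main where

    -- Sketch of a sparkle application; the exact names exported by
    -- Control.Distributed.Spark may differ, so treat them as assumptions.
    import Control.Distributed.Spark

    main :: IO ()
    main = do
      conf <- newSparkConf "hello-sparkle"       -- assumed constructor
      sc   <- newSparkContext conf               -- assumed constructor
      rdd  <- parallelize sc ["hello", "world"]  -- assumed binding
      xs   <- collect rdd                        -- assumed binding
      print xs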

alpmestan commented 8 years ago

Right, you phrased the issue better than I did, but we're indeed talking about the same thing, I think. One way to tackle this problem might be to provide a little utility program that takes care of the packaging, starting with generating an initial pom.xml for the user's project.

@mboes Thoughts?

alanz commented 8 years ago

@alpmestan : as I finished editing my reply I realised I was just paraphrasing you.

And what you propose does sound like something that would work

mboes commented 8 years ago

Completely agree with @alanz. What I was thinking was that we could add a binary to the sparkle package that is essentially a Shake-based tool that builds a .jar file given the name of some arbitrary binary, e.g.

$ stack exec -- sparkle package --maven foo-app

This tool would replace the copy-exe.sh script that we currently have.

Optionally, we could also have the tool generate an initial pom.xml, as @alpmestan suggests:

$ stack exec -- sparkle init --maven

It may be interesting down the line to support packaging via other frameworks, e.g. Gradle.
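
A thin front end for such a tool could just dispatch on the subcommand. A minimal sketch, with the subcommand names taken from above and everything else hypothetical scaffolding:

    module Main where

    import System.Environment (getArgs)
    import System.Exit (die)

    main :: IO ()
    main = do
      args <- getArgs
      case args of
        ["init", "--maven"]         -> initMaven
        ["package", "--maven", app] -> packageMaven app
        _ -> die "usage: sparkle (init --maven | package --maven APP)"

    -- Stub: generate an initial pom.xml in the current directory.
    initMaven :: IO ()
    initMaven = putStrLn "TODO: generate pom.xml"

    -- Stub: build a .jar for the given application binary.
    packageMaven :: String -> IO ()
    packageMaven app = putStrLn ("TODO: package " ++ app)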

@alanz would you be interested in contributing something along those lines?

alanz commented 8 years ago

@mboes I last used Java in anger several years ago, so I am completely rusty on the tooling at this stage. I could possibly take a look, but I am not sure how far I would get

mboes commented 8 years ago

Thinking about this more, this may be easier than I described. I think @alpmestan is right - the name of the game here is simply to generate an appropriate pom.xml based on some template, and copy the copy-exe.sh script into the current project. As a first step we don't necessarily need to Haskellify that script.

The pom.xml to generate should be pretty much identical to the one currently in sparkle/pom.xml. The only thing you could probably do away with is the ${sparkle.app} variable: you could generate a pom.xml with a hardcoded app name based on user input. The rest of the workflow would be exactly as the README says today; it's just that you'd be able to run those instructions from a separate stack project, not just the root of the sparkle project.
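
Doing away with the variable could be as simple as a textual substitution over a template. A sketch, assuming a pom.xml.template file and the ${sparkle.app} placeholder named above:

    module Main where

    import Data.List (isPrefixOf)

    -- Replace every occurrence of needle with repl in the input string.
    replace :: String -> String -> String -> String
    replace needle repl = go
      where
        go [] = []
        go s@(c:cs)
          | needle `isPrefixOf` s = repl ++ go (drop (length needle) s)
          | otherwise             = c : go cs

    main :: IO ()
    main = do
      template <- readFile "pom.xml.template"  -- assumed template location
      -- "foo-app" stands in for the app name taken from user input.
      writeFile "pom.xml" (replace "${sparkle.app}" "foo-app" template)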

alanz commented 8 years ago

Ok, I will give that a try, thanks for the pointers


alanz commented 8 years ago

After experimenting a bit, I see it is quite simple to generate an app.zip containing the required custom app from a separate project.

It might be simplest to let sparkle generate a jar file without the app.zip in it, or with a trivial one, and then simply repack the jar with the new app.zip. That way the two parts pretty much stay separate.
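
For the repacking step, the stock JDK jar tool can update an archive in place; a sketch (file paths hypothetical):

    import System.Process (callProcess)

    main :: IO ()
    main =
      -- `jar uf` updates an existing archive with the named files,
      -- here adding (or replacing) app.zip at the root of the jar.
      callProcess "jar" ["uf", "build/libs/sparkle.jar", "app.zip"]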

mboes commented 8 years ago

Sounds good to me.

alanz commented 8 years ago

Before trying to build and repackage a jar, I tried adding the existing java source to the local pom.xml, using the bit between the comment lines here: https://github.com/alanz/sparkle-play/blob/master/sparkle/pom.xml#L25

This does not work, but I think there is something basic I am missing.

If you follow the steps in the README for that repo, it will generate a jar file with the app in it, just missing the glue java code.

Any pointers as to how to do that?

alanz commented 8 years ago

Another option is to extend copy-exe.sh to take an optional 3rd parameter for the $DIR variable, which can be passed in as the top-level stack --nix path --local-install-root value; then in a makefile do something like

export SPARKLE_DIR=`stack --nix query locals sparkle path`
(cd  $SPARKLE_DIR &&  \
stack --nix exec -- mvn -f sparkle -Dsparkle.app=sparkle-example-hello -Dsparkle.dir=$SPARKLE_DIR package)

The pom.xml would have to optionally pass sparkle.dir on to the copy-exe.sh script as the new 3rd param

alpmestan commented 8 years ago

How about adding the glue java code to the data-files bit of sparkle? This way, we could access those Java files from the sparkle program and simply drop them wherever necessary at "initialization time", i.e. when creating the pom.xml and whatnot.

To summarize what I have in mind:

  • someone writes a program that uses sparkle (the library, i.e. the Spark bindings) and compiles it
  • the user then calls sparkle init, possibly with some argument pointing to the directory of the user's program (or to be invoked directly from there if that's simpler to implement, for now), which generates a pom.xml with the right content instead of our sparkle.app thing, and copies the glue Java code and copy-exe.sh to the right place. Or maybe the copy-exe.sh script should be generated as well, should it depend on anything project specific, but it looks like your version of it is general enough to just be invoked with the right arguments, so it could make it into the data-files section as well.
  • the user finally invokes mvn (or maybe the sparkle tool could offer a shortcut command for this if that's not too hard to add) to package the java code and the user's app into a .jar that contains everything

But feel free to explore alternative routes, of course! I'm really just thinking out loud here about what the shortest path to a reasonable solution to the problem would be. What you've done so far is quite close, but I'm thinking that the bit between the comment lines in your pom.xml is going to be hard to debug, whereas if we handle the copying of the Java code ourselves, to make external projects look like what we have in the sparkle repo right now (same directory structure, regarding the pom.xml and the java code), there will be fewer unknowns for you/us to figure out.
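
With the glue code and copy-exe.sh listed under data-files, the sparkle program could locate and copy them via the Paths_sparkle module that Cabal generates. A sketch, with the file name assumed:

    module Main where

    import Paths_sparkle (getDataFileName)
    import System.Directory (copyFile)

    main :: IO ()
    main = do
      -- Resolve the installed location of the script shipped as a data file.
      src <- getDataFileName "copy-exe.sh"
      -- Drop it into the user's project (the current directory here).
      copyFile src "copy-exe.sh"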

alanz commented 8 years ago

Yes, I was also thinking along the lines of bringing the glue java over in the data files, so they are always in a known position.

I just realised I gave the wrong path in my pom.xml (was one directory too low), so will update that. What I committed is a proof of concept, the path will be derived from an appropriate stack query ... call.

I will continue with that for now, having just built a workable jar doing it, then we can consider other options.


alanz commented 8 years ago

I have a working proof of concept here: https://github.com/alanz/sparkle-play/commit/d6a38b838b40d0d4fdeb99a15bf2cf96fa87cb40

The solution I ended up with yesterday had one minor error in the path I passed through.

I do think a final solution should work by just running cabal install sparkle and then having the tools available, so putting the java in the data-files and generating a script from them is probably the best option.

I think the addition I made to the pom.xml is benign; it can perhaps live in the main one too, to simplify that side. Then all we need to pass in is the directory with the sources. In that case a Haskell helper program could construct a script which calls mvn with the appropriate arguments.
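
That helper could be tiny. A sketch of emitting the wrapper script, where the mvn invocation mirrors the one earlier in this thread and the app name and directory are hypothetical placeholders for user input:

    module Main where

    import System.Directory
      (getPermissions, setOwnerExecutable, setPermissions)

    main :: IO ()
    main = do
      let app = "sparkle-example-hello"        -- hypothetical app name
          dir = "/path/to/local-install-root"  -- hypothetical sparkle.dir
          script = unlines
            [ "#!/bin/sh"
            , "mvn -f sparkle -Dsparkle.app=" ++ app
                ++ " -Dsparkle.dir=" ++ dir ++ " package"
            ]
      writeFile "package.sh" script
      -- Mark the generated script executable.
      p <- getPermissions "package.sh"
      setPermissions "package.sh" (setOwnerExecutable True p)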

alpmestan commented 8 years ago

Terrific! This doesn't look far at all from what we need for the sparkle tool we've been discussing in this thread. Do you feel like (and have time for) giving it a shot?

alanz commented 8 years ago

I can give it a try, but can't guarantee when I will get to it

alanz commented 8 years ago

On a clean checkout I get the following error

alanz-laptop% stack --nix build
Warning: File listed in inline-java/inline-java.cabal file does not exist: src/Foreign/JNI.c
Warning: File listed in sparkle.cabal file does not exist: build/libs/sparkle.jar

mboes commented 8 years ago

Those are warnings, not errors, and arguably a bug/limitation of inline-c and stack. Does the rest of the build proceed just fine? Do the README instructions work for your use case?

alanz commented 8 years ago

If I build again it continues, but still stops with a problem:

alanz-laptop% stack --nix build
Warning: File listed in inline-java/inline-java.cabal file does not exist: src/Foreign/JNI.c
Warning: File listed in sparkle.cabal file does not exist: build/libs/sparkle.jar
inline-java-0.1.0.0: configure
inline-java-0.1.0.0: build
Progress: 1/5
--  While building package inline-java-0.1.0.0 using:
      /home/alanz/.stack/setup-exe-cache/x86_64-linux/setup-Simple-Cabal-1.22.5.0-ghc-7.10.3 --builddir=.stack-work/dist/x86_64-linux/Cabal-1.22.5.0 build lib:inline-java --ghc-options " -ddump-hi -ddump-to-file"
    Process exited with code: ExitFailure 1
    Logs have been written to: /home/alanz/mysrc/github/tweag/sparkle/.stack-work/logs/inline-java-0.1.0.0.log

    Configuring inline-java-0.1.0.0...
    Preprocessing library inline-java-0.1.0.0...
    setup-Simple-Cabal-1.22.5.0-ghc-7.10.3: src/Foreign/JNI.c: does not exist

I picked this up trying to use this project as a dependency in my own project, where it would not build

alanz commented 8 years ago

Oops, it does build cleanly on an absolutely fresh checkout, rather than just deleting stuff in the original repo.

But the build is failing in my project, will continue looking there

alanz commented 8 years ago

Current status

  1. A clean checkout of the sparkle gradle branch builds as expected.
  2. An attempt to use sparkle (gradle branch) as a dependency in another project fails.

Trying to build https://github.com/alanz/sparkle-play via

stack --nix build

Results in

alanz-laptop% stack --nix build          
Warning: File listed in .stack-work/downloaded/dcbb6acecbcd15b8115bc876a3c6bd75684e38f129b11e26f69b72d5c33095ea/inline-java/inline-java.cabal file does not exist: src/Foreign/JNI.c
Warning: File listed in .stack-work/downloaded/dcbb6acecbcd15b8115bc876a3c6bd75684e38f129b11e26f69b72d5c33095ea/sparkle.cabal file does not exist: build/libs/sparkle.jar
distributed-closure-0.2.1.0: configure
distributed-closure-0.2.1.0: build
thread-local-storage-0.1.0.3: configure
thread-local-storage-0.1.0.3: build
distributed-closure-0.2.1.0: copy/register
thread-local-storage-0.1.0.3: copy/register
inline-java-0.1.0.0: configure
inline-java-0.1.0.0: build
inline-java-0.1.0.0: copy/register
sparkle-0.1: configure
sparkle-0.1: build
Warning: File listed in .stack-work/downloaded/dcbb6acecbcd15b8115bc876a3c6bd75684e38f129b11e26f69b72d5c33095ea/sparkle.cabal file does not exist: build/libs/sparkle.jar
sparkle-0.1: copy/register
Progress: 4/5'cabal copy' failed.  Error message:

--  While building package sparkle-0.1 using:
      /home/alanz/mysrc/github/alanz/sparkle-play/.stack-work/downloaded/dcbb6acecbcd15b8115bc876a3c6bd75684e38f129b11e26f69b72d5c33095ea/.stack-work/dist/x86_64-linux/Cabal-1.22.5.0/setup/setup --builddir=.stack-work/dist/x86_64-linux/Cabal-1.22.5.0 copy
    Process exited with code: ExitFailure 1
    Logs have been written to: /home/alanz/mysrc/github/alanz/sparkle-play/.stack-work/logs/sparkle-0.1.log

    [1 of 1] Compiling Main             ( /home/alanz/mysrc/github/alanz/sparkle-play/.stack-work/downloaded/dcbb6acecbcd15b8115bc876a3c6bd75684e38f129b11e26f69b72d5c33095ea/Setup.hs, /home/alanz/mysrc/github/alanz/sparkle-play/.stack-work/downloaded/dcbb6acecbcd15b8115bc876a3c6bd75684e38f129b11e26f69b72d5c33095ea/.stack-work/dist/x86_64-linux/Cabal-1.22.5.0/setup/Main.o )
    Linking /home/alanz/mysrc/github/alanz/sparkle-play/.stack-work/downloaded/dcbb6acecbcd15b8115bc876a3c6bd75684e38f129b11e26f69b72d5c33095ea/.stack-work/dist/x86_64-linux/Cabal-1.22.5.0/setup/setup ...
    Configuring sparkle-0.1...
    Preprocessing library sparkle-0.1...
    [ 1 of 12] Compiling Control.Distributed.Spark.Context ( src/Control/Distributed/Spark/Context.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/Context.o )
    [ 2 of 12] Compiling Control.Distributed.Spark.SQL.Context ( src/Control/Distributed/Spark/SQL/Context.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/SQL/Context.o )
    [ 3 of 12] Compiling Control.Distributed.Spark.Closure ( src/Control/Distributed/Spark/Closure.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/Closure.o )
    [ 4 of 12] Compiling Control.Distributed.Spark.RDD ( src/Control/Distributed/Spark/RDD.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/RDD.o )
    [ 5 of 12] Compiling Control.Distributed.Spark.PairRDD ( src/Control/Distributed/Spark/PairRDD.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/PairRDD.o )
    [ 6 of 12] Compiling Control.Distributed.Spark.SQL.Row ( src/Control/Distributed/Spark/SQL/Row.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/SQL/Row.o )
    [ 7 of 12] Compiling Control.Distributed.Spark.SQL.DataFrame ( src/Control/Distributed/Spark/SQL/DataFrame.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/SQL/DataFrame.o )
    [ 8 of 12] Compiling Control.Distributed.Spark.ML.Feature.CountVectorizer ( src/Control/Distributed/Spark/ML/Feature/CountVectorizer.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/ML/Feature/CountVectorizer.o )
    [ 9 of 12] Compiling Control.Distributed.Spark.ML.LDA ( src/Control/Distributed/Spark/ML/LDA.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/ML/LDA.o )
    [10 of 12] Compiling Control.Distributed.Spark.ML.Feature.RegexTokenizer ( src/Control/Distributed/Spark/ML/Feature/RegexTokenizer.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/ML/Feature/RegexTokenizer.o )
    [11 of 12] Compiling Control.Distributed.Spark.ML.Feature.StopWordsRemover ( src/Control/Distributed/Spark/ML/Feature/StopWordsRemover.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark/ML/Feature/StopWordsRemover.o )
    [12 of 12] Compiling Control.Distributed.Spark ( src/Control/Distributed/Spark.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Control/Distributed/Spark.o )
    In-place registering sparkle-0.1...
    Preprocessing executable 'sparkle' for sparkle-0.1...
    [1 of 2] Compiling Paths_sparkle    ( .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/autogen/Paths_sparkle.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/sparkle/sparkle-tmp/Paths_sparkle.o )
    [2 of 2] Compiling Main             ( Sparkle.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/sparkle/sparkle-tmp/Main.o )
    Linking .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/sparkle/sparkle ...
    :compileJava
    :processResources UP-TO-DATE
    :classes
    :jar
    :assemble
    :compileTestJava UP-TO-DATE
    :processTestResources UP-TO-DATE
    :testClasses UP-TO-DATE
    :test UP-TO-DATE
    :check UP-TO-DATE
    :build

    BUILD SUCCESSFUL

    Total time: 4.903 secs

    This build could be faster, please consider using the Gradle Daemon: https://docs.gradle.org/2.12/userguide/gradle_daemon.html
    setup: build/libs/sparkle.jar: does not exist

Possible causes of this issue:
* No module named "Main". The 'main-is' source file should usually have a header indicating that it's a 'Main' module.
* The Setup.hs file is changing the installation target dir.

At this point there is no sparkle.jar anywhere in the project directory tree

However, there is a jar in build/libs called dcbb6acecbcd15b8115bc876a3c6bd75684e38f129b11e26f69b72d5c33095ea.jar, where the guid it is named with is the directory that stack has cloned the dependency into.

alanz commented 8 years ago

It looks like this may be the solution: http://stackoverflow.com/questions/6768295/gradle-jar-file-name-in-java-plugin

"The default project name is taken from the directory the project is stored in. Instead of changing the naming of the jar explicitly, you should set the project name correctly for your build. At the moment this is not possible within the build.gradle file. Instead you have to create a settings.gradle file in your root directory. This settings.gradle file should have this one-liner included:"

rootProject.name = 'project1'

mboes commented 8 years ago

Nice find! And it answers my question in PR #35. Makes sense. So with PR #35 merged, does this branch work well for you?

alanz commented 8 years ago

Yes, as per my example in https://github.com/alanz/sparkle-play, which currently points to the commit in my sparkle gradle branch