techascent / tech.ml.dataset

A Clojure high performance data processing system
Eclipse Public License 1.0

arrow breaks aot compilation #131

Closed daslu closed 4 years ago

daslu commented 4 years ago

Testing the following at https://github.com/techascent/tech.ml.dataset/commit/d04a3d4cb1ea445cd427240bd5eb2b89a2352c8e:

  1. Adding :aot :all to project.clj, and trying to lein install, I get:

    $ lein install
    If there are a lot of uncached dependencies this might take a while ...
    Compiling 5 source files to /workspace/installations/tech/tech.ml.dataset/target/classes
    Note: Some input files use unchecked or unsafe operations.
    Note: Recompile with -Xlint:unchecked for details.
    Compiling tech.libs.arrow
    Syntax error macroexpanding at (copying.clj:1:1).
    Execution error (ClassNotFoundException) at java.net.URLClassLoader/findClass (URLClassLoader.java:382).
    org.apache.arrow.vector.types.pojo.FieldType

    Full report at:
    /tmp/clojure-2988770696191100774.edn
    Compilation failed: Subprocess failed (exit code: 1)

  2. lein install does work after commenting out everything at src/tech/libs/arrow.clj, src/tech/libs/arrow/copying.clj, src/tech/libs/arrow/in_place.clj.

  3. Alternatively, lein install does work after adding the arrow dependencies to the main :deps section at deps.edn (and not just to the :test alias):

    {:paths ["src"]
     :deps { ...
            org.apache.arrow/arrow-memory-netty {:mvn/version "1.0.0"}
            org.apache.arrow/arrow-memory-core {:mvn/version "1.0.0"}
            org.apache.arrow/arrow-vector {:mvn/version "1.0.0"}}
     :aliases {:test
               {:extra-deps
                { ...
                 org.apache.arrow/arrow-memory-netty {:mvn/version "1.0.0"}
                 org.apache.arrow/arrow-memory-core {:mvn/version "1.0.0"}
                 org.apache.arrow/arrow-vector {:mvn/version "1.0.0"}}}}}

cnuernber commented 4 years ago

My intention was to have optional or provided dependencies for the dataset library but perhaps this isn't really that helpful.

Another option would be to only aot tech.ml.dataset. This pulls in the datatype library and, I think, just about everything else.

Perhaps Arrow should be a core dependency and not an optional dependency; I am just not sure yet.
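
For reference, the "only aot tech.ml.dataset" option might look roughly like the project.clj fragment below. This is a sketch, not the project's actual configuration: the project name and version are placeholders, and it assumes that tech.ml.dataset does not itself require the tech.libs.arrow namespaces (AOT compilation is transitive, so anything it requires gets compiled too).

```clojure
;; project.clj (sketch; name and version below are hypothetical placeholders)
(defproject example/aot-demo "0.1.0-SNAPSHOT"
  ;; Instead of `:aot :all`, list only the namespaces to compile ahead of time.
  ;; Namespaces required by tech.ml.dataset are compiled transitively;
  ;; tech.libs.arrow is left alone as long as nothing listed here requires it.
  :aot [tech.ml.dataset])
```

With a setup like this, the Arrow jars would only need to be on the classpath for users who actually load the tech.libs.arrow namespaces.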

daslu commented 4 years ago

> Another option would be to only aot tech.ml.dataset.

Thanks, I guess this makes sense, since the point of AOT is to make the tech.ml.dataset namespace load quickly.

cnuernber commented 4 years ago

Closing this for now. In the background I am working on a reboot of the datatype system and this will solve the load time issues (and support graal native from the ground up).