scicloj / scicloj.ml

A Clojure machine learning library
Eclipse Public License 2.0

scicloj.ml not loading with lein on windows #5

Closed swapneils closed 2 years ago

swapneils commented 2 years ago

Hello, I was trying to set up scicloj via lein on Windows, but the lein repl wouldn't work; judging from the stack trace, java wouldn't run because the 'classpath was too long'.

I looked this up, and ran into https://github.com/technomancy/leiningen/issues/2452, which mentions this is a common JVM problem and proposes a lein plugin, lein-classpath-jar, to fix it. I set up the plugin in my project.clj and successfully opened the repl.

I then hit a different issue when actually loading scicloj.ml.core: the crypto module in "com.taoensso/nippy" wasn't available, so another file in that library (I believe encryption.clj) couldn't load. Manually adding com.taoensso/nippy as a dependency led to another error (which may have been there before, merely overshadowed by the encryption.clj problem): "namespace 'scicloj.ml.core' not found". Commenting out ml.core, ml.metamorph, and ml.dataset allows loading with no issues.

I am honestly not certain whether the second set of issues is due to the way lein-classpath-jar deals with long classpaths, or to how scicloj is constructed. But the first issue is definitely problematic for scicloj.ml development on Windows. Any ideas for patching this, or avoiding it on the user side?

Steps to reproduce primary issue:

  1. Create a lein project three levels below the Users folder in a Windows environment (not sure if this is necessary for reproduction, but it's how my case is positioned).
  2. Add scicloj.ml (via the lein recipe on clojars) to the :dependencies list of the defproject macro in project.clj.
  3. Attempt to use 'cider-jack-in' or 'lein repl' in the same folder as the project's core.clj file. -> java cannot be run

Steps to reproduce secondary issue:

  1. Add lein-classpath-jar to your lein :user settings as specified by the plugin's README.
  2. Attempt to use 'cider-jack-in' or 'lein repl' in the above-created project. -> compilation issue regarding nippy
  3. Add nippy to project.clj :dependencies.
  4. Attempt 'cider-jack-in' or 'lein repl'. -> scicloj.ml.core not found

behrica commented 2 years ago

@swapneils Thanks for reporting this issue on Windows. scicloj.ml does indeed depend on nippy for now (indirectly), and it would not completely surprise me if certain encryption-related code in nippy depends on the Java version and OS.

I think there are 2 ways forward to investigate this.

  1. You provide me with the exact information on your Java version / OS and I try to reproduce it.

and / or

  2. You try to figure out whether the namespace taoensso.nippy.encryption can be loaded (= required) at all in your setup. So maybe you can create a project without scicloj.ml but with nippy and run something like: https://github.com/ptaoussanis/nippy#encryption-v2

Requiring `scicloj.ml` does not make any calls to anything in nippy.

joinr commented 2 years ago

@swapneils looks like jvm version is a silent culprit. If you are doing this from the REPL you would see some error messages; in my case scicloj.ml has been built on a newer jvm (I am stuck on 1.8 for work purposes):

user> (require 'mltest.core)
07:15:03.787 [nREPL-session-4f424ac5-2d82-4bf3-a176-56428e6b1568] DEBUG tech.v3.datatype.functional - JDK16 vector ops are not available: Syntax error compiling at (tech/v3/datatype/functional/vecopt.clj:1:1).
07:15:05.148 [nREPL-session-4f424ac5-2d82-4bf3-a176-56428e6b1568] DEBUG tech.v3.tensor.dimensions.global-to-local - insn custom indexing enabled!
Syntax error (UnsupportedClassVersionError) compiling at (scicloj\ml\smile\protocols.clj:1:1).
smile/data/formula/TechFactory has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
user> (require '[taoensso.nippy :as nippy])
nil
user> (require '[taoensso.nippy.crypto :as nc])
nil
user> nc/decrypt
#function[taoensso.nippy.crypto/decrypt]
user> 

nippy and nippy.crypto load fine though.
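
For readers decoding the UnsupportedClassVersionError above: class-file major versions map to Java releases as major minus 44 (for Java 5 and later), so 55.0 means the class was compiled for Java 11 and 52.0 is the limit of a Java 8 runtime. Here is a minimal sketch for checking this from a REPL; the function names are hypothetical helpers, not part of scicloj.ml or any library mentioned in this thread.

```clojure
;; Hypothetical helpers for interpreting class-file versions at the REPL.

(defn class-version->java
  "Java feature release for a class-file major version (valid for Java 5+)."
  [major]
  (- major 44))

(defn runtime-class-version
  "Highest class-file major version the running JVM accepts,
  read from the standard java.class.version system property (e.g. \"52.0\")."
  []
  (long (Double/parseDouble (System/getProperty "java.class.version"))))

(defn loadable?
  "True if a class compiled to required-major can be loaded by this JVM."
  [required-major]
  (<= required-major (runtime-class-version)))

(println "JVM version:" (System/getProperty "java.version"))
(println "Loads class files up to Java" (class-version->java (runtime-class-version)))
(println "Can load class file version 55 (Java 11)?" (loadable? 55))
```

On a Java 8 runtime the last line would report false, which matches the error shown for smile/data/formula/TechFactory.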

joinr commented 2 years ago

encryption ns loads too

user> (require '[taoensso.nippy.encryption :as enc])
nil
user> enc/->AES128Encryptor
#function[taoensso.nippy.encryption/eval30430/->AES128Encryptor--30443]

behrica commented 2 years ago

> @swapneils looks like jvm version is a silent culprit. If you are doing this from the REPL you would see some error messages; in my case scicloj.ml has been built on a newer jvm (I am stuck on 1.8 for work purposes):
>
> user> (require 'mltest.core)
> 07:15:03.787 [nREPL-session-4f424ac5-2d82-4bf3-a176-56428e6b1568] DEBUG tech.v3.datatype.functional - JDK16 vector ops are not available: Syntax error compiling at (tech/v3/datatype/functional/vecopt.clj:1:1).
> 07:15:05.148 [nREPL-session-4f424ac5-2d82-4bf3-a176-56428e6b1568] DEBUG tech.v3.tensor.dimensions.global-to-local - insn custom indexing enabled!
> Syntax error (UnsupportedClassVersionError) compiling at (scicloj\ml\smile\protocols.clj:1:1).
> smile/data/formula/TechFactory has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0

This Java 8 incompatibility has been fixed in `scicloj.ml.smile` with commit https://github.com/scicloj/scicloj.ml.smile/commit/c47269cf15d6916b1e5e9b08b4f8db015bac1729
I am just checking whether `scicloj.ml` 1.1 already includes it.

behrica commented 2 years ago

The java 1.8 fix was not part of scicloj.ml 1.1, but I just released 1.2, which should be java 1.8 compatible.

behrica commented 2 years ago

@swapneils Could you retry your code with scicloj.ml 1.2 and post the results here?

swapneils commented 2 years ago

Thanks for the help, it definitely seems to have done something (I'm on OpenJDK-17, but I suspect maybe tech.parallel or lein-classpath-jar were compiled with an older version of Java, since this fix qualitatively changed the error list). Unfortunately, it hasn't fully solved the issue, nor given me a clear way to work around it.

Note that aside from the main issue of the classpath being too long, I have no way of knowing whether the subsequent errors are due to scicloj and its dependencies or simply problems with lein-classpath-jar.

I'll add the other tests anyway for completeness, but the first two are the ones involving the primary issue, and the ones that don't have anything but scicloj.ml that they could be related to.

Tests in order:

Running a repl with nippy on its own and then requiring it works fine, and requiring nippy.encryption also works.

Trying to run the repl with scicloj.ml as a dependency but without lein-classpath-jar set up in the project.clj still fails, with the same error.


Trying to run the repl with lein-classpath-jar succeeds, but now gives a different error when requiring the 3 main scicloj namespaces:

Syntax error compiling at (taoensso\nippy\crypto.clj:14:79).
No such var: enc/secure-rng

enc refers to com.taoensso/encore. Requiring after adding that as a dependency gave me:

Syntax error (ExceptionInfo) compiling at (scicloj\ml\core.clj:45:1).
Failed to find symbol 'do-ctx' in namespace 'scicloj.metamorph.core'

However, I noticed that this error (unlike the prior ones) was placed after 2 error messages and some spec.test outputs for scicloj functions:

22:51:37.528 [nREPL-session-bd619217-4e51-47c0-8311-d7db48e1f823] DEBUG tech.v3.datatype.functional-api - JDK16 vector ops are not available: Syntax error compiling . at (tech/v3/datatype/functional/vecopt.clj:44:7).
22:51:39.046 [nREPL-session-bd619217-4e51-47c0-8311-d7db48e1f823] DEBUG t.v.t.dimensions.global-to-local - insn custom indexing enabled!
..instrumented #'scicloj.ml.smile.clustering/cluster
..instrumented #'scicloj.ml.smile.clustering/cluster
..instrumented #'scicloj.ml.smile.projections/reduce-dimensions
Syntax error (ExceptionInfo) compiling at (scicloj\ml\core.clj:45:1).
Failed to find symbol 'do-ctx' in namespace 'scicloj.metamorph.core'

I tried removing all spec namespaces from my require list in core.clj as well as all dependencies for spec-related projects (like spec-tools), but it did not prevent the instrumentation from taking place. Running it from a brand-new file/namespace in the same project had the same result, as did direct requiring from the REPL.

Looking at the error message in the first "[nREPL..." line, I looked into dtype-next (which is supposed to be the "successor" to tech.datatype), whose README notes that jdk-17 requires special project.clj flags. I'm uncertain whether they were intended for use in user projects or are simply a note on the library's implementation, but adding them to my project.clj file removed the two "[nREPL..." lines at the top, without solving the do-ctx issue.

Then I thought maybe importing scicloj.metamorph manually would work, but that gave me some issue regarding preprocessing.clj not being available.


Final stack trace:

clojure-main.core> (require '[scicloj.ml.core :as ml] '[scicloj.ml.metamorph :as mm] '[scicloj.ml.dataset :as ds])
..instrumented #'scicloj.ml.smile.clustering/cluster
..instrumented #'scicloj.ml.smile.clustering/cluster
..instrumented #'scicloj.ml.smile.projections/reduce-dimensions
Syntax error (FileNotFoundException) compiling at (scicloj\ml\metamorph.clj:1:1).
Could not locate scicloj/metamorph/ml/preprocessing__init.class, scicloj/metamorph/ml/preprocessing.clj or scicloj/metamorph/ml/preprocessing.cljc on classpath.

As a side note, I'm not sure why spec is trying to instrument do-ctx despite it being a function in a projected namespace, and without either scicloj.metamorph.core (where do-ctx is implemented) or any of the .clj files between my core.clj and tech.parallel (where the code for export-symbols is implemented) requiring clojure.spec in any way.

Thoughts?

Current list of successive patches, for easier tracking of issue progress:

Starting point: REPL SETUP ERROR due to too-long classpath

  1. Add lein-classpath-jar as detailed in its GitHub repository. -> REPL works, ERROR due to encore not available
  2. Add com.taoensso/encore to dependencies in project.clj, above scicloj.ml. -> ERROR due to vector ops, ERROR due to spec failure on do-ctx
  3. Add the following to :profiles in project.clj to allow vector ops in jdk-17: :jdk-17 {:jvm-opts ["--add-modules" "jdk.incubator.foreign,jdk.incubator.vector" "--enable-native-access=ALL-UNNAMED"]} -> ERROR due to spec failure on do-ctx
  4. Add scicloj/metamorph to dependencies in project.clj, between scicloj.ml and com.taoensso/encore. -> ERROR due to scicloj/metamorph/ml/preprocessing.clj not being available
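
For reference, the :jdk-17 profile from the patch list above could sit in a project.clj roughly as sketched here. The project name and the dependency versions are placeholders borrowed from elsewhere in this thread, not a confirmed configuration; only the :jvm-opts vector is taken verbatim from the list.

```clojure
;; project.clj (sketch): name and versions are assumptions for illustration.
(defproject mltest "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.3"]
                 [scicloj/scicloj.ml "0.1.2"]]
  ;; JVM flags quoted in the patch list above.
  ;; A custom-named profile is not among Leiningen's default profiles,
  ;; so it has to be activated explicitly, for example:
  ;;   lein with-profile +jdk-17 repl
  :profiles {:jdk-17 {:jvm-opts ["--add-modules" "jdk.incubator.foreign,jdk.incubator.vector"
                                 "--enable-native-access=ALL-UNNAMED"]}})
```

If the flags should apply to every run instead, the same vector could be placed under a top-level :jvm-opts key; which of the two is preferable depends on whether the project must also start on older JDKs that reject these options.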

joinr commented 2 years ago

The problem disappears when using the Clojure CLI with deps.edn (for me). The lack of do-ctx is the result of an old dependency being pulled in and colliding with the new one during dependency resolution.
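
A minimal deps.edn for trying the Clojure CLI route might look like the sketch below. The versions are assumptions taken from the project.clj posted later in this thread, not something stated in this comment. One plausible reason the behaviour differs: when the same library appears at several versions in the transitive graph, tools.deps generally selects the newest one, whereas Leiningen's Maven-style resolution picks the version declared nearest to the project root.

```clojure
;; deps.edn (sketch): versions are assumptions for illustration.
{:deps {org.clojure/clojure {:mvn/version "1.10.3"}
        scicloj/scicloj.ml  {:mvn/version "0.1.2"}}}
```

Running `clojure -Stree` with this file prints the resolved dependency tree, and `lein deps :tree` does the same for a Leiningen project, so the selected versions can be compared side by side.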

joinr commented 2 years ago

@behrica looks like you have no explicit dependency for scicloj.metamorph.ml:

https://github.com/scicloj/scicloj.ml/blob/main/deps.edn#L4

I think you are relying on something else to bring it in. It looks like scicloj/scicloj.ml.xgboost "5.03" is bringing in scicloj/metamorph.ml "0.3.0-beta1" instead of 0.4.1, and the older version lacks do-ctx.

joinr commented 2 years ago

Yup, I was right.

@behrica @swapneils this works:

(defproject mltest "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
            :url "https://www.eclipse.org/legal/epl-2.0/"}
  :dependencies [[org.clojure/clojure "1.10.3"]
                 #_[scicloj/scicloj.ml "0.1.2"]
                 [scicloj/scicloj.ml "0.1.2" :exclusions [scicloj/metamorph.ml]]
                 [scicloj/metamorph.ml "0.4.1"]])

I am able to fully load everything if I specify the scicloj.metamorph.ml dependency and prevent scicloj.ml from resolving it.
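
To confirm at the REPL that a resolved classpath actually provides the symbol in question, a small check like the following can help. This is a sketch: `var-available?` is a hypothetical helper, and `requiring-resolve` needs Clojure 1.10 or later.

```clojure
;; Hypothetical helper: does a namespace-qualified var exist on this classpath?
(defn var-available?
  "Tries to load the symbol's namespace and resolve the var.
  Returns false if the namespace cannot be loaded or the var is missing."
  [qualified-sym]
  (try
    (some? (requiring-resolve qualified-sym))
    (catch Exception _ false)))

;; The symbol named in the error message earlier in this thread:
(println "do-ctx available?" (var-available? 'scicloj.metamorph.core/do-ctx))
```

A false result with the default dependencies and true after pinning the dependency would be consistent with the version-conflict diagnosis, although it does not by itself show which artifact supplied the namespace.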

swapneils commented 2 years ago

@behrica @joinr I can confirm that using lein-classpath-jar and @joinr's manual import of metamorph.ml, and removing malli from the project.clj dependencies, this works for the test code in the README. Manual import of taoensso/encore and taoensso/nippy no longer seems to be required for functionality.

The malli incompatibility seems to be an invalid schema in scicloj.metamorph.ml. I've opened Issue 4 on that repository regarding this.

behrica commented 2 years ago

> Yup, I was right.
>
> @behrica @swapneils this works:
>
> (defproject mltest "0.1.0-SNAPSHOT"
>   :description "FIXME: write description"
>   :url "http://example.com/FIXME"
>   :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
>             :url "https://www.eclipse.org/legal/epl-2.0/"}
>   :dependencies [[org.clojure/clojure "1.10.3"]
>                  #_[scicloj/scicloj.ml "0.1.2"]
>                  [scicloj/scicloj.ml "0.1.2" :exclusions [scicloj/metamorph.ml]]
>                  [scicloj/metamorph.ml "0.4.1"]])
>
> I am able to fully load everything if I specify the scicloj.metamorph.ml dependency and prevent scicloj.ml from resolving it.

It is very strange that this makes a difference. `scicloj.ml` depends on `scicloj.ml.smile`, which depends on `scicloj/metamorph.ml` "0.4.1".

behrica commented 2 years ago

> @behrica looks like you have no explicit dependency for scicloj.metamorph.ml:
>
> https://github.com/scicloj/scicloj.ml/blob/main/deps.edn#L4
>
> I think you are relying on something else to bring it in. It looks like scicloj/scicloj.ml.xgboost "5.03" is bringing in scicloj/metamorph.ml "0.3.0-beta1" instead of 0.4.1, and the older version lacks do-ctx.

This is wrong, and I will fix it. As scicloj.ml "exports" the symbols from metamorph.ml, it should depend on it directly. Maybe this solves the issue.

behrica commented 2 years ago

The issue is solved in scicloj/scicloj.ml "0.1.3".

I tested it with lein on Linux.