taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0

`thaw-from-file` can't find resources when packaged as jar #137

Closed eihli closed 3 years ago

eihli commented 3 years ago

Repro:

Create a package that uses nippy to thaw from a resource. Create an uberjar from the package. Require that uberjar from another library and try to run the code that thaws from the resource.

Exact steps I took:

1. Built an uberjar with: clj -A:depstar -m hf.depstar.uberjar prhyme.jar
2. The library has the following line: (nippy/thaw-from-file (io/resource "dark-corpus-2.bin"))
3. Confirmed the file is in the jar with: jar ft prhyme.jar | grep dark-corpus-2.bin
4. The jar is depended on by another project with: {:local/root "/home/user/code/prhyme/prhyme.jar"}
5. Error thrown when starting a REPL:

Caused by: java.lang.IllegalArgumentException: Not a file: jar:file:/home/user/code/prhyme/prhyme.jar!/dark-corpus-2.bin
    at clojure.java.io$fn__11362.invokeStatic(io.clj:61)

Research notes

Spent some time searching for the cause and came across this thread https://groups.google.com/g/clojure/c/8scFidro4ow

The following comments made me think it was related to accessing the resource with io/file.

The io/resource function returns a URL which all the Clojure IO functions can handle just fine as-is. When running in development the URL happens to be a file:// URL, and thus something io/file can handle. Once the resource is in a JAR that is no longer the case, and hence exceptions. Just don't require a file when any URL will do and you'll be fine.

It seems my problem is related to this line in io.clj: https://github.com/clojure/clojure/blob/f437b853adeaffc5cad9bb1e01e2355357a492c9/src/clj/clojure/java/io.clj#L60

(if (= "file" (.getProtocol u))
  (as-file (escaped-utf8-urlstring->str
            (.replace (.getFile u) \/ File/separatorChar)))
  (throw (IllegalArgumentException. (str "Not a file: " u)))))

When running from the REPL, .getProtocol returns "file", but when the resource is read from inside the uberjar, .getProtocol returns "jar", so the code falls through to the throw branch.
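A minimal sketch that reproduces the difference without nippy. It builds a throwaway jar and constructs a jar: URL by hand, which is assumed to have the same shape as what io/resource returns for a resource packaged in a jar. The names demo.bin, tmp-jar, and jar-url are made up for illustration.

```clojure
(require '[clojure.java.io :as io])
(import '(java.util.zip ZipOutputStream ZipEntry)
        '(java.net URL))

;; Build a temporary jar holding a single entry (hypothetical name "demo.bin").
(def tmp-jar (java.io.File/createTempFile "demo" ".jar"))

(with-open [zos (ZipOutputStream. (io/output-stream tmp-jar))]
  (.putNextEntry zos (ZipEntry. "demo.bin"))
  (.write zos (.getBytes "hello" "UTF-8"))
  (.closeEntry zos))

;; A jar: URL pointing at the entry, e.g. jar:file:/tmp/demo123.jar!/demo.bin
(def jar-url (URL. (str "jar:" (.toURL (.toURI tmp-jar)) "!/demo.bin")))

;; io/file only accepts URLs whose protocol is "file", so it throws here.
(def file-error
  (try
    (io/file jar-url)
    nil
    (catch IllegalArgumentException e
      (.getMessage e))))

;; slurp goes through io/input-stream, which opens any URL it is given.
(def contents (slurp jar-url))

(println "protocol:" (.getProtocol jar-url))
(println "io/file error:" file-error)
(println "contents via input-stream:" contents)
```

The same URL that io/file rejects can be read through io/input-stream, which is why swapping the coercion is enough to make the resource readable.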

Possible fix

I made this change locally and confirmed that it works, but I don't understand what implications it may have. I tried making the change in a fork but wasn't able to run the tests: both lein test and evaluating the test comment in the tests file appear to hang forever.

(defn thaw-from-file
  "Convenience util: like `thaw`, but reads from
  `(clojure.java.io/input-stream <file>)`, so `<file>` may be a path, File, or URL.

  To thaw from a resource on classpath (e.g. in Leiningen `resources` dir):
    (thaw-from-file (clojure.java.io/resource \"my-resource-name.npy\"))

  See also `freeze-to-file`."
  ([file] (thaw-from-file file nil))
  ([file thaw-opts]
   ;; Unlike `jio/file`, `jio/input-stream` accepts any URL, including jar: URLs.
   ;; `with-open` closes both streams even if the copy throws.
   (with-open [xin  (jio/input-stream file)
               xout (ByteArrayOutputStream.)]
     (jio/copy xin xout)
     (thaw (.toByteArray xout) thaw-opts))))

(comment
  (freeze-to-file "foo.npy" "hello, world!")
  (thaw-from-file "foo.npy")
  (freeze-to-file "src/foo.npy" "hello, world!")
  (thaw-from-file (jio/resource "foo.npy")))
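
For callers who can't change the library, a possible workaround is to read the resource into a byte array first and hand that to thaw, which is what the patched function does internally. This is only a sketch: resource->bytes is a hypothetical helper, not part of nippy, and clojure/core.clj is used here just because it normally sits inside the Clojure jar on the classpath.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayOutputStream))

(defn resource->bytes
  "Hypothetical helper: reads a classpath resource fully into a byte array.
  Returns nil when the resource cannot be found."
  [resource-name]
  (when-let [url (io/resource resource-name)]
    (with-open [in  (io/input-stream url)
                out (ByteArrayOutputStream.)]
      (io/copy in out)
      (.toByteArray out))))

;; clojure/core.clj usually lives inside the Clojure jar, so this
;; typically exercises the jar: URL case described above.
(def core-bytes (resource->bytes "clojure/core.clj"))

(println "read" (count core-bytes) "bytes")
```

With nippy on the classpath, the result could then be passed along as (nippy/thaw (resource->bytes "dark-corpus-2.bin")).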

ptaoussanis commented 3 years ago

Just pushed [com.taoensso/nippy "3.1.0"] to Clojars which includes a thaw-from-resource util 👍