jimpil / duratom

A durable atom type for Clojure
Eclipse Public License 1.0

s3 nippy example doesn't work #6

Closed AdamClements closed 1 year ago

AdamClements commented 6 years ago

Nippy has low-level and high-level APIs. The data saved by freeze can't actually be read by thaw-from-in!, because freeze/thaw is the high-level API, supporting encryption and compression and including the nippy file header, whereas thaw-from-in! and freeze-to-out! are low-level and don't support those. So when thaw-from-in! tries to read a type id and instead finds the first byte of the NPY header, it complains that it doesn't recognise the file.
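
For illustration (assuming taoensso.nippy is required as nippy), something along these lines reproduces the mismatch; the last call is left commented out because it is expected to throw:

(require '[taoensso.nippy :as nippy])
(import '(java.io ByteArrayInputStream DataInputStream))

(def ba (nippy/freeze {:a 1})) ;; high-level freeze: prepends the nippy header
(nippy/thaw ba)                ;; => {:a 1}

;; low-level thaw-from-in! expects headerless data, so it trips over the header bytes:
;; (nippy/thaw-from-in! (DataInputStream. (ByteArrayInputStream. ba)))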

I also have an issue where I get a 404 from S3 if the key doesn't exist yet, and I don't know what to put in it to manually create a blank nippy file as a starting point.

jimpil commented 6 years ago

Again, thanks for taking the time to report this, and to point out the low- vs high-level API of nippy. Have you tried something along these lines as the :read fn?

;; assumes (:import (java.io DataInputStream ByteArrayOutputStream))
;; and     (:require [clojure.java.io :as io] [taoensso.nippy :as nippy])
#(with-open [dis (DataInputStream. %)
             bos (ByteArrayOutputStream. (.available dis))]
   (io/copy dis bos)
   (nippy/thaw (.toByteArray bos)))

If that (or something similar) works for you, let me know and I will update the example asap. As for your other 404 problem, I'm not sure I can help here, as duratom simply delegates to amazonica for creating/deleting the bucket. I will admit that the S3 code has not actually been tested, because I didn't have a way of testing it, so it's not inconceivable that I'm doing something wrong, but I can't see what at the moment.

Kind regards

jimpil commented 6 years ago

duratom 0.3.8 has been released and it comes with a helper fn in utils.clj (s3-bucket-bytes). Compose that function with nippy/thaw and you should be golden ;). I've updated the README to reflect this. Let me know if you're still having issues.
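
In code, that composition would look something like the following (a sketch; it assumes duratom.utils is aliased as ut and taoensso.nippy as nippy):

(require '[duratom.utils :as ut]
         '[taoensso.nippy :as nippy])

;; read the object's raw bytes from S3, then thaw them back into a Clojure value
(def nippy-read (comp nippy/thaw ut/s3-bucket-bytes))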

jimpil commented 6 years ago

OK, so it turns out that I wasn't calling aws/put-object correctly. According to the amazonica README, the :content-length should go under a :metadata key: https://github.com/jimpil/duratom/blob/afec9fde337d765c5d16450913a494a1f74e5c0f/src/duratom/utils.clj#L131 That could be the source of your issue, if duratom wasn't able to initialise the bucket. Can you try with duratom 0.3.9, which I've just cut to fix this? Thanks in advance...
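
For reference, the corrected call shape looks roughly like this (a sketch in amazonica's keyword-argument style; the credentials, bucket/key names and payload are placeholders):

(require '[amazonica.aws.s3 :as s3])
(import '(java.io ByteArrayInputStream))

(def creds {:access-key "..." :secret-key "..." :endpoint "eu-west-1"}) ;; placeholders

(let [payload (.getBytes (pr-str {:a 1}))]
  (s3/put-object creds
                 :bucket-name "my-bucket"
                 :key "my-duratom-key"
                 :input-stream (ByteArrayInputStream. payload)
                 :metadata {:content-length (count payload)})) ;; :content-length goes under :metadata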

jimpil commented 6 years ago

I've just cut 0.4.0, which improves streaming the bytes from the S3 bucket (via ut/s3-bucket-bytes). If you're doing any sort of testing using 0.3.9, I'd appreciate it if you could swap to 0.4.0, as you could see major performance improvements (e.g. in the case of nippy-encoded bytes when using the suggested custom reader).

AdamClements commented 6 years ago

Hey, I tried it out, and I'm still getting a 404 Not Found error instead of it creating the file initially (using S3 and nippy with the instructions in the readme).

jimpil commented 6 years ago

Are you able to create buckets manually (e.g. using aws/create-bucket) outside of duratom? I'm not sure if you've looked in utils.clj, but the four S3-relevant functions are all wrappers around amazonica (i.e. aws/create-bucket, aws/does-bucket-exist, aws/delete-object & aws/put-object). I suggest that you try these functions at your REPL without involving duratom. Once you establish that they work as expected, I'd appreciate some code snippets showcasing each of those calls, and hopefully I will be able to figure out what I'm doing wrong. Thanks in advance...

AdamClements commented 6 years ago

The bucket already exists; it's the object creation where it's falling down (and if I manually create the object, it isn't in the right format for nippy, so it will fall over when it tries to initialise the duratom from it). As a workaround I've been running the following code for edn. I haven't yet written a version for nippy, hoping the library will deal with it for me:

;; assumes (:require [amazonica.aws.s3 :as s3] [clojure.tools.logging :as log])
;; and     (:import (java.io File))
(defn get-or-create-s3 [init-schema]
  (log/debug "Attempting to get or set the edn file")
  (try
    (s3/get-object :bucket-name "my-existing-bucket"
                   :key "my-duratom-db.edn")
    (log/debug "Successfully found the bucket")
    (catch Exception e
      (log/error "Couldn't find the bucket, creating it myself")
      (let [tmp (File/createTempFile "tmp-edn" ".edn")]
        (spit tmp init-schema)
        (s3/put-object :bucket-name "my-existing-bucket"
                       :key "my-duratom-db.edn"
                       :file tmp)))))

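For what it's worth, a nippy variant of that workaround would presumably look much the same (a hedged sketch, reusing the same s3/log/File requires and assuming clojure.java.io as io and taoensso.nippy as nippy):

(defn get-or-create-s3-nippy [init-schema]
  (try
    (s3/get-object :bucket-name "my-existing-bucket" :key "my-duratom-db.nippy")
    (catch Exception e
      (let [tmp (File/createTempFile "tmp-nippy" ".nippy")]
        ;; nippy/freeze returns a byte array, which io/copy can write straight to the temp file
        (io/copy (nippy/freeze init-schema) tmp)
        (s3/put-object :bucket-name "my-existing-bucket"
                       :key "my-duratom-db.nippy"
                       :file tmp)))))
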
Just to clarify: can you, on your machine, go from an empty bucket to initialising a duratom and reloading it, using the nippy + S3 example from the readme? Is this untested, or is it just me having this problem?

jimpil commented 6 years ago

The S3 backend is the only backend that is completely untested - I believe I mentioned that in a previous comment. Moreover, you seem to be the first person even remotely trying it out! Now, in all fairness, the only difference between the various backends is really the different underlying helpers, so if something works on one backend but not on another, it can really only be something relating to these backend-specific helpers.

OK, so you say the bucket exists. Can you confirm that s3/does-bucket-exist returns true? Also, can you point me to the action that returns the 404? (I assume it's s3/put-object, but it could be s3/does-bucket-exist.)

In your code snippet I don't see any credentials anywhere, so you must be using some of that defcredential/with-credential magic. That won't work with duratom, I'm afraid, as I'm using the arities that expect the credentials as the first argument (e.g. https://github.com/mcohen01/amazonica/blob/d1720a3985496b22aba87cc9d44160362ff1995d/test/amazonica/test/s3.clj#L153). May I ask what sort of :credentials key you are passing to duratom upon construction? According to the link above, it should be the raw credentials map.
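
To illustrate the distinction (a sketch; amazonica.aws.s3 is aliased as s3 and all values are placeholders):

(require '[amazonica.aws.s3 :as s3])

;; implicit credentials (defcredential/with-credential) are set globally and are
;; not what duratom's internal calls use:
;; (amazonica.core/defcredential "access-key" "secret-key" "eu-west-1")

;; explicit credentials map: this is what duratom's :credentials option expects,
;; and it gets passed as the first argument of each amazonica call:
(def creds {:access-key "access-key"
            :secret-key "secret-key"
            :endpoint   "eu-west-1"})

(s3/does-bucket-exist creds "my-existing-bucket")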

jimpil commented 1 year ago

Revisiting this ticket because I was finally able to create an AWS account and test this myself. The good news is that the 5 S3-related helpers work as expected - observe the following:

(def creds {:access-key "..."         ;; <= replace with yours
            :secret-key "..."         ;; <= replace with yours
            :endpoint   "eu-west-1"}) ;; <= replace with yours
(def dummy-value (pr-str {:a 1 :b 2}))
;; check a bucket I know exists
(bucket-exists? creds "jimpil-test") ;; => true
;; create a brand new one (with public access!)
(create-s3-bucket creds "jimpil-test-delete-me") ;; => {:name "jimpil-test-delete-me"}
(store-value-to-s3 creds "jimpil-test" "dummy.edn" {} dummy-value) ;; => a big response
(get-value-from-s3 creds "jimpil-test" "dummy.edn" {} (partial read-edn-object {})) ;; => {:a 1, :b 2}
(delete-object-from-s3 creds "jimpil-test" "dummy.edn") ;; => nil (but succeeded)

The credentials I used were for a user belonging to a user-group with full S3 access, and as you can see, he can still access a bucket which has full BlockPublicAccess enabled. In any case, that's how these functions are supposed to be called - with the credentials (which you pass when you create the duratom) as the first argument.

Now, the nippy part of the issue is somewhat separate, and can be addressed/tweaked using the :read fn. The point is that you get raw bytes from S3, and so what they 'mean' is entirely up to you.
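
For instance, a nippy-oriented reader/writer pair could be as simple as the following (a sketch; nippy-rw is an illustrative name and the exact option shape depends on the duratom version you're on):

(require '[taoensso.nippy :as nippy])

;; raw bytes from S3 in, Clojure value out (and the reverse on write)
(def nippy-rw {:read  nippy/thaw
               :write nippy/freeze})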

In any case, I'm (finally) closing this 😌