nats-io / nats-architecture-and-design

Architecture and Design Docs
Apache License 2.0

ObjectStore questions #144

Closed aricart closed 1 year ago

aricart commented 1 year ago

Please refer to the ADR: https://github.com/nats-io/nats-architecture-and-design/blob/main/adr/ADR-20.md

Overview

After implementing the objectstore in JavaScript there are a few questions that I am not clear about in terms of the behaviour:

Tasks

derekcollison commented 1 year ago

Post 2.9 I will try to wrap my head around this more thoroughly.

scottf commented 1 year ago

Re: Updating Meta. If the name changes, the old meta is deleted. See https://github.com/nats-io/nats.go/pull/883/files#diff-12229ea844f42fd705168380fb7870bcd7d66f0c94b53ef0efe8b9ca50f05ec7R769

Also, we limit what UpdateMeta can change to just the name and description. I'll make sure this gets into the ADR, which I'm currently tweaking. This addresses your concern about UpdateMeta being public. Maybe there is a better name or signature for the API, though.

The meta is stored under $O.<bucket>.M.<name-encoded>

Links always reference by bucket + name.
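
To make the subject scheme concrete, here is a minimal sketch of how a client might derive the meta subject. The helper name `metaSubject` is hypothetical, and the use of URL-safe base64 for `<name-encoded>` is an assumption; check the ADR for the exact encoding. Because the subject is derived from the name, a rename lands on a different subject, which is consistent with the old meta being deleted.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// metaSubject builds the subject that holds an object's metadata, following
// the $O.<bucket>.M.<name-encoded> scheme described in the thread.
// Assumption: <name-encoded> is the URL-safe base64 encoding of the name.
func metaSubject(bucket, name string) string {
	encoded := base64.URLEncoding.EncodeToString([]byte(name))
	return fmt.Sprintf("$O.%s.M.%s", bucket, encoded)
}
```

Calling `metaSubject("photos", "a.txt")` and `metaSubject("photos", "b.txt")` yields two different subjects, so a rename cannot be an in-place update of a single message.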

scottf commented 1 year ago

Copied from https://github.com/nats-io/nats.go/pull/883 Open Questions

  1. Should we allow links to links?
  2. Should we return ErrObjectNotFound for Get/GetInfo when an object is deleted? This would match what KV does on get where the entry is deleted. This would also affect links to things that were linked then deleted.
  3. Why are we keeping the meta information around for a deleted object anyway? Is it strictly for watch? Should we possibly add a Purge Deletes? This would also match KV.
  4. When an object is Put, info.ModTime is set to the client time at the end of the work. This is a client time, not the actual server message time. When the machines' clocks are in sync, the difference is probably a few milliseconds at most. We could ask the server for the message to get the actual time, but this would require an additional call to the server. Do we care one way or the other?

scottf commented 1 year ago

Addition to 3. If we do Purge Deletes, the implementation should also purge links that are orphaned. The current subject scheme would require manual traversal through the entire meta subject ($O.<bucket>.M.*)
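
The traversal described above could look roughly like the following sketch. It works on an in-memory slice rather than a real stream, and `metaEntry`, `linkTarget`, and `orphanedLinks` are hypothetical names used only for illustration; they are not the nats.go types.

```go
package main

// linkTarget and metaEntry are hypothetical, simplified stand-ins for the
// metadata kept under $O.<bucket>.M.*.
type linkTarget struct {
	Bucket string
	Name   string // empty when the link points at a whole bucket
}

type metaEntry struct {
	Name    string
	Deleted bool
	Link    *linkTarget // nil for regular objects
}

// orphanedLinks walks every meta entry of one bucket and returns the names of
// object links whose target in the same bucket is missing or deleted.
// Bucket links and links into other buckets are skipped because they cannot
// be resolved from the entries of this bucket alone.
func orphanedLinks(bucket string, metas []metaEntry) []string {
	byName := make(map[string]metaEntry, len(metas))
	for _, m := range metas {
		byName[m.Name] = m
	}
	var orphans []string
	for _, m := range metas {
		if m.Link == nil || m.Deleted {
			continue // not a live link
		}
		if m.Link.Bucket != bucket || m.Link.Name == "" {
			continue // cross-bucket or bucket link: not resolvable here
		}
		if target, ok := byName[m.Link.Name]; !ok || target.Deleted {
			orphans = append(orphans, m.Name)
		}
	}
	return orphans
}
```

The sketch needs every meta entry in hand before it can decide anything, which is the cost the comment points out: there is no way to find the links to a given object without scanning the whole meta subject space.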

aricart commented 1 year ago

4. When an object is Put, info.ModTime is set to the client time at the end of the work. This is a client time, not the actual server message time. When the machines' clocks are in sync, the difference is probably a few milliseconds at most. We could ask the server for the message to get the actual time, but this would require an additional call to the server. Do we care one way or the other?

Currently, the server doesn't add the entries, the client does, so this is correct. Alternatively, if we use the direct API to retrieve the message, you get the server timestamp (Nats-Time-Stamp). The only issue here is that because meta is currently updatable, it is possible for meta to specify one date, while the chunks have a different value. In all cases the client is putting data, which hopefully is trusted by the reader.

aricart commented 1 year ago

  • Why are we keeping the meta information around for a deleted object anyway? Is it strictly for watch? Should we possibly add a Purge Deletes? This would also match KV.

Yes, you need it as the tombstone; the implementation marks the entry as deleted.

aricart commented 1 year ago

2. Should we return ErrObjectNotFound for Get/GetInfo when an object is deleted? This would match what KV does on get where the entry is deleted. This would also affect links to things that were linked then deleted.

Again, this is sort of language-specific. If you throw an error, the user has to handle it and return some other value to the application, after inspecting the error to figure out whether it is a not-found or some other error (networking, etc.). If you return null, you still have to handle it, but errors are left for real exceptions. Get is kind of specific; I am not sure that getting a deleted object is useful in all the language clients. If you want to know whether the object is deleted, you can always do an info on it.

aricart commented 1 year ago

Addition to 3. If we do Purge Deletes, the implementation should also purge links that are orphaned. The current subject scheme would require manual traversal through the entire meta subject ($O.<bucket>.M.*)

Or simply fail on the get once you cannot resolve it. Since you cannot update cross object-store links (you wouldn't know what other object stores to look at), I am not sure this is useful.

scottf commented 1 year ago

To solve the time problem we could just let the client set it and write it into the JSON data. I think that's what you are saying, Alberto.

scottf commented 1 year ago

When deleted: as far as returning errors goes, yes, that is language-specific. I think Get should return a null or an error (language-specific). I think GetInfo should return not found for a real not found, and the info when the object exists or is deleted.

This would also solve orphaned Get / GetInfo links as they would resolve to not found.
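
A minimal sketch of that split, using a hypothetical in-memory store (`memStore`, `objInfo`, and `errObjectNotFound` are illustrative names, not the nats.go API): `getInfo` reports not found only when no meta exists, while `get` treats a tombstone the same as a missing object.

```go
package main

import "errors"

// errObjectNotFound is a hypothetical sentinel error for this sketch.
var errObjectNotFound = errors.New("object not found")

// objInfo is a simplified stand-in for the object metadata.
type objInfo struct {
	Name    string
	Deleted bool
}

// memStore is a hypothetical in-memory store used only to show the semantics.
type memStore struct {
	infos map[string]objInfo
	data  map[string][]byte
}

// getInfo returns errObjectNotFound only when no meta exists at all.
// A deleted object still yields its tombstone info.
func (s *memStore) getInfo(name string) (objInfo, error) {
	info, ok := s.infos[name]
	if !ok {
		return objInfo{}, errObjectNotFound
	}
	return info, nil
}

// get returns the data for live objects and errObjectNotFound for both
// missing and deleted objects.
func (s *memStore) get(name string) ([]byte, error) {
	info, err := s.getInfo(name)
	if err != nil {
		return nil, err
	}
	if info.Deleted {
		return nil, errObjectNotFound
	}
	return s.data[name], nil
}
```

In a language with nullable returns, `get` could return null instead of the sentinel error; the `getInfo` behaviour would stay the same.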

scottf commented 1 year ago

@aricart When an object is deleted, what part of the info should be cleared? Currently the Go client does this:

info.Deleted = true
info.Size, info.Chunks, info.Digest = 0, 0, _EMPTY_

What about:

aricart commented 1 year ago

JS client matches Go exactly. We can consider the other fields some other time.

scottf commented 1 year ago

Optimization for getting an object: If there is only 1 chunk, get the last message by subject instead of putting a whole subscription together.
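
The branch could be sketched as follows. The `chunkReader` struct and its two function fields are hypothetical placeholders for "fetch the last message on a subject" and "subscribe and collect all chunks"; they do not correspond to a specific client API.

```go
package main

// chunkReader is a hypothetical abstraction over the two ways of reading
// chunk data. Function fields keep the sketch easy to fake in a test.
type chunkReader struct {
	// lastMessage fetches the last message stored on the subject.
	lastMessage func(subject string) ([]byte, error)
	// subscribeAll sets up a subscription and collects all chunks in order.
	subscribeAll func(subject string, chunks int) ([][]byte, error)
}

// readObject takes the cheap path for single-chunk objects and falls back to
// a subscription for everything else.
func readObject(r chunkReader, subject string, chunks int) ([]byte, error) {
	if chunks == 1 {
		return r.lastMessage(subject)
	}
	parts, err := r.subscribeAll(subject, chunks)
	if err != nil {
		return nil, err
	}
	var out []byte
	for _, p := range parts {
		out = append(out, p...)
	}
	return out, nil
}
```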

scottf commented 1 year ago

These methods need review because they open the possibility of linking to stale objects. Should the signature be changed, or should the implementations deal with it?

// AddLink will add a link to another object.
AddLink(name string, obj *ObjectInfo) (*ObjectInfo, error)

// AddBucketLink will add a link to another object store.
AddBucketLink(name string, bucket ObjectStore) (*ObjectInfo, error)
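
If the implementations were to deal with it, one option is to ignore the state carried by the supplied info and re-check the target by name before writing the link. The sketch below shows only that check; `linkInfo`, `checkLinkTarget`, the `lookup` callback, and both error values are hypothetical and are not taken from any client.

```go
package main

import "errors"

// Hypothetical errors for this sketch.
var (
	errLinkTargetMissing = errors.New("link target not found")
	errLinkToDeleted     = errors.New("link target is deleted")
)

// linkInfo is a simplified stand-in for the info passed to AddLink.
type linkInfo struct {
	Name    string
	Deleted bool
}

// checkLinkTarget does not trust the Deleted flag on the supplied info, which
// may be stale. It looks up the current state by name instead and rejects
// targets that no longer exist or have been deleted since the caller read them.
func checkLinkTarget(supplied linkInfo, lookup func(name string) (linkInfo, bool)) error {
	current, ok := lookup(supplied.Name)
	if !ok {
		return errLinkTargetMissing
	}
	if current.Deleted {
		return errLinkToDeleted
	}
	return nil
}
```

The alternative raised in the comment, changing the signature, would amount to accepting a name instead of an info so that a stale value cannot be passed in at all.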