nats-io / nats.java

Java client for NATS
Apache License 2.0

Unable to use ObjectStore with domains via jnats #1157

Closed Behnke19 closed 2 months ago

Behnke19 commented 3 months ago

Observed behavior

I was trying to set up a scenario with 2 leaf nodes running JetStream islands using domains, where they could share files via JetStream object stores. Using the command line and passing the relevant --js-domain value works, but I can't get it to work in jnats.

When I try to put a file in the object store configured with a JS domain I get this error:

java.io.IOException: Error Publishing: 503 No Responders Available For Request
    at io.nats.client.impl.NatsJetStream.processPublishResponse(NatsJetStream.java:181)
    at io.nats.client.impl.NatsJetStream.publishSyncInternal(NatsJetStream.java:156)
    at io.nats.client.impl.NatsJetStream.publish(NatsJetStream.java:50)
    at io.nats.client.impl.NatsObjectStore.put(NatsObjectStore.java:137)
    at org.example.DebugFileTransferHub.main(DebugFileTransferHub.java:51)

Expected behavior

It should be possible to interact with object stores when using a jetstream domain. This includes uploading, downloading, deleting files.

Server and client version

I am using nats:latest from Docker, which shows 2.10.16 when I run nats server info. I am using jnats 2.18.1. Any CLI commands I reference were run using nats 0.1.4.

Host environment

I am running Docker Desktop on a Windows 11 laptop with a Core Ultra 9 processor and 32 GB of RAM. The Java code was executed from within IntelliJ IDEA Ultimate using Java 8.

Steps to reproduce

The attached zip file contains the project. You can extract it and run any CLI commands below from the top level directory. In the zip you will find... nats_poc.zip

To set up the NATS servers I did the following:

docker pull nats
docker network create natsNet
docker run -d --name hub --network natsNet -p4222:4222 -p7422:7422 -p8222:8222 -v ./nats:/etc/nats nats -c /etc/nats/debugFileTransferHub.conf
docker run -d --name spoke --network natsNet -p4223:4222 -p7423:7423 -p8223:8222 -v ./nats:/etc/nats nats -c /etc/nats/debugFileTransferSpoke.conf

When I execute the DebugFileTransferHub.java file I get the 503 error mentioned above. If I don't pass in the ObjectStoreOptions to the nc.objectStore() call then it works and since the local domain when connected to the hub server is "HUB" the file ends up in the right place. However, when I then try to run DebugFileTransferSpoke.java it fails.

For added context, I figured this had to be supported so I tried it out using the nats CLI and it worked fine. Here are the series of commands I ran to prove it is possible. Reminder that port 4222 is the hub server and 4223 is the spoke server.

CLI way of doing the file dance
add --trace to any of these for more verbose output

create the store on hub
nats --server nats://testUser:testPass@localhost:4222 --js-domain HUB object add CliStore

upload file with js domain HUB on hub
nats --server nats://testUser:testPass@localhost:4222 --js-domain HUB object put CliStore ./nats/tmp.txt

check it exists from both servers views using the js domain of HUB
nats --server nats://testUser:testPass@localhost:4222 --js-domain HUB object ls CliStore
nats --server nats://testUser:testPass@localhost:4223 --js-domain HUB object ls CliStore

this can prove the file is stored on hub and not spoke. look at mem and file in use under Jetstream
nats --server nats://admin:admin@localhost:4222 server info
nats --server nats://admin:admin@localhost:4223 server info

spoke should be able to download the file if the HUB domain is used
nats --server nats://testUser:testPass@localhost:4223 --js-domain HUB object get CliStore nats/tmp.txt -O spokeout.txt

spoke should be able to delete the file if the HUB domain is used
nats --server nats://testUser:testPass@localhost:4223 --js-domain HUB object del CliStore nats/tmp.txt -f

spoke should be able to upload the file if the HUB domain is used
nats --trace --server nats://testUser:testPass@localhost:4223 --js-domain HUB object put CliStore ./nats/tmp.txt

hub should be able to download the file that the spoke uploaded
nats --server nats://testUser:testPass@localhost:4222 --js-domain HUB object get CliStore nats/tmp.txt -O hubout.txt

hub should be able to delete the file too
nats --server nats://testUser:testPass@localhost:4222 --js-domain HUB object del CliStore nats/tmp.txt -f

I did some debugging, and the one thing that seemed different between the CLI and jnats was the subject used when interacting with the store. In jnats (NatsObjectStore.java), pubSubMetaSubject tacks the domain onto the object store subjects for chunks and meta, so they look like "$JS.HUB.API.$O.TestStore.C.TQLCsEF9e6QkhhSx9skwtN". The command line uses the default "$O.TestStore.C.TQLCsEF9e6QkhhSx9skwtN" style subject, and the js-domain param makes sure that subject is written to the proper stream for the domain. I suspect that jnats should not be modifying the object store subject for domains (I haven't played with custom prefixes other than domains, so maybe there is a use case?).

Interestingly, if you upload a file small enough to fit in 1 chunk, NatsObjectStore.get will actually use the rawChunkSubject ($O.TestStore.C.* style) when you fetch it, and the file will download successfully. This seems to support the theory that changing the subject is the problem.
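
To make the two subject shapes concrete, here is a minimal sketch in plain Java. It only does string handling and is not the jnats implementation; the class and method names (ObjectStoreSubjects, apiPrefix, rawChunkSubject, prefixedChunkSubject) are hypothetical and chosen purely for illustration.

```java
// Illustrative only: plain string handling, NOT the jnats implementation.
// All names in this class are hypothetical.
final class ObjectStoreSubjects {

    /** JetStream API prefix: "$JS.API." without a domain, "$JS.<domain>.API." with one. */
    static String apiPrefix(String domain) {
        return (domain == null || domain.isEmpty())
                ? "$JS.API."
                : "$JS." + domain + ".API.";
    }

    /** Raw chunk subject, the shape the CLI was observed to publish to. */
    static String rawChunkSubject(String bucket, String chunkId) {
        return "$O." + bucket + ".C." + chunkId;
    }

    /** Domain-prefixed chunk subject, the shape observed in jnats 2.18.1. */
    static String prefixedChunkSubject(String domain, String bucket, String chunkId) {
        return apiPrefix(domain) + rawChunkSubject(bucket, chunkId);
    }

    public static void main(String[] args) {
        String chunkId = "TQLCsEF9e6QkhhSx9skwtN";
        System.out.println(rawChunkSubject("TestStore", chunkId));
        System.out.println(prefixedChunkSubject("HUB", "TestStore", chunkId));
    }
}
```

The second printed subject is the one that produced the 503 in my runs; the first is the one the CLI used successfully.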

scottf commented 2 months ago

The problem in your Java code is that your connection to the hub does not require the jsDomain to be set. If you tell the client the domain to use, it will use it. I don't know why the CLI works differently and does not use the domain in the publish; it should, since you supply it on the command line. So just remove the .jsDomain("HUB") from the connection.

scottf commented 2 months ago

I'm also following up with the developer of the CLI. Maybe the CLI made an assumption, or it's possible they did it intentionally considering the common use case, or they know something I don't.

scottf commented 2 months ago

The CLI developer has responded and raised this as an issue in the CLI and go client.

scottf commented 2 months ago

An issue has been opened on the go client. https://github.com/nats-io/nats.go/issues/1648

Behnke19 commented 2 months ago

Thanks for responding @scottf. I am not sure I understand what you mean by remove .jsDomain("HUB") from the connection. If I remove that from the ObjectStoreOptions in the DebugFileTransferHub.java file then it does successfully publish the file to the store but I can't retrieve it from the other leaf node in DebugFileTransferSpoke.java. If I also remove the .jsDomain("Hub") from the ObjectStoreOptions in the spoke java class then I get a stream not found exception. If I leave the .jsDomain("HUB") in then I get an error saying Total size does not match meta data. It seems as though it fails to find any of the chunks. Maybe I am missing something simple?

I also wanted to point out that while the CLI doesn't tack the domain onto the $O. subject when using --js-domain, it does change the subject at some level. For example, note the $JS.API.STREAM vs $JS.HUB.API.STREAM in the traces below.

No Domain:

nats --trace --server nats://testUser:testPass@localhost:4223 object put TestStore ./nats/tmp.txt
13:37:19 >>> $JS.API.STREAM.INFO.OBJ_TestStore

13:37:19 <<< $JS.API.STREAM.INFO.OBJ_TestStore: {"type":"io.nats.jetstream.api.v1.stream_info_response","error":{"code":404,"err_code":10059,"description":"stream not found"},"total":0,"offset":0,"limit":0}
nats: error: nats: stream not found

With Domain:

nats --trace --server nats://testUser:testPass@localhost:4223 --js-domain HUB object put TestStore ./nats/tmp.txt
13:36:17 >>> $JS.HUB.API.STREAM.INFO.OBJ_TestStore

13:36:17 <<< $JS.HUB.API.STREAM.INFO.OBJ_TestStore: {"type":"io.nats.jetstream.api.v1.stream_info_response","total":0,"offset":0,"limit":0,"config":{"name":"OBJ_TestStore","subjects":["$O.TestStore.M.\u003e","$O.TestStore.C.\u003e"],"retention":"limits","max_consumers":-1,"max_msgs":-1,"max_bytes":-1,"max_age":0,"max_msgs_per_subject":-1,"max_msg_size":-1,"discard":"new","storage":"file","num_replicas":1,"duplicate_window":120000000000,"compression":"none","allow_direct":true,"mirror_direct":false,"sealed":false,"deny_delete":false,"deny_purge":false,"allow_rollup_hdrs":true,"consumer_limits":{}},"created":"2024-06-14T18:09:41.047146423Z","state":{"messages":42,"bytes":337872,"first_seq":1,"first_ts":"2024-06-14T18:09:41.080441661Z","last_seq":42,"last_ts":"2024-06-14T18:09:41.134602887Z","num_subjects":2,"consumer_count":0},"domain":"HUB","cluster":{"name":"hub","leader":"hub"},"ts":"2024-06-14T18:36:17.674513612Z"}
13:36:17 >>> $JS.HUB.API.STREAM.MSG.GET.OBJ_TestStore
{"last_by_subj":"$O.TestStore.M.Li9uYXRzL3RtcC50eHQ="}

13:36:17 <<< $JS.HUB.API.STREAM.MSG.GET.OBJ_TestStore: {"type":"io.nats.jetstream.api.v1.stream_msg_get_response","error":{"code":404,"err_code":10037,"description":"no message found"}}
13:36:17 >>> $JS.HUB.API.STREAM.MSG.GET.OBJ_TestStore
{"last_by_subj":"$O.TestStore.M.bmF0cy90bXAudHh0"}

13:36:17 <<< $JS.HUB.API.STREAM.MSG.GET.OBJ_TestStore: {"type":"io.nats.jetstream.api.v1.stream_msg_get_response","error":{"code":404,"err_code":10037,"description":"no message found"}}
Object information for TestStore > nats/tmp.txt

               Size: 327 KiB
  Modification Time: 14 Jun 24 18:36 +0000
             Chunks: 3
             Digest: SHA-256 307fab55705586462b710675d72008cf61d4eba679c7b304f6eaaf6d6dfd
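
The two MSG.GET requests in the trace above look up the meta subject first for "./nats/tmp.txt" and then for "nats/tmp.txt"; the last token of each subject decodes as Base64 of the object name. A minimal sketch of that mapping in plain Java, assuming URL-safe Base64 with padding (for these two names the standard alphabet gives the same result). The class and method names are hypothetical, and this is not taken from the CLI or jnats source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: shows how the meta subjects in the trace relate to object names.
// Assumption: URL-safe Base64 with padding. All names here are hypothetical.
final class MetaSubjectSketch {

    /** Builds "$O.<bucket>.M.<base64(objectName)>". */
    static String metaSubject(String bucket, String objectName) {
        String encoded = Base64.getUrlEncoder()
                .encodeToString(objectName.getBytes(StandardCharsets.UTF_8));
        return "$O." + bucket + ".M." + encoded;
    }

    /** Recovers the object name from the last token of a meta subject. */
    static String objectNameOf(String metaSubject) {
        // The Base64 alphabet contains no '.', so the last dot ends the fixed part.
        String encoded = metaSubject.substring(metaSubject.lastIndexOf('.') + 1);
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(metaSubject("TestStore", "./nats/tmp.txt"));
        System.out.println(metaSubject("TestStore", "nats/tmp.txt"));
        System.out.println(objectNameOf("$O.TestStore.M.bmF0cy90bXAudHh0"));
    }
}
```

Note that neither meta subject carries the domain; only the $JS.HUB.API request subject does.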

Something else I noticed: looking at the config for OBJ_TestStore in the traced CLI output, the $O. subjects for the store never have the domain prefix on them, regardless of whether --js-domain was included at creation time. The same is true when creating the store via Java, with or without .jsDomain set, if I look at the resulting ObjectStoreStatus after ObjectStoreManagement.create is called.
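
A simplified sketch of NATS subject wildcard matching shows why this matters: a subject carrying the domain prefix does not fall under the stream's "$O.TestStore.C.>" subject, while the raw subject does. This only illustrates the wildcard rules ('*' matches exactly one token, '>' matches one or more trailing tokens); it is not server or client code, and the class and method names are hypothetical.

```java
// Illustrative only: a simplified NATS subject matcher, not server or client code.
// All names here are hypothetical.
final class SubjectMatchSketch {

    /** Returns true if the dot-separated subject falls under the pattern. */
    static boolean matches(String pattern, String subject) {
        String[] p = pattern.split("\\.");
        String[] s = subject.split("\\.");
        for (int i = 0; i < p.length; i++) {
            if (p[i].equals(">")) {
                // '>' must be the last pattern token and needs at least one token left.
                return i == p.length - 1 && s.length > i;
            }
            if (i >= s.length) {
                return false; // subject ran out of tokens
            }
            if (!p[i].equals("*") && !p[i].equals(s[i])) {
                return false; // literal token mismatch
            }
        }
        return p.length == s.length;
    }

    public static void main(String[] args) {
        String streamSubject = "$O.TestStore.C.>";
        // Raw chunk subject: covered by the stream's subject.
        System.out.println(matches(streamSubject, "$O.TestStore.C.TQLCsEF9e6QkhhSx9skwtN"));
        // Domain-prefixed chunk subject: not covered.
        System.out.println(matches(streamSubject, "$JS.HUB.API.$O.TestStore.C.TQLCsEF9e6QkhhSx9skwtN"));
    }
}
```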

scottf commented 2 months ago

I think I can reproduce the problem. I'll have to figure out how to fix it now.

Behnke19 commented 2 months ago

If you have any more questions for me, let me know. I will also mention that if I modify the logic in the Spoke Java class to delete the file or upload a new file instead of fetching it, I get "Error Publishing: 503 No Responders Available For Request". My guess is it's related, but I wanted to point out that it's not just fetching objects that isn't working for me.

scottf commented 2 months ago

Fix is here https://github.com/nats-io/nats.java/pull/1160

Behnke19 commented 2 months ago

@scottf This should be re-opened or I can open a new bug if you prefer. I tried again with 2.19.1 and while I can download the file on the leaf now I can't delete or put files from the leaf.

This can be easily reproduced by adding these 3 lines to testObjectStoreDomains() in ObjectStoreTests.java in the nats.java repo.

+++ b/src/test/java/io/nats/client/impl/ObjectStoreTests.java
@@ -671,6 +671,10 @@ public class ObjectStoreTests extends JetStreamTestBase {
             byte[] leafBytes = leafOut.toByteArray();

             assertArrayEquals(hubBytes, leafBytes);
+
+            leafOs.delete(objectName);
+            in = Files.newInputStream(file.toPath());
+            leafOs.put(meta, in);
         });
     }
 }

I see a 503 No Responders Available for Request error. I suspect a similar error would occur with the addLink method but I haven't used links yet. I had also left a comment on #1160 about this.

scottf commented 2 months ago

@Behnke19 Part 2 here. https://github.com/nats-io/nats.java/pull/1172