nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
15.87k stars 1.4k forks source link

Object store - List bucket's content is very slow #4680

Closed Stan-RED closed 1 year ago

Stan-RED commented 1 year ago

Observed behavior

Listing items in the specific bucket of the object store is very slow, even if there is only one item.

Expected behavior

Expected to get a list of the items faster.

Server and client version

nats-server 2.10.3 natscli 0.1.1 (that is recommended here https://docs.nats.io/nats-concepts/what-is-nats/walkthrough_setup) NATS.Client for .NET 1.1.0

Host environment

Microsoft Windows NT 10.0.19045.0 CPU: AMD Ryzen 9 5950X NVMe: ADATA SX8200PNP

[68776] 2023/10/19 13:47:51.238565 [INF] ---------------- JETSTREAM ----------------
[68776] 2023/10/19 13:47:51.238565 [INF]   Max Memory:      95.94 GB
[68776] 2023/10/19 13:47:51.238565 [INF]   Max Storage:     1.00 TB
[68776] 2023/10/19 13:47:51.238565 [INF]   Store Directory: "e:\Temp\nats\jetstream"
[68776] 2023/10/19 13:47:51.238565 [INF] -------------------------------------------

Steps to reproduce

I have created a bucket as in docs: nats obj add testbucket, and added a big file to it (~18G): nats obj add testbucket somefile.mp4. When trying to get the list of the objects in the bucket: nats obj ls testbucket I have received a message "nats: error: context deadline exceeded" with some delay.

In the server log I got this message: [68776] 2023/10/19 14:48:10.303340 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.OBJ_testbucket.32td3LZE.$O.testbucket.M.>" took too long: 6.8698428s

When upgraded timeout to 10s: /nats obj ls testbucket --timeout=10s it was ok.

I supposed that this is client-side problems and created a simple app using the latest NATS.Client for .NET 1.1.0:

using NATS.Client;
using NATS.Client.JetStream;
using NATS.Client.ObjectStore;

ConnectionFactory factory = new();
IConnection connection = factory.CreateConnection();

var jso = JetStreamOptions
    .Builder()
    .WithOptOut290ConsumerCreate(true)
    .Build()
;

IObjectStore os = connection.CreateObjectStoreContext("testbucket", ObjectStoreOptions.Builder(jso).Build());

DateTime start = DateTime.Now;
var response = os.GetList();
Console.WriteLine(DateTime.Now - start);

It was also failing with default timeout. After some tuning it started working and in a consequent 10 runs time to get the list with the single object there was varying from 3.5s to 5.0s. A bit better but anyway very slow. Adding the second 14G file push this delay to 7.5+ seconds, another 8G file added one more second to this process and so on.

At the same time getting files by name is quick. It starts to download immediately.

As I see, the name of the file is added to the end of the stream (the last block file). I don't know yet how the storage is organized internally , but if it is using streams, maybe it would make sense to separate blob stream from the stream with meta-information to remove this functional dependency on the blob size? Because small objects works much faster.

derekcollison commented 1 year ago

Do you see same slow behavior with the NATS cli?

Stan-RED commented 1 year ago

Do you see same slow behavior with the NATS cli?

Is there any other CLI different from the one mentioned in the documentation - https://docs.nats.io/nats-concepts/what-is-nats/walkthrough_setup (https://github.com/nats-io/natscli/releases) and I have used?

derekcollison commented 1 year ago

That's the one.

Stan-RED commented 1 year ago

That's the one.

Then yes. The slow behavior have been started from the CLI tool. As I mentioned in the beginning of the "Steps to reproduce" there was error: "nats: error: context deadline exceeded" when trying to list objects with nats obj ls testbucket CLI command and in log was:

[68776] 2023/10/19 14:48:10.303340 [WRN] Internal subscription on
"$JS.API.CONSUMER.CREATE.OBJ_testbucket.32td3LZE.$O.testbucket.M.>" took too long: 6.8698428s

So almost 7s for the single file in the bucket.

derekcollison commented 1 year ago

Would you be willing to create a stream snapshot of the ObjectStore and share with the Synadia team?

Stan-RED commented 1 year ago

Would you be willing to create a stream snapshot of the ObjectStore and share with the Synadia team?

How to do this? And does it make sense to upload huge files somewhere if this behavior is reproducible? I have taken for example a Fedora Rawhide distro (previously it was tested in Windows 10) with the latest NATS release:

[stan@fedora ~]$ dnf info nats-server
...
Name         : nats-server
Version      : 2.10.3
Release      : 1.fc40
Architecture : x86_64
Size         : 14 M
Source       : nats-server-2.10.3-1.fc40.src.rpm

Then I took a CentOS distro (12G) for example and put it into the object store bucket in the same way:

$ curl http://centos.uni-sofia.bg/centos/8-stream/isos/x86_64/CentOS-Stream-8-x86_64-latest-dvd1.iso --output centos1.iso
$ nats obj add testbucket
$ nats obj put testbucket /home/stan/centos1.iso
$ nats obj ls testbucket
nats: error: context deadline exceeded

This is a log from the server with more details.

[1634489] 2023/10/23 11:55:44.083596 [INF]   Version:  2.10.3
[1634489] 2023/10/23 11:55:44.083600 [INF]   Git:      [not set]
[1634489] 2023/10/23 11:55:44.083603 [INF]   Name:     NAEMCRCEOGWRPQXGXOFJAW5PL6YBGEZNWOSHI4NPBPQW54DCJX352ISG
[1634489] 2023/10/23 11:55:44.083610 [INF]   Node:     u17m6wzJ
[1634489] 2023/10/23 11:55:44.083613 [INF]   ID:       NAEMCRCEOGWRPQXGXOFJAW5PL6YBGEZNWOSHI4NPBPQW54DCJX352ISG
[1634489] 2023/10/23 11:55:44.084051 [INF] Starting JetStream
[1634489] 2023/10/23 11:55:44.084264 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[1634489] 2023/10/23 11:55:44.084268 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[1634489] 2023/10/23 11:55:44.084270 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[1634489] 2023/10/23 11:55:44.084272 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[1634489] 2023/10/23 11:55:44.084274 [INF]
[1634489] 2023/10/23 11:55:44.084276 [INF]          https://docs.nats.io/jetstream
[1634489] 2023/10/23 11:55:44.084278 [INF]
[1634489] 2023/10/23 11:55:44.084280 [INF] ---------------- JETSTREAM ----------------
[1634489] 2023/10/23 11:55:44.084283 [INF]   Max Memory:      93.92 GB
[1634489] 2023/10/23 11:55:44.084285 [INF]   Max Storage:     46.96 GB
[1634489] 2023/10/23 11:55:44.084287 [INF]   Store Directory: "/tmp/nats/jetstream"
[1634489] 2023/10/23 11:55:44.084290 [INF] -------------------------------------------
[1634489] 2023/10/23 11:55:44.084591 [INF]   Starting restore for stream '$G > OBJ_tb1'
[1634489] 2023/10/23 11:55:44.084798 [INF]   Restored 0 messages for stream '$G > OBJ_tb1' in 0s
[1634489] 2023/10/23 11:55:44.085284 [INF] Listening for client connections on 0.0.0.0:4222
[1634489] 2023/10/23 11:55:44.086110 [INF] Server is ready
[1634489] 2023/10/23 11:56:40.871630 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.OBJ_tb1.7VM0i7b5.$O.tb1.M.>" took too long: 5.170127768s
[1634489] 2023/10/23 11:56:40.871648 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.OBJ_tb1.7VM0i7b5.$O.tb1.M.>" took too long: 5.170148617s
[1634489] 2023/10/23 11:56:40.871652 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.OBJ_tb1.7VM0i7b5.$O.tb1.M.>" took too long: 5.170164157s
[1634489] 2023/10/23 11:56:40.871661 [WRN] 127.0.0.1:52042 - cid:9 - Readloop processing time: 5.170173817s
derekcollison commented 1 year ago

nats stream backup

Just need to use full stream name, which you can see via nats stream ls -a

derekcollison commented 1 year ago

If we can reproduce with your snapshot on our side, that would be very helpful. Thanks.

derekcollison commented 1 year ago

You are correct, we can upload any large asset ourselves, will attempt to follow your steps and reproduce. Thanks for your patience and apologies for not reading more thoroughly.

derekcollison commented 1 year ago

I can see the behavior here locally. Will keep you posted. And again thanks for the patience.