sebadob / hiqlite

Hiqlite - highly-available, embeddable, raft-based SQLite + cache
Apache License 2.0

Question: how can a client connect to the server? #8

Open · gembin opened this issue 2 weeks ago

gembin commented 2 weeks ago

I was just playing with it, and I can execute queries from the dashboard. Awesome work!

Question: How can a client connect to the server and run queries? Or should I just access the db file directly at ~/.hiqlite/data/state_machine/db/hiqlite.db, or do I need to embed it in my own application like this sqlite-only example?

sebadob commented 2 weeks ago

Thanks!

You should absolutely never access the DB file directly. The problem with this is that it would bring all nodes out of sync. All actions other than reading data must go through the Raft for consistency across all nodes.

I would highly suggest embedding it into your application like in the examples, yes. In this scenario, you benefit the most from running SQLite compared to a traditional database like Postgres. When you start the node, you get a hiqlite::Client back. This client handles everything automatically: making sure executes always run on the leader, switching leaders if necessary, handling network issues and re-connects, and it opens a connection pool behind the scenes as well. You also get many functions to work with the database with the least amount of overhead compared to existing solutions like simple async wrappers around rusqlite.
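A minimal sketch of what that embedding could look like. start_node, NodeConfig, and the returned hiqlite::Client are the pieces mentioned above; NodeConfig::from_env(), execute(), and the params!() macro are assumptions about the client API, so check the examples for the exact signatures of the version you use:

// Hedged sketch, not verbatim from the hiqlite examples.
use hiqlite::{start_node, NodeConfig};

#[tokio::main]
async fn main() -> Result<(), hiqlite::Error> {
    // Build the node configuration (assumed here to come from the environment).
    let config = NodeConfig::from_env();

    // Starting the node returns a hiqlite::Client which routes executes to the
    // leader, handles re-connects, and keeps a connection pool behind the scenes.
    let client = start_node(config).await?;

    // Writes go through the Raft; reads are served from the fast local replica.
    client
        .execute(
            "CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY)",
            hiqlite::params!(),
        )
        .await?;

    Ok(())
}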

The other way of connecting would be using hiqlite::Client::remote(), as shown in the bench example. This client can connect to a cluster that is running standalone or embedded into another application. You should only use this for good reason though, because a remote client loses the ability to do super fast local reads, which is one of the biggest advantages of this system.

gembin commented 2 weeks ago

Thanks a lot for your quick response and detailed answer!

Embedding it into an application makes sense to me. What about these use cases?

  1. If two (or more) different applications (e.g., app1 and app2) need to access the same database, either on the same host or different hosts, what is the best way to handle this? For example, app1 reads/writes to the database, while app2 only reads from it. Is it better for app2 to use hiqlite::Client::remote(), or should app2 be part of the cluster if it's on a different host?
  2. How can I achieve one database per tenant?
  3. Is it possible to use ORM frameworks like Diesel?
sebadob commented 1 week ago
  1. Either run a standalone server, which would make sense if you want to save on resources compared to a classic Postgres setup, or embed the Hiqlite instance in the app that most probably does the largest amount of writes. Write throughput is usually the bottleneck for Raft applications, and you could later on automatically route all write traffic to your leader instance via a reverse proxy to get rid of a few network round-trips.
  2. If you want multiple databases, just execute another hiqlite::start_node() (see the sketch after this list). This will also create a new network / server for you, which you could route over different physical wires, as Hiqlite can saturate a 1 GBit/s connection if your machine is fast enough.
  3. No. Hiqlite is a complete system. You should never use an ORM in the first place imho, but that's another topic.
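To illustrate point 2, a rough sketch of the one-database-per-tenant idea. How each NodeConfig is filled in (its own data directory and its own ports) is an assumption here and depends on your deployment:

// Hedged sketch: one Hiqlite instance (own Raft group, own network) per database.
use hiqlite::{start_node, Client, NodeConfig};

async fn start_tenant_dbs(
    cfg_tenant_a: NodeConfig, // must point to its own storage and ports
    cfg_tenant_b: NodeConfig, // same here, otherwise the instances would collide
) -> Result<(Client, Client), hiqlite::Error> {
    // Each call spins up a separate Raft group / network and returns its own client.
    let client_a = start_node(cfg_tenant_a).await?;
    let client_b = start_node(cfg_tenant_b).await?;
    Ok((client_a, client_b))
}
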
gembin commented 1 week ago

Thanks for your answers.

  1. hiqlite::start_node() will create its own network per server, which doesn't seem very scalable, especially if there are thousands of databases.
  2. Yes, it's another topic, but it's normally the first thing I would think about when using a database. I understand that ORMs are not supported by Hiqlite, but I don't see a reason to avoid using an ORM. It would make life much easier (the pros and cons are obvious), unless there is a performance issue that cannot be resolved when using one.
sebadob commented 1 week ago

hiqlite::start_node() will create its own network per server, which doesn't seem very scalable

Actually, the opposite is true. Hiqlite can saturate a 1GBit/s network connection with only database traffic. These are cheap, 10G and up are not. So you could just add multiple smaller networks without losing throughput.

(..) especially if there are thousands of databases.

If you want thousands of databases on the same host, SQLite is not the one for you imho, at least not in this configuration. Turso does something like this, or lets you do this, but you would not have nearly the same level of consistency. The reason you typically create multiple DBs with SQLite is to overcome the single-writer issue, which is not such an issue with Hiqlite because of its internal design.

In tests it can easily reach over 30k write TPS. Let's assume you are inserting rather small datasets with just a few columns at 56 bytes per row (the TPS doesn't really go down even if you write 10 times this amount of data with each query); at 30,000 x 56 bytes that is about 1.7 MB/s, so you would end up writing ~5.7 GB per hour into your database. If you do things like that, you absolutely should consider running a big, dedicated Postgres cluster or something like that.

If you were writing time series data for instance (for which you should use a proper time series DB), and you can write data in batches, Hiqlite easily lets you write more than 50 GB per hour.

If you want to "scale" beyond that, a small footprint and saving resources is not your primary goal, run a proper, dedicated Postgres cluster imho. You should run a DB pool with hundreds of clients while each connection uses 10MB memory on the DB. If you have a pool of 500 connections it would need 5GB just for handling the connections, but at the same time you are writing very high amounts of data.

Hiqlite makes it easy to get rid of a central, big database server all applications rely on. If you run multiple services, each of them could just embed its own Hiqlite database. A SELECT using Hiqlite is ~10 times faster than on a Postgres over a fast network (or any other networked database), even on the same host, because you don't have the network in between. That's its strength, not scaling to thousands of databases on the same host.

But even if you did that, thousands of networks on the same host are not bad; it would just mean ~2000 connected ports, which is a rather small number for a server. It is so efficient that even if you spun up 1000 x hiqlite::start_node(), it would use fewer resources than a Postgres DB with 1000 client connections. The other thing you must not forget is that you could back each of these instances with a different SSD under the hood.

I don't see a reason to avoid using ORM

I think ORMs are a super bad idea.

  1. They add an unnecessary layer of abstraction that needs resources and increases latency.
  2. For each ORM in each language, you need to learn its own DSL. This means that if you learn, for instance, Hibernate with Java, you won't automatically be able to use any other ORM, like Diesel, not even one for the same language.
  3. ORMs sometimes update their DSL between bigger releases, which means you need to learn the new DSL again.
  4. ORMs try to make simple things simple, like fetching a row and mapping it to a struct. The thing is, that is simple with an ORM (after you have learned the DSL), but it is just as simple with plain SQL.
  5. With an ORM, you typically over-fetch data. You have your structs which map to rows or tables, but you often don't need all values each time. Since you are using an ORM, you go the easy way and fetch the whole row instead of just SELECTing the parts you really need, which again leads to wasted resources and bandwidth.
  6. An even worse situation is when you need just 2 values from 2 different tables. With an ORM, you typically fetch both full rows instead of fetching just these 2 tiny values with a single query.
  7. Since easy things are easy in both an ORM and SQL, we are left with the more complex stuff. If the ORM is good, it can do this (after, again, you have learned the DSL), but ORMs always have limits, which you don't have if you just use SQL.
  8. You only need to learn SQL once and it works everywhere, in every language, in every shell you may log in to for debugging, and so on. You can also easily jump into another project and understand everything right away, which would not be possible if it used a different ORM. SQL is actually pretty easy because there is only so much you can do.
  9. The worst thing is, we forget stuff we don't use. Even though I wrote an insane amount of code using Java + Hibernate back then, I would now need to learn Hibernate again, simply because I have not used it in a very long time and it has its own DSL. I even used Diesel quite a lot at the beginning of my Rust journey, but even this I would need to learn again. SQL will always be there, you always use it, so you will never forget it and have to learn it again.

Edit:

Regarding the "thousands of networks is not scaleable" issue. If you have a Postgres Cluster with 1000 client connections, you would have the same amount of "networks" btw. The things is, it does not really make much sense to go higher than 1000 with Postgres because they start to scale less beyond ~500 usually. With Hiqlite, you could spawn 2000 databases, which means 4000 network connections, while each of them would have the full throughput without any degradation. And you would be left with ~60k ports you could use on that machine.

gembin commented 1 week ago

Wow, interesting discussion, thanks for your insights!

It seems that you're more focused on performance and throughput. One of the key advantages of SQLite is its simplicity—it's just a file, which makes maintenance and deployment straightforward and cost-effective. So, I think maintainability and ease of use are also important factors.

With one database per tenant, we can fully isolate each tenant's data, making it easier to scale as needed (e.g., if only some tenants have large data volumes). This approach also simplifies tasks like exporting, migrating, and deleting data without affecting other tenants. However, if we need to manage thousands of configurations per application, it adds unnecessary complexity. For example: imagine trying to test a 3-node cluster on a laptop with thousands of databases; port conflicts could easily arise, opening an SSL connection for each network seems like unnecessary overhead, having so many open ports could be a security risk, and troubleshooting could become more difficult if something goes wrong in the network/Raft layer or with a wrong configuration. I'm aware that the Turso database has a namespace feature, which allows multiple databases on the same server while reusing the network. So my question is: why can't Hiqlite reuse the same network, and what are the concerns/trade-offs?

Regarding ORMs, I've never encountered significant performance issues, and an ORM is unlikely to be the bottleneck in most cases. Using an ORM doesn't mean we need to learn new things if we don't want to try new features. For example, we have the JPA standard in Java, which is backward-compatible (frameworks like Spring Data are even simpler); it eliminates most of the boilerplate code and should suffice for most use cases. It also makes queries portable across different databases (e.g., if we want to switch to another DB in some cases), and it allows selective field retrieval, which avoids fetching entire records. Raw SQL requires different dialects for different databases (with variations in syntax, data types, etc.), involves more boilerplate code, and is more error-prone. With regard to SQLite, I think the performance impact caused by the ORM abstraction layer is negligible, especially when working with local read/write operations. While I'm not looking to debate ORMs further, it's clear that both approaches come with their own trade-offs.

gembin commented 1 week ago

One issue with dashboard.

Both succeeded

create table test (id int)
insert into test values(1)

Caused panic

select * from test;
2024-09-04T02:14:52.685322Z  INFO post_query: hiqlite::dashboard::query: dashboard query:
select * from test;
thread 'tokio-runtime-worker' panicked at hiqlite/src/query/rows.rs:128:23:
internal error: entered unreachable code: unreachable column type: INT
[1]    2963 abort      target/release/hiqlite serve

https://github.com/sebadob/hiqlite/blob/c0b43364b08201e17eb4c26eef4d91742f10e45d/hiqlite/src/query/rows.rs#L128

Querying the db file directly, the record is there

sqlite> .open hiqlite.db
sqlite> select * from test;
1
sqlite> 
sebadob commented 1 week ago

It seems that you're more focused on performance and throughput.

Yes, always. After I migrated a big Java project at my old job to Rust and was able to reduce the operational cost by ~95%, I am always focused on being as efficient as possible. I also often write code for IoT projects, which have very limited resources, so I even have to.

One of the key advantages of SQLite is its simplicity—it's just a file, which makes maintenance and deployment straightforward and cost-effective.

That's true, but unfortunately not anymore with Hiqlite or when using a Raft in general. It has other advantages like high availability and even more consistency, but you need more than 1 file now. Compared to other solutions you will have a way lower footprint though, which keeps it very cost-effective.

So my question will be: why can't hiqlite reuse the same network, and what are the concerns/trade-offs?

You are right, and I get your point. It would technically be pretty easy to do. For instance, I did this for the cache layer; you could take a look at the cache example. It uses an enum to differentiate between the different caches. This could be done with a String of course, but an enum is, again, a lot more efficient. It makes it possible to use a simple Vec<_> for the caches under the hood and select by index, instead of hashing a String to get a value out of a HashMap with every single request.
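As a generic illustration of that enum-vs-String tradeoff (not Hiqlite's actual code): selecting a cache by enum discriminant is a plain array index, while a String key would have to be hashed on every request.

// Illustrative sketch only, unrelated to Hiqlite's real cache API.
use std::collections::HashMap;

#[derive(Clone, Copy)]
enum Cache {
    Sessions = 0,
    Users = 1,
}

struct Caches<V> {
    // One slot per enum variant, selected by index.
    inner: Vec<HashMap<String, V>>,
}

impl<V> Caches<V> {
    fn new() -> Self {
        Self {
            inner: vec![HashMap::new(), HashMap::new()],
        }
    }

    fn get(&self, cache: Cache, key: &str) -> Option<&V> {
        // `cache as usize` is a constant-time index, no String hashing to pick the cache.
        self.inner[cache as usize].get(key)
    }
}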

I could do the same for databases, yes. The tradeoffs would be:

  1. If done like for caches, it would be static and the decision how many databases you start will be done at compile time, not at runtime, for better speed and efficiency.
  2. If it would allow a dynamic creation, it would mean less throughput, higher latency, and a bit more complex code.

In the end, everything is possible, if you are okay with the tradeoffs.
It would also be possible to implement the State Machine logic twice, so you could do both: use it with a single DB, or allow dynamic creation and a decision at runtime. This is quite a bit of work to implement properly though.

However, if we need to manage thousands of configurations per application (...)

True. But it would also make the whole deployment more secure by having real separation of database traffic. For such an approach, I would design the whole system in a different way. I would never run all of these from a single application or backend, but instead start a separate container for each one, when you want separation anyway. This would make resource usage and handling a lot easier. You could have limits on each instance and make sure that one database (customer) is not using all your memory, and so on.

For example: imagine trying to test a 3-node cluster on a laptop with thousands of databases—port conflicts could easily arise

Actually, I don't see it. Sure, they could arise, but you have ~64,000 ports you can choose from. If you just run it inside Docker, you can also fully encapsulate it from your host.

having so many open ports could be a security risk

Not at all. It would be the exact same risk as if you had one port open, because the application using it is the same anyway. It does not increase your attack surface in this case. It is even a benefit (as well as for debugging), because you can take action very specifically if something bad is happening.

sebadob commented 1 week ago

One issue with dashboard.

Oh that's bad. It seems that SQLite is not normalizing the column types. I will open an issue about it later and check for other types.

If you just specify INTEGER instead of INT, I guess it should be fine. Unfortunately, the column type under the hood cannot be extracted in a type-safe way and must be parsed from a String. I will add additional type checks there.

gembin commented 1 week ago

reduce the operational cost by ~95%

Congrats 🎉, that's a big deal! I'm doing something similar by porting some Java projects to Rust.

If done like for caches, it would be static and the decision how many databases you start will be done at compile time, not at runtime, for better speed and efficiency.

This can be useful. For example, we may want to have one DB to store some data for internal components and another for customer data. It's certainly more performant than the dynamic approach.

If it would allow a dynamic creation, it would mean less throughput, higher latency, and a bit more complex code.

I guess it shouldn't have too much impact on performance. But this could be a useful feature. Is it possible to define some features like multi-static-db, multi-dynamic-db? So users can enable the features they need without affecting single database usage. But, of course, this would add complexity to the hiqlite implementation and require more effort.

I would never run all of these from a single application or backend, but instead start a separate container for each one, when you want separation anyway. This would make resource usage and handling a lot easier. You could have limits on each instance and make sure that one database (customer) is not using all your memory, and so on.

One container per customer could work, but it seems like overkill to me. It consumes unnecessary system resources, increases costs, increases the complexity of traffic routing, and is also more complex to maintain. Separate storage is more suitable for my needs, since all the nodes in the cluster are identical, making it easier to manage.


Two follow-up questions:

  1. Is there a way, provided out of box to know if the automatic backup in S3 succeeded or not (..) I'm expecting it will automatically download the latest backup and restore it. But nothing is restored from S3, and new backups are pushed to S3.
  2. Currently, there don’t seem to be any metrics emitted. How can we know if the cluster/server is working properly or is this supposed to be done at application level?

sebadob commented 1 week ago

I guess it shouldn't have too much impact on performance. But this could be a useful feature. Is it possible to define some features like multi-static-db, multi-dynamic-db? So users can enable the features they need without affecting single database usage. But, of course, this would add complexity to the hiqlite implementation and require more effort.

That would be one solution, yes. The only other thing that came to my mind which you must consider is that when you run multiple DBs using the same Raft group, you would not be able to have independent backups. This is super critical. All DBs, even though split into multiple files and folders, must be treated as one big database in that case. All of them must be included in Raft snapshots, because they all share the same Raft logs. This also means that it would be impossible to apply a backup to only a single one of the DBs while leaving the others unaffected. You can then either apply a backup to all DBs at once or to none at all.

That's just how Raft works by design. To make it super short and leave out a lot of important stuff to keep it simple:

The problem is that you can't write logs until the end of time; you need to purge and clean up at some point. To achieve this, Raft creates snapshots of the data together with the information about the latest applied Raft log ID. When all nodes (!) have created this snapshot, the lowest common latest applied log ID is the point up to which all members can clean up their logs storage.
But if multiple State Machines share the same Raft logs storage (and therefore the same network, which is created at the same time), they must all be included in these snapshots. This means they would not be treated as individual databases, but as one big database.

I considered something like this just in case you end up with a performance bottleneck and are limited by disk I/O, but then quickly dropped the idea because it makes every part more complex and error-prone. Additionally, while the Raft is creating a snapshot, you will still be able to read from the State Machine, but writes will be paused until it has finished (which is usually very quick).

I did not have the time to write more docs about tuning and so on yet, but you can control how often snapshots are created via NodeConfig.raft_config.max_in_snapshot_log_to_keep. There is a shortcut, NodeConfig::default_raft_config(10_000), which accepts this number as its only argument. The default value is 10_000 right now, meaning that every 10,000 logs, Hiqlite will trigger a new snapshot creation and purge logs up until this point.
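For reference, a small sketch with the knob named above. How the rest of the NodeConfig is built is left out, and the assumption here is that default_raft_config() returns the Raft config stored in NodeConfig.raft_config; check the docs of your version for the exact types.

// Hedged sketch using only the names from this thread.
use hiqlite::NodeConfig;

fn with_snapshot_every(mut config: NodeConfig, logs: u64) -> NodeConfig {
    // Trigger a new snapshot (and purge logs up to that point) every `logs` entries.
    // The current default mentioned above is 10_000.
    config.raft_config = NodeConfig::default_raft_config(logs);
    // Alternatively, tweak the single field directly:
    // config.raft_config.max_in_snapshot_log_to_keep = logs;
    config
}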

So I guess this makes the whole idea of sharing one Raft group between all "customer" databases a really bad idea anyway. It's just how Raft works.


Is there a way, provided out of box to know if the automatic backup in S3 succeeded or not (..)

There is no alerting or metrics yet, no. The upload process will run in a separate async task on the side. It will retry a few times but give up if it fails multiple times. You would see it in the logs, but not in any metrics so far. This needs to be added in the future.

I'm expecting it will automatically download the latest backup and restore it. But nothing is restored from S3, and new backups are pushed to S3.

That's not how it works. It would not even be able to detect this automatically, because how should it know that you deleted something and that it should pull a backup? If you have for instance a 3 node cluster, kill one of them, delete the whole volume and restart it, then it will fetch all missing data from the other running nodes. That's exactly what the above-mentioned snapshots are for.
S3 backups are really for disaster recovery only. If just a volume is corrupted, it will not even bother reaching out to S3 and will instead sync from the current leader node.

Currently, there don’t seem to be any metrics emitted. How can we know if the cluster/server is working properly or is this supposed to be done at application level?

You can use hiqlite::Client::is_healthy_db() for programmatic information. You can also use hiqlite::Client::metrics_db() to get the full Raft metrics. The same functions exist with a *_cache suffix for the second cache Raft network, if this feature is enabled.

Or you can use the /health endpoints on both servers that Hiqlite starts to make sure everything is healthy, just as shown in the Kubernetes StatefulSet example definition for the livenessProbe.
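A small sketch of the programmatic variant. The return types of the two functions are not spelled out here because they are assumptions; they are only logged for inspection:

// Hedged sketch: is_healthy_db() and metrics_db() are the functions named above;
// their exact return types are assumed to implement Debug and are only printed.
use hiqlite::Client;

async fn log_cluster_state(client: &Client) {
    // Health of the database Raft (the *_cache variants cover the cache Raft).
    let health = client.is_healthy_db().await;
    println!("db healthy: {:?}", health);

    // Full Raft metrics for the database layer.
    let metrics = client.metrics_db().await;
    println!("db raft metrics: {:?}", metrics);
}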

gembin commented 7 hours ago

Thanks for the detailed explanation! Yeah, sharing the same Raft group will be a problem. It would need multi-Raft-group support, which is not available in openraft I guess, and that would make the system even more complex.

If you have for instance a 3 node cluster, kill one of them, delete the whole volume and restart it, then it will fetch all missing data from the other running nodes.

This makes sense. Could you please expand the backup section to include more details on how to recover if the cluster loses quorum or if the entire cluster is lost?