khonsulabs / bonsaidb

A developer-friendly document database that grows with you, written in Rust
https://bonsaidb.io/
Apache License 2.0
1.02k stars 37 forks source link

List of imperfections #296

Closed phantie closed 1 year ago

phantie commented 1 year ago

I've found some issues that I consider worthy to address in it's source. I provide a list of issues and solutions to them that I used in the consumer's code.

No timeouts on API calls from Client

The calls do not time out.

Solution: wrap in timeouts. I'm using tokio::timeout. But I don't know how a problem would be solved for a blocking client more or less gracefully, without modifying the library's source code.

Client does not try to reconnect if the initial connection is lost

Solution: put timeouts on API calls - on timeout try to connect to the database; drop the old client on success. Code for connection also needs timeouts, because it's used for authentication and assuming identity, which are API calls.

When Client is idle for a long time it becomes unresponsive

A "long time" is vaguely specified, but experimentally 2 or more hours would likely do. 10 minutes is too little.

Solution: ping the server from time to time, experimentally 5 minutes works impeccably, but there's a room for an even larger delay.

Server started via CustomServer::listen_on offered with invalid Certificate returns the error:

Error: error from core a transport error occurred: 'Error completing connection with peer: aborted by peer: the application or application protocol caused the connection to be closed during the handshake'

Caused by:
    a transport error occurred: 'Error completing connection with peer: aborted by peer: the application or application protocol caused the connection to be closed during the handshake'

The documentation for listen_to:

/// Listens for incoming client connections. Does not return until the
/// server shuts down.
pub async fn listen_on(&self, port: u16) -> Result<(), Error> {}

Seems strange that a possibly malicious client can crash a server so easily. \ \ Solution: do not server.listen(port).await? as shown in the examples, but handle or ignore the Error::Transport and loop { server.listen(port).await }

Conclusion

After solving these problems the client becomes responsive and self-healing, and database does not crash so easily.

ecton commented 1 year ago

Thank you for all of this feedback. I'm going to split these issues into their own issues to track them independently. Can you verify what version you were evaluating? I believe "Client does not try to reconnect if the initial connection is lost" is fixed on the main branch as of about a month ago.

phantie commented 1 year ago

Sure, split these into the separate issues, I just did not want to flood the issues, doubting if they are all viable. I'm currently using v0.4.1. Good that will be done in the next release. I wish to contribute myself, just after I ship the project I'm working on.

ecton commented 1 year ago

No timeouts on API calls from Client

This is a very good observation. I think we can offer a configuration on the Client builder to allow setting timeout options, which can be respected by the blocking and non-blocking code under the hood.

We can probably persist these timeout settings on a per-client basis, so that each clone of the Client could be customized with its own timeout settings.

Client does not try to reconnect if the initial connection is lost

As I mentioned in a previous reply, I suspect that if you tried this on the main branch, this should be resolved. I did some significant refactors after running into this issue as well. There is actually even a unit test to ensure the client successfully reconnects.

However, this is only partially solved in that it will only reconnect on timeout once the underlying protocol has noticed a timeout. Once we implement the timeout system, this particular issue should be completely resolved.

When Client is idle for a long time it becomes unresponsive

This request should be covered by #116, which isn't currently attached to a milestone, but it's important to projects I'm working on, so it will be tackled soon.

Server started via CustomServer::listen_on offered with invalid Certificate returns the error

Thank you for reporting this! This was simply an overlooked ? that bubbled up an error that should have been logged instead. I've fixed this for the next release.

I'm going to write up my thoughts on timeouts in a new issue today/tomorrow. I would be thrilled if you find the time to contribute someday!

ecton commented 1 year ago

I've created #297 to track addressing the timeout issues. Thank you again for reporting all of these!