Contribution - Githubissues

simoneromano96 commented 3 years ago

Hello, I'd like to contribute to this Repo, where can I start? Who is managing the contributions?

mosheduminer commented 3 years ago

Hi, @simoneromano96, thanks for your interest. If you would like to contribute, you are also welcome to join the slack workspace, so it is easier to coordinate.

I'll describe the high level view of the library as I imagine it (perhaps there will be problems with lifetimes, I'm not sure, I'm rather new to rust). Please feel free to comment.

The API surface will expose three main structs, the sirix struct, which exposes methods to access server-level APIs, the database struct, which will expose database-level APIs (these rest api routes begin with /{database}), and a resource struct, which will have methods exposing resource (/{database}/{resource}) APIs. You can take a look at the python client and typescript client, both of which expose an API surface similar to the above (unfortunately, I still didn't document the typescript client 🙁).

Internally, however, it is more complicated than mapping methods to rest api routes, because the library also needs to handle authentication - not just one off authentication, but also refreshing the current token, using a refresh token. I imagine using the delay_for method from tokio for waiting until a few seconds before the token will expire, then refresh. I am currently trying to figure out how to do this in a way that it would be possible to cancel this task, however, should the program exit. I fear I am in a bit over my head on this one 🙂.

The internal structure will mirror the typescript client, I expect. That is, the Auth struct will expose a method request which will accept parameters necessary to build a request object, but the method will add the authorization header first (before passing on to request_impl, which already exists). The HttpClient struct will contain the Auth struct instance, and use its request method for requests. The HttpClient exposes a method for every api route, while the methods of the higher level structs are not limited to this mapping. The HttpClient struct needs to be referenced by the Sirix struct, and every instance of the database and resource structs`, so that they can make request through it. I'm pretty sure this will cause rust ownership problems, I figure lifetimes might solve it.

[You might notice that the python client also has a JsonStore class. Because you need to write custom queries to make use of SirixDB, I attempted to provide an interface that would enable baise operations without requiring knowledge of the JSONiq query language. If it will be implemented in this client as well, it will be done after the above].

You can help out with the auth problem, or start building up the other structs (which are currently in various states of disarray).

musitdev commented 3 years ago

Hello, I'm also interested in helping to develop the Sirix Rust client. You've already define a model that will determine how the API will be used. I'm new to Sirix so I don't know how client use it. Do you have some use case of how Sirix db are used by client? In Rust conception choice are very important and for the same need there is several choices. For example, the first task will be to define the authentication part and how it works. In your model you use a struct that manage it. Will it have to be share between thread? If yes the solution will be to make it Send / Sync, or use a pool like r2d2. You choose to provide an async API, it can be hard to use it in a multithread model. Perhaps it's interesting to have a non async version with the same API. So do you know if it's important to keep the same connection between queries or update? If not there's perhaps no need to keep the auth token and the associated refresh process. You can create an API like the builder pattern for example. You build the query and use auth data to fetch the result. An example inspired by SQLx API.

let mut result = sirixrs::query("builds the query or update")
    .bind(data) // add specific data
    .fetch(&auth_data).await?; //do the query open and close the connection.

Or you can use a connection struct that manage the auth token and give it to the build pattern to avoid to ask several token.

I don't want to beak everything but as you start the development perhaps it's a good idea to challenge several API model with the mains use cases?

mosheduminer commented 3 years ago

Hi @musitdev, the client could potentially be used by a web server, and I hope to eventually write a rust CLI using this client. I think of sirixdb as similar to mongo, where in pymongo (which is what I am familiar with), where the high level break down is similar, and they are both document stores, although they are rather different in how they are document stores. Unfortunately, there is not enough documentation on actual use of SirixDB, I hope to change that in the near future, but I will presumably be dealyed 🙁.

I conceived of this design because it already worked for me with the python and typescript clients. If, however there is another, better approach, I'll be happy to go that route. Before going further, I will say that although currently the client is async, I would like to support sync/multithreaded as well, as I did in the python client. However if you have another approach to authentication, I would be happy to hear it.

Technically, you are correct that the auth token does not need to be persisted between api calls, however, I think it would be wasteful to ask for a token every time, particularly in a web server.

Or you can use a connection struct that manage the auth token and give it to the build pattern to avoid to ask several token.

Can you elaborate on this idea, and how it would be different than the Auth struct managing the tokens?

musitdev commented 3 years ago

Ok I try to propose several API choice to manage:

get a token auth.
- use it to build a query. (update or select)
- query and fetch the result: all row or one row by one row. The idea is to keep the fetch open between server call. I try to see the python API to be near as possible.

musitdev commented 3 years ago

I started my thoughts about the Rust API. I haven't finished but I would like to show you the approach that we can have. I start with the authentication as an example.

Authentication management

Needs

External authentication using Auth2: As I see the authentication is done by a specific server like Keycloak. I think depending on the Sirixdb installation it can change an use another Auth2 process. In the python API this fact is hidden and the user can't change it.
State less or state full session The communication between the client and the database is state less (REST API). To allow state full session, a sort of heartbeat is done. For a full client application this process is ok but for a web server with state less connection it can be very hard to manage this heartbeat. I think, it's important to allow state less and state full session and to allow the user to decide.

Proposition

The auth process has its specific API that return an oauth token that is use to authenticate each Sirixdb call explicitly. We can use a Rust crate like OAuth2 or propose a trait to implements it. The trait example:

trait AuthToken {
    fn get_token(&self) -> String;
}

By default the API provide an example implementation that use KeyCloak example: let token = sirixdb::keycloak::auth(login, pass, config).await?; with

pub fn auth(login: &str, pass: &str, config: keycloak::Config) -> Result<KeyCloakToken, Error>) {..}

Struct KeyCloakToken {
    ...
}
impl AuthToken for KeyCloakToken {
    fn get_token(&self) -> String {
    ..
    }
}

For the heartbeat we can provide an optional struct that manage it for the Keycloak version like:

struct KeyCloakHeartBeat{..}
impl KeyCloakHeartBeat {
    pub fn add_token(token: &KeyCloakToken); //start the heartbeat
    pub fn remove_token(token: &KeyCloakToken); //stop the heartbeat
}

And manage all the keep alive token process inside. The heartbeat management is specific to the auth mechanism and the application. We should allow to the user to define its own.

A Async and non Async version should be proposed. I have to see the best way to do it. Perhaps you have some idea.

The token can be used to access the database like: sirixdb::create(&token, db_name).await?; and the create function pub fn create(token: & impl AuthToken, db_name: &str) -> Result<(), Error>) {..}

Some remarks

For the Sirix API, I see some pb in the python API that can be avoid or done differently in Rust. For example the create process for a database or a resource is strange because you create the object and after you create it physically. How do you know that the database or resource has been effectivly created only with the object? I think its preferable to have a function that create or open and only if everything succeed the database or resource struct is created. In python:

db = sirix.database("test-json-database", DBType.JSON)
await db.create()

Become let db = sirix::open_or_create_database(&token, "test-json-database", DBType.JSON).await?; or let db = sirix::database(&token, "test-json-database", DBType.JSON, open_mode).await?;

Another point: As I understand, the json API is a subset of the resource API with less possibilities. For example json.find_all() return all the elements that match and resource.query only the reference to the elements. My remarks are

in a server oriented API the query is important, because you can optimize the fetch
in the json API you never force to only use json message. I can send and XML message, it will crash during execution. In Rust we try to avoid this and add compilation verification that can't be done with python. If it's not possible during compilation, the client should verify before sending the data.

With Rust you can model the same API for different type of data and add specific data verification. I think it's important that the API use these facilities and we take some time to try to find the best way to provide it without adding to much complexities. Tell me if your interested by this approach so that I spend some more time to think about it.

mosheduminer commented 3 years ago

Many thanks for your thoughts! I will quickly write some notes on your suggestions.

In regards to auth, I like your proposal wherein the developer can directly manage the auth process. I preferred to make it easier for the user by making this implicit, but I suppose it is nevertheless the best solution.
The client does not need to know if keycloak or any other auth server is used, as sirixdb provides a /token endpoint to fetch a token. With your proposed approach, however, it will be easier to change the provider, as well as query keycloak directly.
In the python client, I preferred to make creation network calls methods, instead of implicit creation, for three reasons. First, it is possible for a struct to exist even after the delete method is called (perhaps in rust we can enforce cleanup of the struct after a delete network call, however), so I opted for the possibility of the struct without creation as well. The second reason was because of the way I supported both sync and async APIs would have been complicated. Although, looking back, it could probably be managed. Third, creating a resource requires initial data, and overrides the current resource. So any method to open_or_create would take data to create, but then how to decide whether to create or open? We can not rely on checking if it exists, we must decide based on whether or not data was provided, but what if data is provided and the resource exists? Maybe it was by accident that data was provided? I preferred therefore to make creation explicit.
In regards to optimizing queries, I agree that it would perhaps be preferable to have some queries optimized on the server. But to do so is to prioritize a particular way of using sirixdb, as defined in the JsonStore class. It is up to @JohannesLichtenberger to decide what to do about this.
In regards to returning XML - resources are either xml or json, never both, so you will only get a mixed response if you attempt a cross-resource query, but I only parsed JSON where (presumably) only a JSON resource is being opened. I do agree, however, that more thought needs to be given to the user to deserialize, and that we should merely provide utilities, perhaps.

musitdev commented 3 years ago

My thought about the point 3) In Rust the philosophy is different from python. We have the type system and a compiler that can check a lot of think. So we try to put a lot of information in the API to validate as much possible element at compile time. For example you say: "In the python client, I preferred to make creation network calls methods, instead of implicit creation": you're right, the API use must be clear and not error prone, I proposed create_open because I think it's the same but if it isn't the API must show it. So if the needs for a resource is:

created alone: it's a parent resource and can have only child resource. If not it mean it can be moved to another resource and a move function must be added.
create with an existing one, in this case the type of link between the parent must be specified. The API can be: let main_resource = sirix::resource::create_new(&token, ["blah", {"a key": 5}]).await?; // return an error if it exists for example. A resource created with an existing one can be for example: let child_resource = main_resource.insert({"hey": "test"}, InsertLink::CHILD); //return an error if the child already exist In this API you can't link a resource alone or insert a main resource to himself for example.

I don't know if these constrains (error if exist or no link to himself) are valid but it can be interesting to enumerate all so it can be integrated in the API. The goal is to block or notify during compilation the developer when he do something wrong.

5) "In regards to returning XML - resources are either xml or json, never both..": it can be forced in the API. For example, imagine we use the resource API above we can extend it like that:

struct XMLResource;
struct JSONResource;

struct Resource<R>{
    _r_type: R;
    pub value: String,
};

impl Resource<XMLResource> {
    fn new(value: String)-> Self {
        Resource{
            _r_type: XMLResource{};
            value,
        }
    }
}

impl Resource<JSONResource> {
    fn new(value: String)-> Self {
        Resource{
            _r_type: JSONResource{};
            value,
        }
    }
}

impl Resource<R> {
    fn insert(value: R) -> Self {
    ..
    }

    fn query(query: ...) -> R {
        ..
    }
}

//add from string et str trait impl to facilitate use

pub create_new_json(token: .., value: String) -> Resource<JSONResource> {
    value.into()
}

pub create_new_xml(token: .., value: String) -> Resource<XMLResource> {
    value.into()
}

let xml_resource = create_new_xml(..);
let query_ressource = xml_ressource.query(..);
let json_resource = create_new_json(..);
json_resource.insert(query_ressource); --> error wrong type.

It's not the final API (I even didn't try to compile it), this is an example how to use the type system to detect bad API use at compile time. In Rust it's a common practice and it's important to provide it in the API design. That why I think it's important to define what it's possible and what it's not and to try to put it in the API.

mosheduminer commented 3 years ago

The general thrust of your comment is to make use of the type system. For example, while in python I used an attribute to separate the resource types, and parsed based on that, in rust we can use generics at compile time. I like this idea a lot.

If you have further ideas, please comment further! And of course, if you implement any of this, I will be happy review a PR 😉.

musitdev commented 3 years ago

I update the code example with the correct pattern. I'm ok to propose some API. What I need are use cases and mostly how database and resource are use. The use case for resource are important because its the heart of the API use. I think that query detail use case can be defined after. Before developing the API we should work on the common API like error, log and how we manage async and non async. There a good recent blog post about error: https://matklad.github.io/2020/10/15/study-of-std-io-error.html We should take example on it to develop the error API. Perhaps you can open a PR to develop the error API. For logging, Rust logging macro are fine for most use in a lib. Some debug and trace entry are interesting when debugging. In Rust it's hard to debug by stopping the execution and inspect variable so we mostly use trace. For the async mix I don't have much experience but there is some crate that have already done it like: orientdb-rs, Gremlin-rs, sqlx, mongodb. Meanwhile we can define the use case and work on the API design. Tell me what you think about this plan and @simoneromano96 if you're still interested or you prefer to contribute in another way. I don't want to give the impression that I want to drive the process. I only propose using my experience and I'm open to do as you prefer. I'm interested on the project because it's Rust, I want to learn more about the async part of Rust and I found the Sirixdb concept interesting.

mosheduminer commented 3 years ago

It all sounds good.

What I need are use cases and mostly how database and resource are use.

Well, they are used for their methods, which are abstractions over HTTP calls to the database. Every call does something in particular. Some parts of the api are likely to be mostly unused by anything other than the frontend GUI (which uses the typescript client library), such as reading a resource based on a nodekey, for example, it is far more likely to execute a query, since you presumably won't know the nodekey in advance (this is actually pretty useful for the GUI, however).

So, reading and writing to resources will mostly be done by using XQuery/JSONiq. In the python client, I added methods that execute various simple and common queries (assuming you are using a SirixDB JSON resource as an array of objects, however, nothing requires you to use SirixDB in this way in particular). These queries are the most common use case, but I have simply been treating queries as strings, so far.

I plan on reading the blog post about the error type soon. I also would like to look at how mongodb and the other crates handle mixed sync/async and tokio/async-std support.

AlvinKuruvilla commented 3 years ago

I think it would be interesting to start developing a CLI for the sirix client in rust, even if it is barebones, and I would be interested in exploring options. Do you have any suggestions for the design and where to start with that @mosheduminer?

mosheduminer commented 3 years ago

@AlvinKuruvilla I actually started work on this recently, and have a few commands working already, in a private repo. I'll just made it public at https://github.com/sirixdb/sirixsh. If you're interested in helping, we can discuss there, or in slack.

sirixdb / sirix-rust-client

Contribution #2

Authentication management

Needs

Proposition

Some remarks