sqlitebrowser / dbhub.io

A "Cloud" for SQLite databases. Collaborative development for your data. 😊
https://dbhub.io
GNU Affero General Public License v3.0

Proposal: API for databases #52

Open karim opened 7 years ago

karim commented 7 years ago

Along with providing a database hosting service, DBHub provides an API for managing your hosted databases. Combined with Serverless services (e.g. AWS Lambda, Google Cloud Functions, Microsoft Azure Functions), the DBHub API lets users create beautiful web applications/services.

While this Serverless computing model frees you from having to manage a server, it does not free you from writing, debugging, testing, and running code. Not to mention the disappointment of finding out that your favorite programming language is not supported, or severely restricted, by your service provider.

DBHub could provide a solution for that.

Why?

Most web applications/services that expose an API are just wrappers around a database. They run a server that waits for certain URLs (endpoints) to be accessed, then queries the database and returns the result.

The process is simple, yet it requires a lot of boilerplate code just to get your API running. And you have to change your code whenever you add or remove endpoints, or whenever you change the database.

DBHub could simplify this by handling all that for you.

How?

For every database hosted on DBHub, a new option/dialog to Create an API or Create an Endpoint should be available. In it, the user defines two things:

  1. An endpoint (e.g. /endpoint)
  2. A query to be executed when that endpoint is accessed (e.g. "SELECT * FROM my_table")

DBHub then runs a service/server at api.dbhub.io that exposes each user's endpoints under api.dbhub.io/user, so the /endpoint URL for the user karim is accessed at api.dbhub.io/karim/endpoint.
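As a sketch only, the per-user URL mapping described above could be done with a small Go helper like this (the function name is hypothetical, not part of the actual DBHub or randa code):

```go
package main

import (
	"fmt"
	"strings"
)

// splitUserEndpoint maps a request path like "/karim/endpoint" to the
// owning user ("karim") and their endpoint ("/endpoint"). Hypothetical
// helper for illustration only.
func splitUserEndpoint(path string) (user, endpoint string) {
	p := strings.TrimPrefix(path, "/")
	i := strings.IndexByte(p, '/')
	if i < 0 {
		// Bare "/karim" maps to that user's root endpoint.
		return p, "/"
	}
	return p[:i], p[i:]
}

func main() {
	user, endpoint := splitUserEndpoint("/karim/endpoint")
	fmt.Println(user, endpoint) // karim /endpoint
}
```

The server would then look up `endpoint` in the user's configured endpoint list and run the associated query.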

You can create an unlimited number of endpoints and access them within minutes of creating them, without writing a single line of code.

Implementation

As a proof of concept, I have created a simple server that provides this functionality to DBHub. The server is located here (https://github.com/karim/randa) and can be tested and used as a standalone app. Just build it and start it and you will have your own API without writing any code at all.

The only thing you need is to create a YAML configuration file to set your database file and your API endpoints.

The file should look like this:

# The SQLite database file
database: database_file.sqlite

# The list of endpoints that the server should handle
endpoints:
  - url: GET /endpoint
    query: SELECT * FROM `table_name`
  - url: GET /endpoint/with/{param}
    query: SELECT * FROM `table_name` WHERE `column_name` = {param}

When you try to access the endpoints, depending on your query, you will get something like this:

{
    "column1": "value",
    "column2": "value",
    "column3": "value"
}

Note 1: Column names and values are returned as-is, without any modification. If you have a ColumnName and you prefer it returned as column_name, just rename it in the SQL query.

Something like: SELECT ColumnName AS column_name FROM ...

Note 2: NULL values are not returned in the result. This lets you distinguish between a column whose value is an empty string "" (or 0 for integers) and a column with no value at all.
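That NULL-skipping behaviour could be sketched in Go like this (a hypothetical helper, assuming each row arrives as parallel column-name and value slices; not the actual randa code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rowToJSON builds a JSON object from column names and values, omitting
// columns whose value is NULL (nil in Go). Callers can then tell
// "no value" apart from "" or 0.
func rowToJSON(cols []string, vals []interface{}) ([]byte, error) {
	row := make(map[string]interface{}, len(cols))
	for i, c := range cols {
		if vals[i] != nil {
			row[c] = vals[i]
		}
	}
	return json.Marshal(row)
}

func main() {
	out, _ := rowToJSON(
		[]string{"column1", "column2", "column3"},
		[]interface{}{"value", "", nil}, // column3 is NULL
	)
	fmt.Println(string(out)) // {"column1":"value","column2":""}
}
```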

So, just start the server and your API is ready to use. The code for routing URLs, opening and querying the database, converting the result to JSON, and returning it is already done for you.

A real-world example should make this even clearer.

Example:

For this example I will be using Belfast Bikes Docking Stations database.

Let's say you are creating a Mobile/Web app that shows the locations of Belfast Bikes docking stations.

The app will have something like a MapView: when started, it shows a map with all stations as markers/points, and you can click any of them to get more information about that station.

The first thing to do is to upload this database to DBHub. Then you have to create a few endpoints to make it expose a public API that can be used by (mobile/web) clients.

The first endpoint will be /stations. It just queries all the stations' locations (lat/long only), since that's all we need to display the markers on the map. The second endpoint will be /dock/{id}, used when we click on one of the markers.

The config file should be something like this:

database: Belfast+Bikes+Docking+Stations.sqlite

endpoints:
  - url: GET /stations
    query: SELECT Dock AS id, Latitude AS lat, Longitude AS long FROM `stations`

  - url: GET /dock/{id}
    query: SELECT * FROM `stations` WHERE `Dock` = {id}

When you open the app, it should make a call to your /stations endpoint, which should return something like:

...
    {
        "id": 3903,
        "lat": 54.598224,
        "long": -5.924024
    },
    {
        "id": 3904,
        "lat": 54.601309,
        "long": -5.92187
    },
...

After checking the map and finding the docking station you are interested in, you click on it, and this will make the app access the second endpoint /dock/3903, and the returned JSON object will be:

{
    "Dock": 3903,
    "Name": "Victoria Square / Victoria Street",
    "Latitude": 54.598224,
    "Longitude": -5.924024,
    "StationSize": 10
}

That's all!

Conclusion

DBHub lets you focus on your application logic while it takes care of your database/API hosting needs.

However, it is still in its early development stage. This issue is opened to get feedback about the best way to integrate this into DBHub. Your feedback is welcome. Please do share your ideas and thoughts.

justinclift commented 7 years ago

Yay, good write up. I'll go through this in more detail after I get some sleep. It'd probably be useful for me to write down the concepts for the new backend data model anyway, as you'll need to know them in order to pull data from the backend. :smile:

justinclift commented 7 years ago

As an extra thought, would it be useful for people to be able to control access to the APIs they define? eg by specifying some kind of API key, or maybe an IP address whitelist, or "something"

karim commented 7 years ago

Yes. This is also on my mind, and I think it should be the default. I don't know how best to implement it though, so I left it out for now.

I was thinking we could add an entry in the YAML file for every URL that defines the access control/allowed IPs. This entry points to a text/SQLite file (hosted in the user's directory) that has the required info.

Or maybe we let users define custom tables (like users, access, or dbhub) inside their own database if they want more control. This isn't a clean way though, as it requires editing their own database, which they might not prefer.

karim commented 7 years ago

In Go terms, this could be implemented as HTTP middleware. Before each API request is served, the middleware checks whether it is allowed. If not, it kills the connection or returns an error. If it is allowed, it just passes the request on to the handler to run the query.

justinclift commented 7 years ago

Yeah, the middleware approach sounds like it'd be fairly easy. As you mention, we'll have to figure out a useful + easy-to-work-with approach for people being able to set the access they want.

... but probably do it later on, after the basic foundation bits are working. :smile:

chrisjlocke commented 7 years ago

Cripes, this looks interesting. I can see so many uses for this at work alone - knocking up web-enabled applications is usually a pain, but even Android apps (well, OS-independent apps) would be easy to create using this.

justinclift commented 7 years ago

@karim With this:

I was thinking we could add an entry in the YAML file with every URL that define the access control/allowed IPs. This entry points to a text/sqlite file (hosted in the user directory) that has the required info.

From an implementation point of view, if you can get the required info into a json structure (instead of YAML), that'd be optimal.

Saying that because, with the new backend data model (which I'll document in the wiki some time today or over the weekend), we're storing all of the other per-database metadata (eg branch info, commit history, etc) as json in PostgreSQL. It's fairly efficient to pull it from the database too: 1 query for the lot in its present structure. 😁

karim commented 7 years ago

Yes, it's doable. Just add JSON struct tags next to YAML tags, and unmarshal it.

There's nothing special about YAML; it's just much more readable than JSON for config files. I also thought about doing the config files in TOML.

It really depends on how you are going to integrate this with DBHub.

I was thinking that you should keep DBHub Web/API server (at www.dbhub.io) intact, and run this as another server (at api.dbhub.io maybe) next to it. This way you can turn it on/off and update it without affecting DBHub operations. After all, it is an optional thing, and most importantly, it will be running user-generated SQL queries on-the-fly without re-compilation, and people do make mistakes. :)

Are you thinking of merging its code into the DBHub code so you can run one fully-featured server, or will you run both independently? Both are fine with me.

If you want to run one server, then I guess we should change it to a library instead of an app. This way, you can import it into the DBHub code directly. After turning it into a library, I could create a command-line interface for it so people could also use it (to spin off a quick web app or API prototype) without DBHub, as a standalone app.

karim commented 7 years ago

@chrisjlocke I assume you don't have Go installed and cannot build and test this. I have uploaded the executable here so you can test it and give us feedback if you get some time.

chrisjlocke commented 7 years ago

No, I don't have Go, but have enough spare server space to set up a VM to have a play. Would be interesting to dabble. Up to my ears in VB stuff at the moment though, but is something I want to play with..... if only to create webby programs for myself.

justinclift commented 7 years ago

@karim Yep, very much agree this is the kind of thing which would work well as api.dbhub.io.

I wrote a quick and dirty api server a few weeks ago as the backend for a new DBHub.io cli (for testing the new data model). I'll likely update that next week and chuck it online at dio.dbhub.io, purely so people have a cli tool for doing repo admin tasks with. eg uploading/downloading databases from the command line, creating branches, etc.

Yours is a different thing than that, and I think it'll be more widely useful and will deserve the main api.dbhub.io domain. :smile:

justinclift commented 7 years ago

@karim With the "one server vs many" thing... so far it's several different servers/programs. Each in their own subdirectory under this sqlitebrowser/dbhub.io repo.

Note - I started out with them in individual repos, but it became a bit of a pain keeping things coordinated between them. So I merged them into one repo, and it's easier now.

Having them as individual things seems to make sense (to me). It lets us concentrate on each thing being good at what it does. And we'll likely have dedicated servers for each when we deploy this stuff in the real world. :smile:

justinclift commented 7 years ago

@chrisjlocke Get you some IDE: https://www.jetbrains.com/go/download/ (no registration needed) 😁

justinclift commented 7 years ago

@karim Speaking with @adewes a few minutes ago. He was saying they'd be interested in using such an API in the product they're developing. So, we probably have a first volunteer for testing things once it's hooked up to the DBHub.io backend. 👯

MKleusberg commented 7 years ago

@justinclift After a first quick look it seems like we can change DB4S to use the dio API. We'd need to port the directory listing code to the dio server first, however. It doesn't really make sense to have this for a CLI tool but that's no reason to not have it in the server bit, right? :smiley:

justinclift commented 7 years ago

Yep, pretty much. The dio server code at the moment is single user. I needed something to test a forking/branching/etc model with, and didn't need it to be multi user.

I've almost gotten the new data model integrated with the existing webUI code, so the website will work "as is" for most things but have a better base to grow from. Aiming for today/tomorrow, but we'll see.

I'll need to go back through the code after that and harden it (eg ensure 100% coverage of input validation) before updating the dev1 server with it. So probably a day or two more again.

Once that's done I'll probably take a few days break, as I need to get some other non DB4S/DBHub.io things done IRL. When I'm back I'll update the dio server code to use our multi-user backend, which DB4S could then talk to.

Reasonable time line?

MKleusberg commented 7 years ago

Yep, sounds good :+1: I'll probably go through some DB4S issues in the meantime and leave the dbhub code as-is until the new API is in place, then update the existing code to use the new API, and finally start adding more features to the dbhub integration code :smiley:

karim commented 7 years ago

@justinclift I thought you had one server process doing all the work at https://dbhub.io/. But if it's multiple servers/programs, then we could move this in alongside them.

Just so I don't get confused when we say API, are we talking about DBHub (dio) API or Randa (user-generated) API? Or you want to simplify things by merging them and have just one API?

The DBHub API is used for uploading, downloading, versioning, and managing the whole user account and its databases. It can also be used with CLI tools and for integration with DB4S. And if you want to create a Serverless app, this is the API you should use.

Randa does not really offer an API. 😄 It is just a server that creates APIs while it is running. So it is best used by people who don't want to be bothered with writing code, mainly DBHub users, since they just want to upload databases and maybe access parts of their database through an API.

Originally, I wanted to call it RESTful API with Nothing but a Database Access, because that's what it is. Oh, I like acronyms. 😄 But then I settled for Reliable instead of RESTful since this could extend to other things such as GraphQL (or its predecessor FQL), YQL, and many more. Hopefully when we get it done it will become reliable.

justinclift commented 7 years ago

Ahhhh... I'll attempt to explain things a bit better then. :smile:

With the website at https://dev1.dbhub.io, that's running the "webui" code from here:

    https://github.com/sqlitebrowser/dbhub.io/tree/master/webui

It's also running (on a different port), a REST server for DB4S to communicate with:

    https://github.com/sqlitebrowser/dbhub.io/tree/master/db4s

They're separate executables, launched manually by me from the command line as needed. Obviously they'll be updated to run from proper system startup files for a real deployment scenario, but this is good enough for our current dev1 server. :smile:

With dio, I'm planning on moving that to a "dio" subdirectory in the main dbhub.io repo. Right next to the above "webui" and "db4s" ones. Dio is a user facing cli, so needs an API running on a server somewhere to communicate with. That'd make for a 3rd manually launched server process. :wink:

I was misunderstanding randa, thinking it was an API server too (not only the generator part). So, we'd use randa on our backend to generate APIs for people?

Should we try using randa to generate the dio API instead of my hand-coding it?

That might make for a good "trial by fire" to get Randa up and running? 😁

karim commented 7 years ago

Okay, now I understand how things work at https://dev1.dbhub.io/.

Yes, on its own it does not provide any API; it just serves users' APIs when they create them. It will just run and wait for users to add an endpoint, via the WebUI maybe. Once a new endpoint (say /hello for example) is added by the user karim, it will be accessible at api.dbhub.io/karim/hello. When someone requests that URL, it will run the query (already defined with this endpoint) to fetch the data (from the databases hosted on the user karim's account) and return the result.

When I said an API server, I meant that it should run on a domain like api.dbhub.io and serve users' APIs, via api.dbhub.io/user-name for each user, or user-name.api.dbhub.io, or both. Maybe we can also provide custom domain names, same as GitHub Pages, so users can serve their API (hosted on DBHub) using their own domain names. You might want to write that down in the ToDo list if you want to add it. 😄

Yes, you can use it to generate the dio API. I don't know how complicated it is, but give it a try. 👍

While it is just a prototype, and the source is just a couple of files, it does work. You can set the dio API endpoints, along with their queries, and it will return the result to you as JSON.

In fact, I like this approach as it will help us do two things at the same time. If we do it that way, we save some time by not writing any code for the dio API and can spend that time on randa instead. And at the same time we will be testing both, so we can finish them quickly.

justinclift commented 7 years ago

Excellent, sounds like the beginnings of a plan then. :smile:

I'll go through the randa code after I'm back from taking a few days off to do non-DB4S/DBHub.io stuff, and get a proper understanding of it, then I'll start figuring out how to create a suitable API endpoint for dio. And probably start asking tonnes of questions about the same time... :wink:

karim commented 7 years ago

Great! 😄

Take your time and enjoy your free days.

justinclift commented 4 years ago

Following up on this (several years later!). :wink:

One of the more useful things I needed to figure out recently was how to safely execute user-provided SQL statements against their databases, and return the result.

Am kind of thinking it'd probably be pretty simple to add an API in front of the execution part, so people can run their own SQL statements against their own database and we just JSON-format the result and return it to them. Much of the backend code is already written for the visualisation part anyway.

So, I'll take a look at the Ranga code after we get the 3.12.0-beta1 (or release candidate1) out the door this weekend. :smile: