
Typed FastAPI server #605

Closed: rec closed this issue 10 months ago

rec commented 1 year ago

Why

Background

Users can interact with SuperDuperDB through our REST server, which also has a Python client that we hope to port to other languages.

The REST server sends and receives queries corresponding to JSONized Queries from our code - for example, the dozen classes starting here.

Large objects like the results of computations are returned as URIs and then the client can get them with a separate download.

History

Today our REST server is a Flask server whose interactions are defined informally by code...

...but what we wanted was a FastAPI server where each endpoint was rigorously typed!

What this means is that each endpoint corresponds directly to a single strongly typed function in our code; FastAPI uses that typing to declare the endpoint to the external world via OpenAPI, and to do the serialization, deserialization and, most importantly, validation automatically!
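
As a rough illustration of that idea, here is a minimal sketch of one typed endpoint; the route, models and field names are invented for this example and are not our real query classes:

import typing as t

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ReplaceOneRequest(BaseModel):
    # Hypothetical, simplified stand-in for one of our real Query classes
    collection: str
    filter: t.Dict[str, t.Any]
    update: t.Dict[str, t.Any]

class ReplaceOneResult(BaseModel):
    matched_count: int
    modified_count: int

@app.post("/replace-one", response_model=ReplaceOneResult)
def replace_one(request: ReplaceOneRequest) -> ReplaceOneResult:
    # FastAPI has already validated `request` against the model by the time
    # this runs, and both models appear in the generated OpenAPI spec
    return ReplaceOneResult(matched_count=0, modified_count=0)  # placeholder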

We started with this typed server but shelved it because our custom serialization played badly with pydantic 1.x, and pydantic is necessary for FastAPI. Now that pydantic 2.x is the released version, it's time to revisit that decision.

What do we get?

The feature advantages a FastAPI server brings to the user are strong.

Their promo page covers the general features; here's how it would improve our project specifically.

1. Much better data validation, error handling and reporting

Our server code currently expects all the data we receive to be in the right form and of the right type. We don't check for this, partly because we have no way even to report problems to the user, so our only "error message" is to crash and return a completely uninformative 5xx error.

FastAPI does all that type and structure checking for us behind the scenes and gives very complete, if verbose, error messages. Moreover, if we have more specific conditions, we can add our own validators and custom error messages.
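
For the "own validators" point, a minimal sketch of what a custom check might look like in pydantic 2 (the model and field names here are hypothetical):

from pydantic import BaseModel, field_validator

class VectorSearchRequest(BaseModel):
    # Hypothetical request model, invented for illustration
    collection: str
    n: int = 10

    @field_validator("n")
    @classmethod
    def check_n(cls, v: int) -> int:
        if v < 1 or v > 1000:
            # FastAPI turns this into a structured 422 response with this
            # message, instead of an uninformative 5xx
            raise ValueError("n must be between 1 and 1000")
        return v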

Life would be easier if it were one, but SuperDuperDB is not a shopping site. 🤣

We have many types of queries we are sending back and forth; new ones are added all the time, and old ones are changing.

I do not believe we can maintain a correct and reliable server that speaks to multiple clients in different languages, even in the medium term, without some sort of strong typing in our client/server code, at least something very similar to what FastAPI offers.

2. Automatic generation of an OpenAPI spec

"The OpenAPI Specification is a specification language for HTTP APIs that provides a standardized means to define your API to others."

This is not just a box to tick for corporate needs. We have every interest in writing clients for multiple languages, especially JavaScript. Having an objective specification that we can automatically verify implementations in other languages against would save us a great deal of time.
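
As a sketch of how little is needed to get that spec out of the server (the app object here is a stand-in, not our real server):

import json

from fastapi import FastAPI

app = FastAPI(title="SuperDuperDB REST API")  # stand-in for the real app

# ... endpoints would be registered here ...

# This is the same document FastAPI serves at /openapi.json; clients in other
# languages can be generated from it or automatically checked against it.
with open("openapi.json", "w") as f:
    json.dump(app.openapi(), f, indent=2)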

3. All the API documentation lives in the code

This means that as you change the code, you automatically change the API documentation (and the OpenAPI spec). The further the code is from the documentation, the greater the likelihood that the documentation will be wrong.

You can also specify things like example data sets right at the definition of the function, parameter or type.
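
For example, roughly like this in pydantic 2 (model and field names invented for illustration):

from pydantic import BaseModel, Field

class LikeRequest(BaseModel):
    # Hypothetical model: the `examples` values end up in the OpenAPI spec and
    # in the interactive docs, right next to the fields they belong to
    collection: str = Field(examples=["documents"])
    n: int = Field(default=10, examples=[5])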

4. "Interactive API documentation and exploration web user interfaces."

FastAPI automatically provides two complementary flavours of documentation endpoint (the docs and redoc pages mentioned below) that allow both internal and external developers to explore the API and experiment, either by filling out some forms and clicking or by using the example data sets:

https://fastapi.tiangolo.com/features/#automatic-docs

5. Helps us expand to other language clients (JS/Node/C#/Lisp/Cobol/etc.)

I really discussed this in point 2, but it deserves its own section.

(I was joking about Cobol, but it strikes me that being able to suck in all those Cobol databases from the dawn of time might have some value. No, I have never programmed in it. 😛 )

Risks

Pydantic is the big risk.

Pydantic 1 ended up failing for us before. Seeing how simply pydantic 2 seems to handle what we wanted suggests that other people ran into the same issue we did.

Unfortunately, some less-than-optimal decisions were taken in Pydantic 2, like the model_ issue referenced in #599. The package has seen fast change recently, from 2.0 to 2.0.2 and then 2.1 with slightly different behavior.

There are a lot of pydantic users and the team is responsive. We can work around the model_ issue without horrible pain (by turning off a warning and not overwriting their field names). I expect a fairly fast fix for this in the next few months; I have a suggestion or two if there is no progress on that issue. 😸
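
For reference, a minimal sketch of the workaround mentioned above, i.e. turning off pydantic 2's protected-namespace warning so that fields starting with model_ are allowed:

from pydantic import BaseModel, ConfigDict

class ModelInfo(BaseModel):
    # Disable the "model_" protected namespace so fields with that prefix are
    # allowed; without this, pydantic 2 warns that they may clash with its own
    # model_* methods
    model_config = ConfigDict(protected_namespaces=())

    model_name: str  # would trigger the warning without the config above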

Still, we should be careful.

How

The project would be in phases with increasingly formal deliverables so we can stop and evaluate.

  1. The tiny pre-phase would involve fixing some issues in our code to allow us to use the necessary dependencies, which we need to do anyway.

  2. The short proof-of-concept phase would involve creating a tiny one-file example of the serialization problem we face, using very simplified copies of a few of our real data structures, and showing that the new pydantic release would definitively fix it and be easy to use for developers. This is our first chance to fail early and save a lot of time.

  3. The short fake demo phase involves using the simplified copies to send one sort of thing back and forth to a toy FastAPI server. This is our second chance to fail early.

  4. The short stop/go phase is where we look at the results and see whether it's worth continuing.

  5. The long real demo phase involves replicating one actual endpoint or request, using "much the same" JSON as in the existing Flask server. This will also be intended to be a pattern for writing other endpoints/request types. This is our last conceivable chance to fail, which is why we do our homework in the previous steps.

  6. The short client phase creates a new client in Python that handles that one endpoint.

  7. The long port everything phase ports the remaining endpoints to both client and server. If the previous step went well, we can probably use multiple developers on that in parallel.

  8. The medium documentation phase looks at the outputs - the OpenAPI spec, the docs and redoc endpoints - and then goes back to the code to add more documentation, data examples, and that sort of thing. We can definitely do this in parallel, with each developer on one type or endpoint.

What

rec commented 1 year ago

The proof-of-concept

Now, let's take a look at the "proof of concept" part by making very tiny classes that look like our existing classes.

The idea is to create a class that isn't allowed to be directly jsonized - it has to be stored in a cache somewhere and then retrieved - and demonstrate a successful round trip through JSON of some object containing that unjsonizable class.

We also need to make sure we can send derived classes/handle inheritance or, equivalently, that our type_id mechanism works.

import typing as t

class Encodable:
    # A "big blob" that can't be jsonized
    # It must be cached and replaced by a URI when sent to the client
    pass

class Document:
    content: t.Dict[str, Encodable]  # Lots of blobs

class Replace:
    update: Document

class Like(Replace): # This is one query
    like: Document

class ReplaceOne(Replace): # This is the other query
    pass

class Blob1(Encodable):
    def __init__(self, b: bytes):
        self.b = b

class Blob2(Encodable):
    def __init__(self, s: str):
        self.s = s
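
To make the goal concrete, here is one hedged sketch of what the round trip could look like with pydantic 2 serializer/validator hooks and a global cache; everything is simplified and the names are invented, and it reuses the Encodable and Blob1 classes above:

import typing as t
import uuid

from pydantic import BaseModel, ConfigDict, field_serializer, field_validator

# Global cache for the proof of concept only
OBJECT_CACHE: t.Dict[str, Encodable] = {}

class PDocument(BaseModel):
    # Pydantic counterpart of Document: blobs go out as URIs, come back as blobs
    model_config = ConfigDict(arbitrary_types_allowed=True)

    content: t.Dict[str, Encodable]

    @field_serializer("content")
    def _stash_blobs(self, content: t.Dict[str, Encodable]) -> t.Dict[str, str]:
        # Replace each unjsonizable blob with a URI pointing into the cache
        uris = {}
        for key, blob in content.items():
            uri = f"cache://{uuid.uuid4()}"
            OBJECT_CACHE[uri] = blob
            uris[key] = uri
        return uris

    @field_validator("content", mode="before")
    @classmethod
    def _resolve_blobs(cls, value):
        # On the way back in, swap URIs for the cached blobs before validation
        if isinstance(value, dict):
            return {
                k: OBJECT_CACHE.get(v, v) if isinstance(v, str) else v
                for k, v in value.items()
            }
        return value

# The round trip we want to demonstrate
doc = PDocument(content={"x": Blob1(b"some bytes")})
as_json = doc.model_dump_json()          # blobs replaced by cache:// URIs
restored = PDocument.model_validate_json(as_json)
assert isinstance(restored.content["x"], Blob1)
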
rec commented 1 year ago

Where does the Object Cache go?

There is an object cache which keeps the binary objects between calls to the REST server. Where does it meet the JSONization process?

Look at the Document above. How does it find out about the object cache during JSONization, and de-JSONization?

There seem to be just three choices.

  1. a single global object cache
  2. each individual object knows about an object cache
  3. the object cache is injected after responses are sent and before requests are received

Choice 1 is the easiest. Global singletons have many well-known traps, though, and are hard to test. Still, it is the easiest.

Choice 2 isn't really worth discussing: it bulks out all the code and breaks various software "laws".

Choice 3 means each endpoint is decorated by the cache before it's passed to the server. That decorator is fairly advanced but straightforward, and not too much code. We experimented before and know that FastAPI works perfectly well with such decorators.
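
A rough sketch of what that decorator could look like (the ObjectCache interface and names here are invented; the real version would call our actual serialization code):

import functools
import typing as t

class ObjectCache:
    # Hypothetical interface; the real one would talk to the artifact store
    def resolve(self, obj: t.Any) -> t.Any:
        """Replace cache URIs inside obj with the cached blobs."""
        return obj

    def stash(self, obj: t.Any) -> t.Any:
        """Replace blobs inside obj with cache URIs."""
        return obj

def with_object_cache(cache: ObjectCache):
    # Decorate an endpoint so the cache is injected at the request/response
    # boundary, without the endpoint itself knowing the cache exists
    def decorator(endpoint):
        @functools.wraps(endpoint)  # keeps the typed signature FastAPI inspects
        def wrapper(*args, **kwargs):
            kwargs = {k: cache.resolve(v) for k, v in kwargs.items()}
            result = endpoint(*args, **kwargs)
            return cache.stash(result)
        return wrapper
    return decorator

The decorator would sit below the route decorator, i.e. @app.post(...) on top of @with_object_cache(cache), so FastAPI still sees the original typed signature through the wrapped function.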

A mixed strategy: 1 then 3

A plan might be to first implement the proof-of-concept with a global singleton cache, because we can then see if all the other parts work.

Only if that works should we then replace the singleton with a decorator.

We need a server to test the decorator in 3!

There is a wrinkle: we can't really test that decorator without also creating a tiny FastAPI server, since we need to make sure that the decorated function really does completely "fool" the FastAPI server and that we get the right endpoints, validation, documentation, etc. But we already wrote a FastAPI server perfectly successfully, so we can just dredge that code up.
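
A sketch of such a test, assuming the with_object_cache decorator and ObjectCache stub from the sketch above are in scope:

from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel

app = FastAPI()

class EchoRequest(BaseModel):
    text: str  # minimal stand-in payload; the real test would use the PoC models

@app.post("/echo")
@with_object_cache(ObjectCache())  # decorator and cache stub from above
def echo(request: EchoRequest) -> EchoRequest:
    return request

client = TestClient(app)

# A valid payload round-trips through the decorated endpoint
assert client.post("/echo", json={"text": "hi"}).status_code == 200
# An invalid payload is rejected by FastAPI's validation (422), not a 5xx crash
assert client.post("/echo", json={"wrong": 1}).status_code == 422
# The decorated endpoint is still declared correctly in the OpenAPI spec
assert "/echo" in app.openapi()["paths"]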

blythed commented 1 year ago

Am I right in thinking that choice 3 will give us an easy way to preserve thread-safety? What would the decorator do exactly? Would it essentially do something similar to our current approach to serialization, but instead of saving blobs to the artifact_store, "send" them to the server? If so, then we could potentially reuse some of the same code to do this.