pontikos opened this issue 4 years ago
Defining a schema for the JSON and switching to protobuf are two different, almost orthogonal things. JSON vs. protobuf is about the transport, not the schema. If you use a good API framework rather than a generic, low-level HTTP server such as Flask, you can define and document the schema returned by each endpoint and let the client choose whether it wants JSON, protobuf, XML or whatever, using the Accept header to serve each client the format it prefers.
You can use Django Rest Framework or FastAPI to define the endpoint types in a stricter way and have OpenAPI (AKA Swagger) generate the docs automatically. I have used both and I warmly suggest FastAPI: DRF is way clunkier to use and carries all the Django baggage it doesn't need; FastAPI is more modern: it allows async code and uses strict types both to define interfaces in a much simpler way than DRF and to raise errors if you even attempt to create the wrong data in Python, before it ever hits the JSON (or protobuf) serializer.
In my view, after taking a look at your Python code, you should ditch Flask as soon as you can: it's technical debt that stops you from enforcing any API contract.
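A minimal sketch of the Accept-header idea, assuming FastAPI (the endpoint path and fields below are invented for illustration): the schema lives in the Pydantic model, and the Accept header only changes how the same payload is rendered, while the OpenAPI docs are generated from the model regardless of representation.

```python
from typing import List

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, PlainTextResponse
from pydantic import BaseModel

app = FastAPI()

class Individual(BaseModel):
    id: str
    hpo_terms: List[str]

@app.get("/individuals/{individual_id}", response_model=Individual)
async def get_individual(individual_id: str, request: Request):
    data = Individual(id=individual_id, hpo_terms=["HP:0000510"])
    # naive content negotiation: same data, different representation
    if "text/plain" in request.headers.get("accept", ""):
        return PlainTextResponse(f"{data.id}\t{','.join(data.hpo_terms)}")
    return JSONResponse(data.dict())
```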
@dvarrazzo we had started using swagger here https://github.com/phenopolis/phenopolis_api_schema a while ago.
However we found it was complicating things as we hadn't finalised the data model yet. @logust79 had started working on it at the time.
Now we have a better idea of what the JSON protocol between frontend and backend should look like. Ideally we would like something that both frontend and backend can import and use, and that is validated at both ends, Python and JS. Do you think FastAPI would be a good solution to this that plays well with the JS frontend?
PS: btw we don't want to complicate things with other formats for now (XML etc.). JSON works fine and is what the frontend expects.
Just moving @priesgo's comments from Slack to here:
I was just discussing with @Nikolas in relation to @Yuan Tian's point about the dynamic typing of the data the API is returning. I would like to open a discussion on the best way to solve this. My first intuition is towards a contract between backend and frontend: a model that determines how data is going to be structured, its types, etc. There are multiple solutions for this: JSON Schema, Avro, protocol buffers, Thrift (any others?). I am inclined towards protocol buffers, which is developed by Google and is what they use to connect their systems. Basically, you define a model from which you can generate source code in Python, JavaScript and many other languages. Those objects are the data containers that allow you to check the validity of the data, and they provide a strongly typed environment to program in. It also brings multiple bonuses: (1) model documentation can be generated and shared with API users, internal or external; (2) it will allow us in the future to do streaming through gRPC on the bottleneck endpoints such as fetching variants, so the frontend could start populating the table as soon as the first results come in instead of having to wait for the zillion variants or paginate through the endpoint.
NOTE: native Python code generation in protocol buffers is not fantastic; this library does a much better job: https://github.com/danielgtaylor/python-betterproto
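As a rough, hand-written sketch of the kind of dataclass betterproto generates from a .proto definition (the Variant message and its fields are invented here, and normally this class would be produced by protoc with the betterproto plugin rather than written by hand):

```python
from dataclasses import dataclass

import betterproto

@dataclass
class Variant(betterproto.Message):
    chrom: str = betterproto.string_field(1)
    pos: int = betterproto.int32_field(2)
    ref: str = betterproto.string_field(3)
    alt: str = betterproto.string_field(4)

v = Variant(chrom="1", pos=12345, ref="A", alt="T")
wire = bytes(v)                  # compact protobuf wire format
again = Variant().parse(wire)    # round-trip back into a typed object
as_json = v.to_json()            # still easy to hand to a JSON client
```

The same .proto file can also generate the equivalent classes for the JS frontend, which is where the "contract" value comes from.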
@pontikos in FastAPI the schema is generated automatically from the types you use in Python. I have seen people spending a ridiculous amount of time on Swagger by keeping the specs separate from the real code, as you seem to have done so far, and even with DRF I have seen people struggling.
In my experience with FastAPI (which is only about one month long), generating Swagger docs from it comes very cheap: it happens directly from the objects you use in Python to model your domain and from the arguments the endpoint functions handle.
An example (unfortunately I can't share the repo because it's a private project): in this system there is an object called File, and in order to create one you do a POST with a subset of the File attributes, because e.g. you don't choose the id, and the ownership of the file is decided automatically, so it can't be chosen by the caller. All it took was the definition of the model:
```python
from datetime import datetime
from typing import Any, Dict, List, Optional
from uuid import UUID

from pydantic import BaseModel

class File(BaseModel):
    id: UUID
    groupId: UUID
    changedAt: datetime
    deletedAt: Optional[datetime]
    fingerprint: str
    stars: float = 0.0
    numComments: int = 0
    tags: List[UUID] = []
    extra: Dict[str, Any] = {}
```
while the input for the request is:
```python
from pydantic import Field

class FileNew(BaseModel):
    fingerprint: str
    tags: List[UUID] = []
    extra: Dict[str, Any] = {}
    changedAt: datetime = Field(default_factory=now)  # `now` is a helper defined elsewhere in the project
```
BaseModel is a Pydantic model, so if you try to create an instance of this class in Python and pass the wrong kind of data to one of the fields, it will throw an exception.
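For instance, a minimal sketch using the FileNew model above (the fingerprint value is made up):

```python
from uuid import uuid4

from pydantic import ValidationError

# valid: only the caller-controlled fields are provided
FileNew(fingerprint="sha256:abc123", tags=[uuid4()])

try:
    FileNew(fingerprint="sha256:abc123", tags=["not-a-uuid"])
except ValidationError as exc:
    # pydantic rejects the bad tag before anything reaches a serializer
    print(exc)
```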
The post handler is:
```python
@router.post("/", response_model=File, status_code=201)
async def create_file(data: FileNew, user: User = Depends(req_user)) -> File:
    return await _upsert_file(None, user, uuid4(), data)
```
The documentation generated from this endpoint shows the request schema and the response schema. [Swagger screenshots]
I quite like FastAPI - just had a quick play about - it seems a lot more modern and very relevant for our use case.
And really love the type checking + automatic Swagger API docs...
Looking online, one of the two following projects seems like the perfect starting point for us:
(Note: the frontend above is not the website frontend, but rather the frontend for the API admin)... (in production we should probably use Amazon RDS and not the PostgreSQL in this docker-compose)...
About deployment:
Moving to gRPC is secondary at this stage in my opinion; I would stick to JSON for a long while, if not forever. The killer feature of protobuf for me is probably the simplest one: decoupling the data modelling from any particular language, which makes it easier to collaborate with domain experts and reason about the data. It also seems that cohesion between systems is stronger thanks to the generated source code that holds the data in the different languages, i.e. in both the server and the client. Although you could discuss that.
I don't think the generated Python classes from betterproto are pydantic (I love this name!) but they can be validated explicitly for type checking.
Does it make sense to you to combine fastapi, protobuf and openapi? What would be the best phased approach to do that in incremental steps?
Also, if we consider moving away from Flask, then why not go to something faster like Go? It is faster than Python, and that may be important once we have maximised database performance.
Disclaimer: no experience with Go and no stakes at Google!
Totally agree with trying to stay away from django rest framework.
From https://fastapi.tiangolo.com/features/
No brainfuck:
No new schema definition micro-language to learn.
If you know Python types you know how to use Pydantic.
🤣🤣🤣
I guess I am precisely asking for a brain fuck 😅
As a side note, it would be very convenient to be able to use this pydantic on top of the betterproto library.
:)
So what's the verdict? Eventually move to fastapi? Who is going to lead on this?
Priority is first to get the new db sorted out though.
I would favour a phased approach as follows: 1) Set up a contract using protobuf. This would imply, among other things, moving the language support to the frontend, so the payloads do not change with the language, only the presentation. 2) Change the backend to a different implementation while maintaining the contract.
This is a more conservative approach with fewer chances of breaking the frontend. Also I want to highlight the importance of keeping the database model (what is stored in the db) separate from the data model (what is shared with clients).
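A tiny sketch of that separation, assuming Pydantic on the backend (all class and field names below are invented for illustration):

```python
from pydantic import BaseModel

# database model: everything the table stores, including internal-only fields
class IndividualRow(BaseModel):  # stand-in for an ORM row / db record
    internal_id: int
    external_id: str
    sex: str
    consent_withdrawn: bool

# data model: the contract shared with the frontend
class Individual(BaseModel):
    external_id: str
    sex: str

def to_api(row: IndividualRow) -> Individual:
    # the API contract stays stable even if the database schema changes underneath
    return Individual(external_id=row.external_id, sex=row.sex)
```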
Quoting the Marx brothers: those are my principles, and if you don't like them... well, I have others
Guys, I don't see you very excited about protobuf. I am happy to move on to fastapi without any prior step; I don't want to kill your energy or impose anything at all.
How should we proceed? Is someone taking care of prototyping? Do we want any particular measure to ensure the contract is not broken while migrating to fastapi, or do you prefer to do something from scratch, without worrying about the legacy, and put the new database model in place at the same time?
Should we brainstorm some requirements on a google docs?
hi @priesgo I am perfectly ok with whichever solution we go for. I think the winner of this discussion is whoever actually goes ahead and gets it done :).
For me what would make sense is to have one definition of the JSON template/schema that both frontend and backend can reference. Therefore we only need to change the definitions in one place, in theory. Also where are these definitions stored? I think ideally we want them under version control in git and accessible at a URL?
The kind of changes I'm anticipating are, for example, adding a new column of annotation to the variant table or adding a new field to the individual page (say kinship, for example). When this happens I would like code changes to be minimal on both backend and frontend.
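To make that concrete with a hedged sketch (the model and field names are invented): if the variant payload is described by a shared Pydantic model, adding a new annotation column is mostly a one-line change to the model, and the OpenAPI schema the frontend reads is regenerated automatically.

```python
from typing import Optional

from pydantic import BaseModel

class VariantAnnotation(BaseModel):
    gene: str
    consequence: str
    # new column: optional with a default, so existing payloads and clients keep working
    cadd_phred: Optional[float] = None
```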
Probably worth having a high-level discussion or maybe putting a doc/slides together of how this could work?
My 2p is that adding protobuf to enforce a JSON contract is a lot of bureaucracy and extra work for no gain. You are better off prototyping the interface you want to create using a way to enforce a schema in JSON, such as pydantic (within Flask or in FastAPI), and using protobuf only if, and when, you actually want to speak protobuf with a protobuf client.
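For what it's worth, a minimal sketch of what "pydantic within Flask" could look like as an interim step (the endpoint and fields are made up):

```python
from flask import Flask, jsonify
from pydantic import BaseModel, ValidationError

app = Flask(__name__)

class Gene(BaseModel):
    gene_id: str
    symbol: str

@app.route("/gene/<gene_id>")
def get_gene(gene_id):
    raw = {"gene_id": gene_id, "symbol": "TTLL5"}  # whatever the db query returns
    try:
        gene = Gene(**raw)  # enforce the contract before the data leaves the server
    except ValidationError as exc:
        return jsonify({"error": exc.errors()}), 500
    return jsonify(gene.dict())
```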
Does it make sense to decouple the pydantic schema from the fastapi migration? Just to make it iterative.
Good question, what should be our next step?
Also, going back to my previous point, I'm interested in what makes sense to use from the React point of view. @YuanTian1991 Something like https://jsonapi.org/ ?
ping
I spoke to @YuanTian1991, he has also looked at this but has no strong opinions for now on what would work best for him on the js side. Unless anyone else has any strong opinions I suggest @priesgo you lead on this if that is ok with you? @dvarrazzo are you in agreement?
Pretty sorry that I have been quite busy since last week (some deadlines on projects...).
I read through your discussion, and based on my shallow understanding (I always use the plainest code to do the work, like directly defining Flask APIs and fetching via axios...), I think fastAPI looks like a good thing to try.
I am thinking the fastAPI-react could be used as a small API management platform, e.g. to check numbers of visits, numbers of clients, etc., a bit like Google Analytics. The frontend currently does not have an admin-only dashboard.
And about the JSON standard: I never thought there were so many ways to achieve it (in a much more standard way); I was initially thinking that adding some extra code when fetching from the database could solve it. But surely adding some "schema" for the API calls is a good idea, so in the future we can check the schema directly, instead of the Chrome console, to know the output. The fastAPI way of doing schemas looks nice enough to me, since it's in one repo (easy to find), directly related to fastAPI and its docs.
My understanding is that if we use protocol buffers, most of the currently defined frontend API calls would have to be adjusted to receive proto3-style data, right? And it seems the definition file may have to go in another repo, which looks a bit more complex.
So I think fastAPI is a good way forward: it's not too complex, but compared with my no-schema Flask API coding style it's more structured and better for collaboration.
I will try to deploy the current backend API next week, and definitely catch up more on the discussion. >_<
Thanks @YuanTian1991 :). I spoke to @dvarrazzo and I think a video call would be the most beneficial maybe?
You guys can go ahead and have it without me or if it's after the 29th of July I'll be able to join. Let me know what you decide?
I think the relevant parties are @priesgo @YuanTian1991 @dvarrazzo and @IsmailM if he wants to join. As I said I don't have to be there but I am happy to attend if it's after the 29th of July.
Perhaps it would be good to prepare some doc before the meeting.
No problem for me if a video call helps.
In relation to issue #75, @priesgo was suggesting we have "definition" files for the JSON objects that get passed between backend and frontend. These could be stored in a separate repository that gets imported by both frontend and backend?
One solution @priesgo mentioned is https://developers.google.com/protocol-buffers
From what I understood this would also play nicely with https://grpc.io/ to allow streaming of variants in the future?
Wonder if the rest of you guys have any suggestions on what to use for this @dvarrazzo @YuanTian1991 @IsmailM ?