mathemancer closed this 2 months ago
@Anish9901 if you're far enough along on figuring out testing for these, go ahead and add a couple tests. Otherwise, I think we should merge this untested (I did check manually to make sure things were working).
@seancolsen @pavish Take a look and see what you think of:
@kgodey It might be worth a glance from you, since this is the initial PR that will define how the RPC endpoint looks and works.
This is ready to review, but I'm leaving it as a draft until @Anish9901 replies with an opinion on whether to add tests into this PR.
I'll add some tests.
Also, I made a couple edits to the wiki to bring the RPC transition plan into conformance with this PR (e.g. django-modern-rpc and dots in method names).
Thank you for this thoughtful feedback, @seancolsen !
You're correct that dependencies between function parameters are a smell. As I read your thoughts, I realized that the `connection_type` parameter adds precisely zero information, and can only ever produce an error. We should remove it, and just accept an optional `connection_id`.
As for your broader point: I typically prefer primitive types for "public" function arguments, and prefer them even more strongly in this case where there's a translation between two languages. I'm using "primitive" a bit loosely here to mean any non-composite type. My reasoning is that:

- `datetime` types are different for different languages.
- There are few questions about what an `int` parameter will accept as valid input, but there are many potential questions about what a `Connection` parameter will accept as valid input.
- A missing `connection_id` when submitting a `connection_type` should produce a `KeyError: "connection_id"` error without too much (maybe any) custom handling, and could even be upgraded to `TypeError: connection_id keyword arg is required when connection_type="user_database".` if we wanted custom handling. If it's nested within an object, we have to then provide some nesting in our error response: `TypeError: the connection keyword arg requires a connection_id attribute whenever its connection_type is "user_database".` The more nested the object in question is, the worse that problem gets.
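The flat-argument error behavior described above can be sketched as follows. This is a hypothetical illustration — the function name `add_connection` and its signature are invented for this example, not taken from the PR:

```python
def add_connection(connection_type, connection_id=None):
    # With flat keyword args, validating the dependency between the two
    # parameters needs only one check, and the error message names the
    # missing argument directly -- no nesting required.
    if connection_type == "user_database" and connection_id is None:
        raise TypeError(
            'connection_id keyword arg is required when connection_type="user_database"'
        )
    return connection_id

try:
    add_connection("user_database")
except TypeError as e:
    print(e)  # connection_id keyword arg is required when connection_type="user_database"
```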
With that said, I agree that we'll likely have to handle some kind of nesting if polymorphism is required. In a different language, I'd probably use multiple dispatch in the API-facing functions to sort that whenever possible, but oh well. Another option (even in python) is to simply have differently named methods for different combinations of arguments. Of the concerns above, the self-documenting part is what I'm most loathe to give up.
The most self-documenting solution I can think of is to add data classes representing any nested object to the same module as the function(s) that accept(s) it, and use type hinting to specify that a given argument should be of the type of that data class. The class would be picked up by the `mkdocs` plugin, and even cross-referenced with the type hint. However, that type hint would be a lie! I.e., the actual type received via API call would always be a `dict`! So, we'd have to then add some branching logic to handle the possible inputs, and document how to interpret the documentation of those data classes. We could try to create a custom `__init__`, but I expect that would confuse the auto-generated documentation.
A less self-documenting, but in some ways cleaner approach would be to specify that the argument is a `dict` via type hinting (and so in the docs), and then specify details about the form of the `dict` in the docstring of the API-facing function.
I suppose I'm just trying to avoid the nesting problem whenever possible. Whenever absolutely necessary, I prefer the latter solution.
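As a sketch of that latter approach — the function name, keys, and values here are illustrative assumptions, not the PR's actual signature:

```python
def add_from_known_connection(connection: dict) -> int:
    """
    Add a connection based on a known one.

    Args:
        connection: A dict describing the connection. Expected keys:

            - connection_type (str): e.g. "user_database".
            - connection_id (int): required when connection_type is
              "user_database".
    """
    # The type hint honestly says `dict`; the docstring carries the
    # structural documentation instead of a fallacious class hint.
    if connection.get("connection_type") == "user_database":
        return connection["connection_id"]
    raise ValueError(f"unknown connection_type: {connection.get('connection_type')}")
```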
Regarding the return type: I tend to prefer returning primitives when possible for most of the same reasons (translation and self-documenting). With that said, I don't have a fundamental problem with returning a flattish object from functions in this context. However, that still leaves the issue of making the function self-documenting. I think the same options mentioned above apply, but I want to tinker with that a bit. I do see your point about non-breaking API changes...it's a good one. With that said, I do think that if a public function is modified to return a different type, it should just be a new function instead.
> Also, I made a couple edits to the wiki to bring the RPC transition plan into conformance with this PR (e.g. django-modern-rpc and dots in method names).
Thank you for that, and for improving the documentation structure in this PR!
@seancolsen As an example (don't merge yet), I threw together a class to handle returning an object and documenting the returned attributes automatically. Not sure I like this, though.
> I'm just trying to avoid the nesting problem whenever possible
Yeah, this was my hunch. Understanding your perspective is helpful here. I think that as the dust settles around the RPC API this is the sort of thing we ought to document in some kind of standard.
> We should remove the `connection_type` parameter, and just accept an optional `connection_id`.
That sounds like the best path forward for this PR.
> we'll likely have to handle some kind of nesting if polymorphism is required
Right. For now though, I say let's just kick this can down the road and deal with it when/if we have a more pressing use case.
I thought this PR would be a good opportunity to take a stab at it and that maybe it would be straightforward to add some nesting.
When/if we do have a more compelling use case for nesting, I have a lot more thoughts in response to your points. But it's probably not worth getting into the weeds now if we don't really need to.
> Another option is to have differently named methods for different combinations of arguments
This occurred to me as well, and I'm open to it. I can see how this approach could very well end up being our solution in some other cases.
> if a public function is modified to return a different type, it should just be a new function instead.
This approach worries me.
To be clear: We're talking about a hypothetical future where we want to change a return value that's already implemented. We don't have that situation yet, but I still think we should be on the same page about our mindset towards extensibility because it affects the design choices we're making now.
If we someday end up with two API functions that take exactly the same arguments and return ever so slightly different return values, that will be a recipe for confusion on the client side. Sure, we could mark one as deprecated. But even still, in many cases it would be difficult to create distinct names for the two functions without introducing mess into the API design. I can see how substantial differences in behavior or function signature could warrant the addition of a new function. But I think my bar for introducing a new function would be quite a bit higher than yours.
I'm trying to be very intentional about approaching this RPC API design from a holistic perspective. That's why I've been working to organize and standardize all the function names ahead of time as much as we reasonably can. This way they all make sense together. To me — and I'm speaking from the perspective of an API consumer — there is a very high value to having a cohesive set of API functions which are easy to understand and reason about when taken in conjunction with one another. I don't want to have to think too hard about which function to use in a given circumstance. And for any new devs or third party API consumers (who won't understand the evolution of our API), this goal becomes an even higher priority.
You said:
> I tend to prefer returning primitives when possible for most of the same reasons (translation and self-documenting).
I'd like to better understand your opinion on this point.
We still need to figure this out for this PR because it will continue to block my front end implementation.
Let's take `create_from_known_connection` as an example. I see three options for our return type:
(A)

```json
42
```

(B)

```json
{
  "id": 42
}
```

(C)

```json
{
  "id": 42,
  "nickname": "apple",
  "database": "apple",
  "supported_types_url": "…",
  "username": "mathesar",
  "host": "mathesar_dev_db",
  "port": 5432
}
```
You seem to want (A). My problems with (A) are:
So I want (C). Can you tell me what, specifically, is problematic about (C)? Can you give an example?
> I threw together a class to handle returning an object and documenting the returned attributes automatically

I looked at the docs that this generates. Cool! This works for me!

> Not sure I like this, though.

I'd like to hear why.
The thing I don't like is that to get the documentation to work, I had to add a fallacious type hint for the return value. I.e., it doesn't actually return a `DBModelReturn`, it returns a `dict` with keys matching the attributes of that class. This means the docs will be inaccurate for any user who is calling the function from Python, rather than via the API. It will also cause linters and so on to throw warnings or errors everywhere if we ever use the return value of the function in a call from Python.
@mathemancer
What about something like this?
```python
from typing import TypedDict

# ...

def filter_dict(d, cls):
    return cls(**{key: d[key] for key in cls.__annotations__ if key in d})


class DBModelReturn(TypedDict):
    """
    Information about a database model.

    Attributes:
        id (int): The Django id of the Database object created.
        name (str): Used to identify the created connection.
        db_name (str): The name of the database on the server.
        username (str): The username of the role for the connection.
        host (str): The hostname or IP address of the Postgres server.
        port (int): The port of the Postgres server.
    """
    id: int
    name: str
    db_name: str
    username: str
    host: str
    port: int
```

Then as a return value:

```python
filter_dict(db_model.__dict__, DBModelReturn)
```
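To make the idea concrete, here's a self-contained sketch of the `filter_dict` pattern with a stand-in model object (the `FakeDBModel` class and `ConnectionReturn` keys are invented for illustration, not the PR's actual names):

```python
from typing import TypedDict

def filter_dict(d, cls):
    # Keep only the keys declared on the TypedDict. Calling the
    # TypedDict class just builds a plain dict at runtime.
    return cls(**{key: d[key] for key in cls.__annotations__ if key in d})

class ConnectionReturn(TypedDict):
    id: int
    nickname: str

class FakeDBModel:
    # Stand-in for a Django model instance (hypothetical).
    def __init__(self):
        self.id = 42
        self.nickname = "apple"
        self._state = "internal bookkeeping"  # should be filtered out

result = filter_dict(FakeDBModel().__dict__, ConnectionReturn)
print(result)  # {'id': 42, 'nickname': 'apple'}
```

Since a `TypedDict` really is a `dict` at runtime, the return-type hint stays honest for both API callers and direct Python callers.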
> @mathemancer What about something like this?
>
> ...
Neat! I wasn't aware of `TypedDict`. I pushed a change that used a similar concept, and I think we'll probably both be happy with the result. I also made the "create" -> "add" change. I'm fine with that verb, as long as we're consistent.
It occurs to me that returning the `id` field won't make much sense once we're constructing connections out of parts of different models under the hood. @seancolsen Are you actually using that? What would you actually need returned for:
I think the current return object is okay as a proof-of-concept, but I think we should avoid just returning what the REST API returns, since these functions won't actually be managing a single resource in the near future.
> - Since we're returning the `port` as an `int`, it might be nice to accept it that way too, i.e. by changing the `port` request param within `add_from_scratch` from `str` to `int`.
Good catch.
> - In `DBModelReturn` the property names within the docstring do not match the property names defined on the class. ~I don't have a preference between these different names~ (edit 👇), but they should be consistent. It would be nice to make sure they're all consistent with the request too.
Ah. Let this be a lesson about being clever. We can't (and probably shouldn't) automatically pick up the API-expected names for fields from the underlying model attributes.
> (edit) Actually I do have a slight preference — I'd like to use `nickname` (instead of `name`) and `database` (instead of `db_name`) because that's what the REST API uses and it would be nice to minimize changes over the REST API unless we have a good reason to do so.
I'm fine with that, but see my other comment. We should double-check that the returned object's attributes all make sense with the new models and API.
@mathemancer
I love this! The auto-generated docs look great!
I have two comments:
> I'd like to improve how `code` within error responses works. For a request to `connections.add_from_known_connection`, we get the same code for all errors thrown by the db & for connection errors to the db. ... These are two different db errors and I'd like a way for the frontend to be able to differentiate between them. Currently, both return the same value for `code`. I'm fine with having a separate property to differentiate them and `code` being the same.
Agreed. I haven't gotten around to adding custom error codes yet, but plan to do so. I wanted to just get it so we're at least returning JSON even when an error happens, then get more precise about the codes. I'll add codes for the two errors you mentioned as an initial example. I figure we should consider custom codes for various errors to be feature requests (beyond the basics, anyway). Does that make sense to you?
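One way such differentiation could look — purely a hypothetical sketch; the exception classes and code values here are invented, not the PR's actual scheme — is a simple lookup from exception type to a distinct code:

```python
# Hypothetical exception classes standing in for "could not connect"
# vs. "the database raised an error while handling the request".
class DBConnectionError(Exception):
    pass

class DBQueryError(Exception):
    pass

# Invented code values; a real scheme might reserve ranges per library.
ERROR_CODES = {
    DBConnectionError: -30001,
    DBQueryError: -30002,
}

def code_for(exc, default=-30000):
    # Fall back to a generic code for anything not yet classified.
    return ERROR_CODES.get(type(exc), default)

print(code_for(DBConnectionError("could not connect")))  # -30001
```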
> - I'd like the error message structure to be documented in the docs (not individually for all requests, a common block on the page would be cool).
Great idea; will do.
@mathemancer yes, the front end needs the connection `id` field when the user edits or deletes a connection. Your comment sort of opens up a pandora's box of thoughts... It's a slippery slope into a territory where we're no longer talking strictly about the API-layer transition, but instead talking about the larger architectural transition, including the new models. I can appreciate that API and architecture are inextricably intertwined, but so far I've been approaching this PR as if it's somehow isolated — operating only within the API layer as much as possible. If we're aiming to make the new connections RPC APIs in this PR work well with the new models, then I'll potentially have much more critique. For example, when all the dust settles, I don't expect we'll have a concept of a "connection" anywhere in the stack. In my opinion the very premise of this API is flawed if we're to view it in the context of the new models.

This is making me think we might need to go into greater detail about the smaller steps we intend to take over the course of this whole transition.

Alternatively, we could consider kicking off the RPC transition by picking a sort of "leaf" API instead of a "trunk" API. For example `/api/ui/v0/users/{userId}/password_reset/` seems to be far less intertwined with the architectural transition. We could take that endpoint (and ones like it) in isolation a bit more easily.
I'm considering my review complete.
We've already gotten most of the juice I was hoping for out of this PR. I want to:
I suppose my question about the `id` was inspired by the hope that there might be low hanging fruit (i.e., just don't return the `id`) that we could knock out now. I think for now, we should just return the object as-is (with the attribute name changes you requested).
A broader question we should talk about is our philosophy w.r.t. return objects composed of pieces of various models. One advantage of the RPC style is that we can return such composite objects easily. I've pondered my thoughts about this more, and I think that I'm friendlier to nesting and complex objects in the output from the RPC functions than in the input to them.
I think this is ready for re-review; all current feedback is handled (I hope).
@mathemancer
> - In both methods, I see `db_name` in the request but `database` in the response. I'd like these to be consistent. Can we change the request to use `database`?
Sure, that's fine with me. I just left it that way since I figured it would make the front end changes more minimal. As I type this, though, it seems like it's unlikely to produce a bunch of work for y'all to make it more consistent.
> - I'm not sure what the purpose of the `get_error_code` method is. I tried playing with it to see if I could understand it better, but I wasn't able to get it to work. I tried sending: ...
Argh. I see now that my attempts to document the error handling have produced more confusion than clarity. The `get_error_code` function isn't intended to be called from the RPC endpoint. I added it to the documentation simply to provide the error code schema (i.e., which numbers go with which libraries, etc.). I'll try something different.
Fixes #3519 Fixes #3518
## Summary
This adds a new RPC endpoint, available at `/rpc/`. At the moment, the endpoint is available only when logged in, and the functions are available only for superusers. The functions are `connections.create_from_known_connection` and `connections.create_from_scratch`.

## Technical details
The functions are documented via auto-generated docs (created from their docstrings) on the new documentation page at `/user-guide/rpc-functions/`.

I elected to go against my previous plan and use `django-modern-rpc`. Unfortunately, this meant its auto-generated docs were not good enough. So, I decided to use the mkdocs plugin to include RPC function docs in the docs site instead.
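For context, a typical mkdocstrings-style setup looks something like the following. This is a hedged sketch — the exact plugin options used in this repo are assumptions:

```yaml
# mkdocs.yml (hypothetical fragment)
plugins:
  - mkdocstrings:
      handlers:
        python:
          options:
            docstring_style: google
```

A docs page such as `docs/user-guide/rpc-functions.md` would then pull in docstrings with an identifier line like `::: mathesar.rpc.connections` (module path assumed).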
This PR disables the check to prevent type hints!!!
## Additional context
## Checklist
- [X] My pull request has a descriptive title (not a vague title like `Update index.md`).
- [X] My pull request targets the `develop` branch of the repository.
- [X] My commit messages follow [best practices][best_practices].
- [X] My code follows the established code style of the repository.
- [ ] I added tests for the changes I made (if applicable).
- [X] I added or updated documentation (if applicable).
- [X] I tried running the project locally and verified that there are no visible errors.

[best_practices]: https://gist.github.com/robertpainsi/b632364184e70900af4ab688decf6f53

## Developer Certificate of Origin
```
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.
```