After some research & experimenting, I don't think just changing `GetResult` to be polymorphic is going to answer all our problems. I'm proposing a few changes that should make `Get` & its functionality easier to grok.
| selector  | json | yaml | csv | zip | script |
| --------- | ---- | ---- | --- | --- | ------ |
| --        | y    | y    | n   | y   | n      |
| commit    | y    | y    | n   | n   | n      |
| body      | y    | y    | y   | n   | n      |
| meta      | y    | y    | n   | n   | n      |
| readme    | y    | y    | n   | n   | y      |
| stats     | y    | y    | n   | n   | n      |
| structure | y    | y    | n   | n   | n      |
| transform | y    | y    | n   | n   | y      |
| viz       | y    | y    | n   | n   | y      |
| rendered  | n    | n    | n   | n   | y      |
one-off behavior that doesn't exist uniformly over api & cli:
- `lib.Get` returns data in `results.Bytes` & it's written to the response w/ a filename & content type set
- `lib.Get`, if `p.Selector` contains `script`, gets the script file from the dataset & reads it all into `results.Bytes`; cmd returns the correct file type

The... strangeness surrounding `Get` & our new dispatchify/rpc pattern is that whatever get returns from the lib method is what gets returned over http & rpc, and is what is used when calling the lib method from the cmd package. This is already difficult to wrap your head around as is, when we are dealing with Head components vs the Body component, which are structured differently & have different formatting expectations.
These are the three main points that add even more (and dare I say, an unnecessary amount of) complexity & also don't feel like they belong over a json/post http api, nor rpc:
1) `Outfile` - why are we permitting folks to potentially create a file on another machine via http? The `Outfile` option should be isolated to the cmd layer. If someone wants a zip file, a script file, or a body file, or if a user asks us for a way to get a component as a file in the future via the api, they should be using the sugar endpoints.
2) Formatting - if we pull the formatting worries into the cmd layer (the json api doesn't care about formatting; we just return the interface & rely on `NewHTTPRequestHandler` to do the marshaling work for us), all we need to keep track of is the BODY format, since we have a special `csv` case for the body. If the body returns as a slice of bytes, we can just print those straight to the terminal. Otherwise it's structured data that we can marshal into whatever format we want.
3) Returning files using our json api - I think it's pretty weird to have all the `script` & `zip` options allowed over the json api. I'm also not sure we want this to exist over rpc, although it would work fine since we are encoding/decoding on both sides of the wire. If we do want it to exist over rpc, then well, I'll just suck it up at this weirdness: if something pings `curl -X POST -d '{"ref":"username/dataset","format":"zip"}' localhost:2503/ds/get`, they are going to get... `{"data":"SOME_BASE_64_ENCODED_STRING_OF_THE_ZIP_OF_THE_WHOLE_DATASET==","meta":{...meta}}`. This seems super bizarre to me. (Secretly this is what happens if you POST to get a dataset body w/ format csv, but since we definitely want to have body over rpc, I'm willing to be okay with the strangeness, especially since over http the default will be json, so a user would have to go out of their way to see this.) I'm proposing we split the `zip` and `scriptFile` functionality into their own dataset methods that have `denyRPC`. These will have corresponding sugar handlers that set the correct content headers & file names, using the same patterns they currently have over the api. The way to access this functionality via the cli will also not change.
```go
type GetParams struct {
	Ref      string
	Selector string
	Limit    int
	Offset   int
	All      bool
}

type GetResult struct {
	Value interface{}
	Type  GetResultType
}

type GetResultType int

const (
	Head GetResultType = iota
	BodyStruct
	BodyBytes
)
```
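To make the intended split concrete, here's a minimal sketch — hypothetical cmd-side code, not from the branch; `printGetResult` and the yaml dependency are assumptions — of how the client layer could branch on `GetResultType`:

```go
package cmd

import (
	"encoding/json"
	"fmt"
	"io"

	"github.com/qri-io/qri/lib"
	"gopkg.in/yaml.v2"
)

// printGetResult consumes the proposed GetResult: byte results print
// straight to the terminal, structured results marshal into whatever
// format the user asked for
func printGetResult(w io.Writer, res *lib.GetResult, format string) error {
	switch res.Type {
	case lib.BodyBytes:
		// csv bodies (and other byte payloads) are already formatted
		data, ok := res.Value.([]byte)
		if !ok {
			return fmt.Errorf("expected []byte value for BodyBytes result")
		}
		_, err := w.Write(data)
		return err
	case lib.Head, lib.BodyStruct:
		if format == "yaml" {
			return yaml.NewEncoder(w).Encode(res.Value)
		}
		enc := json.NewEncoder(w)
		enc.SetIndent("", "  ")
		return enc.Encode(res.Value)
	default:
		return fmt.Errorf("unknown result type: %d", res.Type)
	}
}
```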
Notes on the fields:

- `Format`: all marshaling to a specific format happens at cmd
- `BodyFormat`: as it is the only one that affects what we return
- `Outfile`: all saving to a specific file happens at cmd. It feels wrong to allow someone to potentially save a file via an rpc or http connection
- `GetResultType`: since our only two options are that `lib.Get` returns a slice of bytes or some structured data. The slice of bytes can just be printed to the terminal. The structured data can be marshaled into whatever we need.

```go
type GetScriptFile struct {
	Ref      string
	Selector string
}
```
- `lib.GetScriptFile`: used when selecting `viz`, `readme`, or `render` & using the --script flag. `localhost:2503/ds/get/username/dataset/readme.md` would return, in this example, a readme

```go
type GetArchiveFile struct {
	Ref string
}
```

- `lib.GetArchiveFile`: accessed in the cli via the `zip` flag, and over http via `localhost:2503/ds/get/username/dataset/dataset.zip`
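For flavor, a sketch of what the corresponding zip sugar handler might look like — the exact route parsing and the `lib.GetArchiveFile(ctx, inst, ref)` signature are assumptions; the content-header pattern is the point:

```go
package api

import (
	"net/http"
	"strings"

	"github.com/qri-io/qri/lib"
)

// ZipHandler is a hypothetical sugar handler for a route like
// /ds/get/{username}/{dataset}/dataset.zip. It assumes a helper with
// the signature lib.GetArchiveFile(ctx, inst, ref) ([]byte, error).
func ZipHandler(inst *lib.Instance) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ref := strings.TrimPrefix(r.URL.Path, "/ds/get/")
		ref = strings.TrimSuffix(ref, "/dataset.zip")

		data, err := lib.GetArchiveFile(r.Context(), inst, ref)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		// the whole point of the sugar handler: correct content
		// headers & file name instead of a json-wrapped base64 string
		w.Header().Set("Content-Type", "application/zip")
		w.Header().Set("Content-Disposition", `attachment; filename="dataset.zip"`)
		w.Write(data)
	}
}
```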
new base func `base.ReadBodyAsInterface` - shitty name, but basically it does the same paginated reading as `ReadBody`, but returns the body as a go-native structure. This allows us to be flexible w/ our return types. If the format is json, then we need to make sure we can operate over http, sending the body over as json, and not as an encoded slice of bytes.
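Roughly the shape I have in mind — a sketch only, assuming a JSON array body and a naive decode-then-paginate approach (the real version would presumably reuse `ReadBody`'s paginated reader):

```go
package base

import (
	"encoding/json"
	"io"
)

// ReadBodyAsInterface sketch: decode body data into go-native values
// so callers (cmd, api, rpc) can re-marshal into whatever format they
// need. Pagination here is naive; shown only for the contract.
func ReadBodyAsInterface(r io.Reader, offset, limit int, all bool) (interface{}, error) {
	var rows []interface{}
	if err := json.NewDecoder(r).Decode(&rows); err != nil {
		return nil, err
	}
	if all {
		return rows, nil
	}
	if offset > len(rows) {
		offset = len(rows)
	}
	end := len(rows)
	if limit >= 0 && offset+limit < end {
		end = offset + limit
	}
	return rows[offset:end], nil
}
```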
The rpc layer can actually work well w/ an encoded slice of bytes, since it will encode and decode for us, matching expectations on both ends. Technically this means a user can ask for a csv body over the json api, and we will return it; it just returns as a base64 encoded string (much like our `scriptBytes` return).
bugs:

- `peernames` rather than `usernames`
- `NewHTTPRequestHandler` in my last pr had us pass a pointer to the params interface to `DecodeParams`, rather than the params themselves (which are themselves pointers). This was causing no params to be recognized as the interface `NZDefaultSetter`
Take a look at `ramfox/refactor_get` to see selecting a component/field w/ formats `json`, `pretty`, & `yaml` for head, and `csv` for body, work over json/post http, cli, & rpc! The branch is very messy, and doesn't look at the sugar endpoints at all, nor does it implement any strong typing, but it does prove that we can return different formats of body & all different iterations of the other components (including selecting into fields) over rpc, http, & cli.
This is a lot, but a GREAT breakdown!
Hard to get at all the points at once but can give some input on a few pieces:
- a `/get` handler doesn't solve the issue for RPC, as you need to make the top level call aware of the expected output and extend dispatch to handle raw vs json, which is a bit against the grain. It also means that we can't really have generic selectors, as the routing needs to be specified on the mux level and be explicit about those to be able to route to the proper handlers if they are not all in 1 bucket.
- this assumes `/get` and its sugar routes live under the same umbrella; otherwise it could work, I think

Thank you for reading!!! To your first and second points, my proposal (separating zip & file outputs) would only be in the case where we are okay with not having that functionality over rpc. So yea, it wouldn't make sense to separate these into different lib functions unless that's a functionality we are okay with.
Why do I favor removing access to something like getting a zip or an entire file from rpc? Mostly because the process to get it from the daemon to the terminal process offends me, haha. For this to work over rpc, we will get a slice of bytes from the file at the `lib.Get` level. Our POST/JSON api will encode the whole file as a base64 encoded string (the default behavior of `json.Marshal`), and then send the zip file as an encoded string over http w/ content type json. The terminal process that launched the request then unmarshals the response, and the base64 encoded string turns back into a slice of bytes. We can then write those bytes to an outfile or print them to the terminal.

This part, although strange because of the encoding and decoding, doesn't bother me. The thing that bothers me is that we will be supporting a nonsensical (to anyone who doesn't understand the need for our dispatch process) endpoint that returns an entire file as a base64 encoded string of a zip or html file.
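For reference, this is stock `encoding/json` behavior — a `[]byte` field marshals to a base64 string:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	payload := struct {
		Data []byte `json:"data"`
	}{Data: []byte("zip bytes")}

	b, _ := json.Marshal(payload)
	fmt.Println(string(b)) // {"data":"emlwIGJ5dGVz"}
}
```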
TBH the more I protest against it, the less stressed I am about it. There is just something fishy in my gut when I examine the process, and it felt like it made more sense to just not allow such a weird endpoint (an application/json response of an encoded zip or html file) rather than have it, but only as sugar, where a user can `GET` it and it would return in a format that makes sense. Which would mean disabling the behavior over rpc.

Way less offended now about it than yesterday, when the idea of having to document the endpoint was super bizarre to me.
This is great. I'm not going to catch it all, but some initial reactions:

- the goal isn't a table of all `yes` cells. Instead what this looks like is a few overlapping-but-distinct patterns of use (more on that below)
- `rendered` belongs to `qri render` and `POST /ds/render`. `get` should be strictly about fetching persisted data.

I very much agree with your gut feeling that non-JSON or JSON-wrapped-bytes responses are a bad idea, and I agree that adding polymorphism support doesn't solve that core problem. The more I look at it, the more it seems adding polymorphism would only tangle things up more.
So to that end, I buy a lot of your proposed solutions, but within the framing of thickening "client" code to provide sugar around a core `lib.Get` method. In practice this means adding code to `api` and `cmd` that creates end user features from one or more calls to `lib.Get`. You've suggested something very close to this, but instead of adding methods, define helper functions in `lib` that wrap an Instance and call the `Get` method to construct a response, then use those helpers in `api` and `cmd` to construct end-user responses.
- the `YAML` crap should move to `cmd`. YAML is really just a view layer for the command line at this point.
- zip could be a `lib` helper function:

```go
package lib

// GetZip fetches an entire dataset as a zip archive
func GetZip(ctx context.Context, inst *Instance, refstr string) ([]byte, error) {
	res, cursor, err := inst.Dataset().Get(ctx, &GetParams{Ref: refstr, All: true})
	// compress result, return bytes
}
```
Same idea for the `csv` format:

```go
package lib

// GetCSV fetches dataset body data in csv format, passing a limit of -1 gets all rows
func GetCSV(ctx context.Context, inst *Instance, refstr string, offset, limit int) ([]byte, error) {
}
```
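On the `cmd` side the outfile concern then reduces to a few lines — a hypothetical sketch; the `GetOptions` fields are illustrative, not the real cmd package:

```go
package cmd

import (
	"context"
	"io"
	"os"

	"github.com/qri-io/qri/lib"
)

// GetOptions is a stand-in for the real command options struct
type GetOptions struct {
	inst    *lib.Instance
	Ref     string
	Offset  int
	Limit   int
	Outfile string
	Out     io.Writer
}

// runCSV shows format & outfile decisions living entirely in cmd,
// built on the proposed lib.GetCSV helper
func (o *GetOptions) runCSV(ctx context.Context) error {
	data, err := lib.GetCSV(ctx, o.inst, o.Ref, o.Offset, o.Limit)
	if err != nil {
		return err
	}
	if o.Outfile != "" {
		// saving to a file never leaves the cmd layer
		return os.WriteFile(o.Outfile, data, 0o644)
	}
	_, err = o.Out.Write(data)
	return err
}
```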
We really need cursors to work for this to be clean & easy, but the helper functions will isolate this messy code & keep it in `lib`.

We can remove the idea of having to support `rendered` here. But we still need to support getting back the script file of a viz, readme, or transform. None of which are "rendered", but are persisted data, so we still need some kind of `GetScriptFile` helper function, and this should still come from the `/get` endpoint and `qri get` cmd.
Re: helper methods. Sounds good!

Also sounds like moving `outfile` to just `cmd` isn't controversial, nor is moving formatting responsibilities to `cmd`. Basically, `lib.Get` deals with `interface{}` responses, and anything else is relegated to a helper function that is called by `cmd` or a sugar handler.
> But we still need to support getting back the script file of a viz, readme, or transform. None of which are "rendered", but are persisted data, so we still need some kind of GetScriptFile helper function, and this should still come from the /get endpoint and qri get cmd.

I'd rather see those handled with a `script` selector than a helper function, and kept within the main `lib.Instance.Dataset().Get` method (basically, leave this as it is). Scripts will generally be small. To me it makes sense to do these as base64 bytes. I don't think we need the helper function, but you might be catching something I'm missing.
> Also sounds like moving outfile to just cmd isn't controversial

yes totally. I very much agree with your sentiment that `outfile` belongs in `cmd`.
No, you're right, it's totally fine without a helper func. And I fully agree about those being a-okay as base64 encoded strings.
After working on this, here is where we stand:

Calling `lib.Get` will in almost all cases return structured data.

There are three special cases that return a slice of bytes: 1) asking for a zip of the dataset, 2) asking for the script file, 3) asking for a csv body.
Because of the way rpc/http marshals and unmarshals the content over the wire, we still need to have two separate fields for any `interface{}` (structured) return type and any `[]byte` return type. JSON marshal converts byte slices to a string; in dispatch, when we get the `GetResult` back from rpc, we unmarshal that json into a `GetResult`. If the field we unmarshal into is just `interface{}`, it assumes the response is a string (because we don't have anything specifying it should be a byte slice), and it causes side effects because the response is not what is expected.
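The side effect is easy to demonstrate with the stdlib alone — unmarshaling into `interface{}` leaves the base64 string as a string, while a typed `[]byte` field decodes back to bytes:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type loose struct{ Value interface{} }
type typed struct{ Value []byte }

func main() {
	// simulate a GetResult crossing the rpc wire as json
	wire, _ := json.Marshal(typed{Value: []byte("csv,body,rows")})

	var l loose
	var t typed
	json.Unmarshal(wire, &l)
	json.Unmarshal(wire, &t)

	fmt.Printf("%T %v\n", l.Value, l.Value) // string Y3N2LGJvZHkscm93cw==
	fmt.Printf("%T %s\n", t.Value, t.Value) // []uint8 csv,body,rows
}
```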
Since we don't want to separate out these cases into their own methods (whyyy??), we then need to get more organized.
After spending time with the inputs and organization — now that it is possible to return the body as structured data rather than a slice of bytes, and that we've moved some of the formatting & file saving out of get — it's more clear how to streamline the `GetParams`. Currently, there are two strange interactions.
1) The `Format` field is now only used when byte data is going to be returned. `Format == "zip"` and `Format == "csv"` are the only things that have meaning.
2) We use `readme.script` or `viz.script` to indicate in the cli that the user wants to get the script bytes of that component. There are multiple times (3, I believe) where we end up parsing this string and checking if we are looking for a script. This, to me, means that there is something organizational that is missing.
I'm proposing these be the `GetParams`:

```go
type GetParams struct {
	Ref         string
	Selector    string
	SpecialCase SpecialCase // ZipType, CSVType, ScriptType; empty indicates normal
	Limit       int
	Offset      int
	All         bool
}
```
It needs a better name. We can also skip the strong typing & use strings, or just have three flags `Zip`, `CSV`, `Script` that error if more than one is set.
These flag overrides would very clearly let the dev know something different is happening, and organize what is expected in and out of the function. All of them return byte slices, and all of them have the potential to override other fields (`Zip` & `CSV` override `Selector`). They have special behavior that isn't "typical" of `Get`. There would be a clear pattern for if we need to add any other special cases; a validation guard is sketched below.
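The guard for the three-flag variant would be a few lines — hypothetical field names on a `GetParams` variant:

```go
package lib

import "fmt"

// variant of GetParams with three mutually exclusive special-case flags
type GetParams struct {
	Ref      string
	Selector string
	Zip      bool
	CSV      bool
	Script   bool
	Limit    int
	Offset   int
	All      bool
}

// Validate errors if more than one special-case flag is set
func (p *GetParams) Validate() error {
	set := 0
	for _, b := range []bool{p.Zip, p.CSV, p.Script} {
		if b {
			set++
		}
	}
	if set > 1 {
		return fmt.Errorf("only one of zip, csv, script may be set")
	}
	return nil
}
```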
This brings us back full circle to our first discussion when refactoring. Rather than labeling & typing the return value, we instead type the input & rely on the input params to know what the expected return is.
I'm still for separating out `GetZip`, `GetScript`, & `GetCSV` as helper functions that call `inst.Dataset().Get()`, and having cmd & the sugar handlers call those directly.
Before we "get" started (lawl puns! in 2021!), I'd like to reiterate that the reason we don't want to split up `get` comes from early users adopting `export` faster than `get`, and missing out on the power of `get` in the process. Get needs to be really powerful. It also needs to be singular. When there was more than one way to fetch data from qri, we had users going to different commands asking for things provided by the other command: "Export can't give me JSON!?!?!?" and "can I just 'get' a CSV file!?!?".
Other projects might blame users for not reading the manual. I think we should blame bad developer experience. Right now we're doing the hard work of building a unified data fetching interface that supports many different use cases. Splitting it up detracts from where we're trying to go, even if it's "just" at the lib level.
Moving on to the concrete parts:
- rename `SpecialCase` as `Format`, and instead of thinking about it as "only sometimes in use", write it as always being used, but with a sensible JSON default.
- don't special-case `zip` and `csv` format responses. Instead we should acknowledge that base64-encoded strings are valid JSON, and force clients to deal with it, or just convert scripts to raw strings. The important part is to think of script-selector responses as JSON strings.
- have `Get` return `interface{}` to force users to write client side code or use lib helper function(s).
```go
type GetParams struct {
	Ref      string
	Selector string
	Format   string // output format. one of "json" | "zip" | "csv", default is "json"
	Limit    int
	Offset   int
	All      bool
}
```
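The "always being used, with a sensible JSON default" reading would make the normalization step explicit — a sketch, as a hypothetical method on the `GetParams` above:

```go
package lib

import "fmt"

// normalize treats Format as always-in-use: empty means json, and
// anything outside the known set is rejected up front
func (p *GetParams) normalize() error {
	switch p.Format {
	case "":
		p.Format = "json"
		return nil
	case "json", "zip", "csv":
		return nil
	default:
		return fmt.Errorf("unsupported format: %q", p.Format)
	}
}
```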
People got confused because we surfaced multiple ways to `Get` things to the user. We can have separate methods for the underlying functionality & have one way for users to access it. All of the API can stay the same: `localhost:2503/get/username/dataset/body.csv` can still exist, but be handled by an api handler that calls `inst.Dataset().GetCSV`, and `qri get me/dataset --format zip` can be routed to `inst.Dataset().GetZip` in `cmd/get.go`. We do this for other commands. But again, this is not the hill I'm dying on, lol, I've made my peace.
1) The thing is, we aren't returning json. We are returning go-native structures in all the straightforward cases, and returning a slice of bytes in all the special ones. We marshal to json (or yaml, or prettified json) at the cmd, api & rpc levels, but not at lib.
2) Script responses are valid json. This wouldn't change that, it would just help us have clearer code. Also, `readme.script` has no meaning at the lib level; `.Script` is not a valid field on the component. It's a weird special case, we should handle it as such.
3) Because of RPC & json marshaling we can't do this. We need both interface and byte slice options. For RPC, json unmarshal will not properly decode any base64 encoded string if we have it unmarshaling into an `interface{}`, because it will just assume the base64 encoded string is a string & unmarshal to that. We need the response to indicate a slice of bytes. We could also have multiple checks & decode the string ourselves, but that seems messy to me. This is part of why I think the functions should be separate, but again lol not dying on this hill. We just still need the return type to be:

```go
type GetResult struct {
	Value interface{}
	Bytes []byte
}
```
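Client code would then resolve the two fields along these lines — a hypothetical helper, not committed code:

```go
package lib

import "encoding/json"

// resultBytes flattens the two-field GetResult for output: byte
// results (zip, script file, csv body) pass through untouched,
// structured results marshal to json
func resultBytes(res *GetResult, pretty bool) ([]byte, error) {
	if res.Bytes != nil {
		return res.Bytes, nil
	}
	if pretty {
		return json.MarshalIndent(res.Value, "", "  ")
	}
	return json.Marshal(res.Value)
}
```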
There are many different cases that our `lib.Get` method (and our `/ds/get` endpoint) are covering. In order for `lib.Get` to work over rpc, we need to adjust the `GetResponse` so that we can send over different types of return values using the http/json api.