Closed nightlyone closed 7 years ago
I have a pretty simple, clean solution for creating Schemas from nested map[string]interface{} objects that are produced by json.Unmarshal. It includes "merging" and coalescing nested repeated structs, which I didn't see accounted for in the patch. Type switches are used instead of reflection, because Unmarshal only produces a limited number of types.
Let me know if you're interested and I can open a PR.
We're definitely interested. Unfortunately we don't accept github PRs; we use Gerrit. See CONTRIBUTING.md.
Let me know if that's too much trouble and we'll find another way.
No problem, I noticed CONTRIBUTING.md.
I've finally started to devote some serious attention to this.
We now have a fields package that correctly enumerates the exported fields of a struct according to the Go rules, including embedding. So correct treatment of embedding now falls out as a consequence of using that package. See https://code-review.googlesource.com/9613.
The fields package can be configured to parse tags, so tags can easily be added. But I'd like to see a design doc/issue. The tags need to work across inference, loading and saving. Just supporting an alternative name with a tag, like `bigquery:"othername"`, would be trivial, but I would want to make sure that had value. It clearly does in JSON because many JSON objects use lower-case keys, but I don't know if any bigquery users care. Especially since loading a row into a struct will use case-insensitive comparison of column name to field name.
@schmohlio @nightlyone I understand now that for both of you, the goal is to infer a schema from JSON, not a Go struct. Supporting Go structs via reflection is in the tradition of other Go packages like encoding/json and datastore, so I feel that it belongs in this package. However, I don't think that inferring a schema from a JSON object fits in that category, useful though it may be. And the function can easily be provided outside of this package. So I'd like to keep it that way. @schmohlio, I recommend you make your package production-ready and available via go get, and we will be happy to point interested folk to it.
@mcgreevy @zombiezen your thoughts?
@jba makes sense to me!
@jba My goal has been to infer a BigQuery schema from the same struct that has been used to write out a JSON stream. This inferred schema should then be used to read in this data from GCS via a BigQuery Load job.
@nightlyone I see. That actually seems like a nice idea. I'm glad that the forthcoming embedding addition aligns BigQuery and encoding/json more.
If you only use "json" tags for case, then you shouldn't need "bigquery" tags, because column names are case-insensitive. If we did add tags, I doubt very much we'd honor "json" tags, so at a minimum you'd have redundant tags. Maybe you could tell me exactly how tags would help you (or point me to the comment where you've already said that, if I missed it).
Adding "bigquery" tags to those structs is no issue, and honoring "json" tags would not be needed. What would be useful is an ignore tag "-", as in encoding/json. That would allow us to use only one struct instead of two.
As requested by @mcgreevy, I excluded support for nested structs from #213 and left it open for discussion and possible later implementation.