googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0
3.72k stars 1.27k forks source link

firestore: Use structure as MapKey #1740

Closed guillaumeblaquiere closed 2 years ago

guillaumeblaquiere commented 4 years ago

Is your feature request related to a problem? Please describe. I would like to use a structure and not a string as key of a map. My first use case is to use a date object and to be able to convert it in string when I store it in Firestore.

Describe the solution you'd like A similar implementation as JSONMarshal/JSONUnmarshal can be done. If the key of the map is a String or implement this interface, use the string or the string generated by the interface for storing the data in Firestore. Similar logic for the read

Describe alternatives you've considered I have a transient object for storing data in Firestore, and a "business" object for doing my logic. More work for... no value.

Additional context I saw a related feature request #1475 and I though about this as an evolution/generalization of this initial feature. I already pushed a working code about this in Gerrit: https://code-review.googlesource.com/c/gocloud/+/50430 Open to work on a generalization of both if it's a correct approach

tritone commented 4 years ago

Thanks very much for writing this up. @BenWhitehead, I would be curious to get your take on this change as an expert on the firestore client.

BenWhitehead commented 4 years ago

Conceptually I think this sounds like a useful proposal, I've had to do similar things in the past in other languages and serialization formats.

There are however a few concerns that I believe should be thought about before moving forward with this feature request.

  1. In Firestore, document field names have the following hard limits [1]:

    Constraints on field names | Must be valid UTF-8 characters Maximum size of a field name | 1,500 bytes

  2. Use of dynamic field names can cause issues around data persistence and performance.
    • Firestore implicitly creates Single Field Indexes for all document fields[2] to allow easy query ability, however there is a limit on how many single field indexes are allowed to be configured in a Firestore database. For example if you want to create a single field index exclusion configuration, only 200 [1] are allowed to be configured.
    • When documents are written into Firestore the indexes need to be updated as part of the write operation, if there are many indexes this can have an adverse affect on write throughput.
  3. Possible name change. Since the encoding/decoding is specific for document field names I'd expect the name to be more reflective of that fact rather than map key. (This point I'm less concerned about than the other two, as I am not personally familiar with the naming conventions in golang, more just thinking about it from the Firestore angle.)

[1] Usage and limits [2] Automatic indexing

guillaumeblaquiere commented 4 years ago

Thanks @BenWhitehead for this expertise and your concerns (and topics to take care) when we propose new feature on Firestore.

However, I think that this feature request won't increase (or decrease) your current concerns.

Indeed, the current map key is a string type. And the feature request proposes an hook which transforms the struct in string. (and vice-versa)

No new type or strange thing. It's just an on-the-fly transform instead of having a struct compliant with Firestore that I map into a struct compliant with the business logic.

Let me know if you want to discuss more about it.

BenWhitehead commented 4 years ago

Hi @guillaumeblaquiere,

You are correct all of the current limits already exist and have to be adhered to by users, my main concern here with adding the new encoder/decoder is that it will introduce a new are for potential gotchas. Since the new encoder/decoder can theoretically allow any struct to function as a document field name transparently it may be easier to accidentally generate a field name that won't validate with Firestore. (There is also the concern that a sufficiently complex struct could lose the ability to be decoded over time due to struct changes).

It may be safer to broaden the support for what can be passed as a document field to a few verified and tested types (types we know have stable encoding/decoding representations, fit within the size constraints, etc.).

Given the additional context here, I'd be interested to hear @tritone and @tbpg's opinion on this change since they are more familiar with the golang conventions than I am. I want to make sure that we don't create a potential landmine while expanding functionality.

tbpg commented 4 years ago

This seems related to #1438 -- needing an easier API to work with custom structs in Firestore.

Can you clarify the difference between this proposal and #1475? My understanding is that the firestore package would handle the marshal/unmarshal step automatically, no user code needed?

My initial reaction is that I'd prefer if users were responsible for the marshal/unmarshal steps for the reasons @BenWhitehead mentioned. If users are responsible for the marshal/unmarshal step, they can ensure their changes are backwards compatible for the string representation and that the string representation doesn't exceed their limits. If they want some sort of binary representation, they can. If they want something else, that's OK too.

One benefit of this is that we could add the "default" functionality later if desired -- it's a "temporary no."

To be clear, the benefit of this is that users do not have to marshal/unmarshal every time they use the DB. An interface for that seems worthwhile to me.

guillaumeblaquiere commented 4 years ago

My apologizes, I did the things in the wrong order. I implemented the feature that I wanted in Firestore Go library (because it was fun to do this!), I submitted it on Gerrit, and people there told me to open an issue here to discuss about it. Then, by reading the open issues, I found the #1475 in relation with mine, but wider. Indeed, mine only allows the marshalling/unmarshalling of the keys in a map type. The #1475 proposes to do this on all fields.

Anyway, I agree with you: The user must be responsible of its marshalling/unmarshalling by implementing an interface. Thereby, if you look at my code, you can see that I exactly do this. However, the naming is specific for MapKey element.

The #1475 is wider than the mapKey special case. I can extend my code for integrating this feature, if you agree.

Let me know

tbpg commented 4 years ago

All good! Thank you for working on this. :)

Is there a way we can do both at the same time?

I think it should be a single interface, not two. If someone does custom marshaling, they should do custom unmarshaling, right?

guillaumeblaquiere commented 4 years ago

@tbpg, I tried with only one interface for marshalling and unmarshalling and it didn't work. @codyoss did the same remarks on gerrit.

Before implementing myself the marshalling/unmarshalling of structs in Go with reflection, I didn't understand why Json package has 2 different interfaces for mashalling and unmarshalling

After some tests, I understood that is because its not possible to do something like this JSON example with the same interface

func (a *Animal) UnmarshalJSON(b []byte) error {
    var s string
    if err := json.Unmarshal(b, &s); err != nil {
        return err
    }
    switch strings.ToLower(s) {
    default:
        *a = Unknown
    case "gopher":
        *a = Gopher
    case "zebra":
        *a = Zebra
    }

    return nil
}

func (a Animal) MarshalJSON() ([]byte, error) {
    var s string
    switch a {
    default:
        s = "unknown"
    case Gopher:
        s = "gopher"
    case Zebra:
        s = "zebra"
    }

    return json.Marshal(s)
}

The use of pointer for unmarshalling (you want to update the struct provided in the function parameter) and the use of "no pointer" for marshalling (you don't want to update the struct provided in parameter, thus use a copy of it) are evident and a good practice.

However, the reflect library doesn't identify the struct as implementing the interface if the "function name prefix" (i.e. func (a *Animal) ....) is a pointer for a function, and not a pointer for the other one of the same interface. So, for solving this, the JSON package defines 2 interfaces. Same thing for Text and Binary marshaller/unmarshaller

Thereby, I have in mind to reproduce this already known way-of-use by the developer in the Firestore library.

However, and as mentioned before, I'm not a Go expert and if there is another way for implementing this, I will be happy to learn more on this (because I spent lot of time to understand (and solve) why my code was working with only one of the parts, but never with both in the same interface!)

What's your advice?

guillaumeblaquiere commented 4 years ago

@tbpg any update?

tbpg commented 4 years ago

Thanks for following up. Got it. That makes sense with not being able to combine the interfaces.

I'm +1 to move forward, but would appreciate a second opinion from @broady and/or @codyoss. Sounds like @BenWhitehead is also +1, given we document and link to the restrictions.

codyoss commented 4 years ago

As long as @BenWhitehead and team are okay with this I have no objection.

guillaumeblaquiere commented 4 years ago

Great. I'm going to work on it. I'm on holidays the next week but I do it ASAP.

guillaumeblaquiere commented 4 years ago

@tbpg I just committed on gerrit and, at least, kokoro build is passed. I let you have a look on it and send me your feedback. Thanks

https://code-review.googlesource.com/c/gocloud/+/50430

tbpg commented 4 years ago

Thank you!

marcin-ptaszynski commented 4 years ago

Hi guys

This is a great idea - I also have a similar case, but I need to (de)serialize struct with interface value, so I wanted to suggest slight extension of this feature. How about having (Un)Marshal work with exposed protobuf firestore.Value type instead of string?

It would allow for simple String Value translation with something like:

type MyStruct struct {
    private string
}

func (m *MyStruct) String() string {
    return m.private
}

func (m *MyStruct) ToFirestoreValue() (pbvalue *firestorepb.Value, err error) {
    return &firestorepb.Value{ValueType: &firestorepb.Value_StringValue{m.String()}}, nil
}

func (m *MyStruct) FromFirestoreValue(pbvalue *firestorepb.Value) (err error) {
    m.private = pbvalue.GetStringValue()
    return nil
}

But at the same it would enable support for more advanced uses and completely or partially override current mechanism based on field tags, thus adding support for:

  1. serializing structs with interface values (like protobuf structs with oneofs)
  2. serializing structs with private fields
  3. advanced translation for domain specific indexing, like: IPAddress subnet queries, geohashing
  4. automatically enrich document with magic fields, like poor-mans string indexing for "starts with" queries, e.g.:name._prefixes array-contains "prefix"
  5. detecting and translating multiple versions of documents, thus allowing for easier zero-downtime database migrations without the need for additional layer.

What do you think?

BenWhitehead commented 4 years ago

I think it will be a mistake to broaden things more than they already are. As the current design is looking, we're already into the territory of introducing some very possible issues for people that they may not realize they're getting themselves into. Firestore is not fully schemaless and infinitely scaling on the schema today. There are very real consequences to having a very large number of document properties given every property is automatically indexed. Regarding the limits and my concerns running into them see my comment above.

guillaumeblaquiere commented 4 years ago

@tbpg (and others), did you have time to review this issue ?

Here the gerrit link: https://code-review.googlesource.com/c/gocloud/+/50430

Thanks in advance for you feedback

adrianduke commented 4 years ago

Whats the status of this feature? Having looked at the gerrit link it says it's abandoned, is there an alternative solution in the works? I'm currently experiencing issues with the lack of this feature as follows:

Given some struct representing a document:

type MyDocument struct {
  NullableField sql.NullString // amongst other sql.Null* types (Float64, Bool, Time... etc)
}

will encode into firestore as (rightly so):

{
  "NullableField": {
    "Valid": false,
    "String": "",
  }
}

However part of this migration is to have a frontend Javascript client simultaneously read the same document as the backend Go code, and it seems wrong to burden the frontend with having to parse Go's Nullable types when it should just read:

{
  "NullableField": null
}

When NullableField.Valid == false and:

{
  "NullableField": "some string"
}

When NullableField.Valid == true.

I'm partially migrating the Go codebase from SQL to Firestore and the ability to Marshal/Unmarshal would be very helpful in achieving this, in particular I'd expect to solve this issue as follows:

type FirestoreNullString struct {
  sql.NullString
}

func (ns FirestoreNullString) MarshalFirestore() (*firestorepb.Value, error) {
  if ns.Valid {
    // ... some code to return ns.NullString.String as string type
  }
  // ... some code to return a null type
}

func (ns *FirestoreNullString) UnmarshalFirestore(v *firestorepb.Value) (error) {
  // ... some code to convert back
}
type MyDocument struct {
  NullableField FirestoreNullString
}

Note: Solving this issue by providing firestore's own NullString,NullFloat64, NullBool... etc wouldn't solve this issue (unless I can utilise their same mechanism for encoding/decoding), as I need to simultaneously support both SQL (with sql.Scanner and driver.Valuer) and Firestore correctly encoding.

Examples of other Marshaler/Unmarshaler or variations of:

guillaumeblaquiere commented 4 years ago

@adrianduke, indeed, this feature has been abandoned. The team has explained me that they wanted a more generic and more consistent feature trough all components.

I have no timeline about this feature now.

crwilcox commented 2 years ago

Closing this as a duplicate as the work needed is captured in 2610.