protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65.14k stars 15.44k forks source link

[Ruby] `#to_h` and `#to_json` with Hash/Array/Struct etc returns `{"descriptor":[{}...{}]}` when `active_support/core_ext/object/json` loaded #17037

Open jufemaiz opened 3 months ago

jufemaiz commented 3 months ago

What version of protobuf and what language are you using?

Version: main/v3.27.0+ etc. Language: Ruby

What operating system (Linux, Windows, ...) and version?

MacOS 14.5

What runtime / compiler are you using (e.g., python version or gcc version) Ruby v3.2

What did you do?

message Thing {
  string whatsit = 1;
}

Build a library with it (or anything else). Then include it and build something containing a protobuf message. For example:

require 'active_support'
require 'active_support/core_ext/object/json'

...

thing_proto = Thing.new(whatsit: "Something random")

ThingInAThing = Struct.new(thing:)

thing_struct = ThingInAThing.new(thing: thing_proto)
thing_hash = { thing: thing_proto }
thing_array = [ thing_proto ]

A key thing to note is this quote from Matz a decade ago:

Ruby often has two conversion methods for an object, e.g. #to_s and #to_str, #to_i and #to_int, #to_h and #to_hash. The former is used for explicit conversion, the latter is used for implicit conversion (from an object with identical method signature, for example proxy). Ref: https://bugs.ruby-lang.org/issues/10180

ActiveSupport rightfully makes use of the :to_hash method for implicitly transforming an Object to JSON format. Ref: https://github.com/rails/rails/blame/v7.1.3.4/activesupport/lib/active_support/core_ext/object/json.rb#L58-L66

Unfortunately attempting to call #to_hash as opposed to #to_h fails on protobuf messages. Aliasing #to_h for #to_hash calls seems a sensible option here in the absence of shifting the #to_json method to also make use of #to_hash as Matz implies is correct.

What did you expect to see

First, when calling #to_h expect to see the marshalled hash even when nested. That is:

thing_struct.to_h => { thing: { whatsit: "Something random" } }
thing_hash.to_h => { thing: { whatsit: "Something random" } }

Second, when calling #to_json expect to see the marshalled json, even when nested. That is:

thing_struct.to_json => "{\"thing\":{\"whatsit\":\"Something random\"}}"
thing_hash.to_json => "{\"thing\":{\"whatsit\":\"Something random\"}}"
thing_array.to_json => "[{\"whatsit\":\"Something random\"}]"

What did you see instead?

Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).

When calling #to_h:

thing_struct.to_h => { thing: <Thing: whatsit: "Something Random"> }
thing_hash.to_h => { thing: <Thing: whatsit: "Something Random"> }

When calling #to_json:

thing_struct.to_json => "{\"thing\":{\"descriptor\":[{}]}}"
thing_hash.to_json => "{\"thing\":{\"descriptor\":[{}]}}"
thing_array.to_json => "[{\"descriptor\":[{}]}]"

Anything else we should know about your project / environment

Possibly related to https://github.com/protocolbuffers/protobuf/issues/9500

haberman commented 2 months ago

Hi @jufemaiz, can you clarify exactly what the proposed resolution is here?

Would that change break any existing use cases?

jufemaiz commented 2 months ago

There's two options, with one that I would lean towards due to Matz's language design.

  1. (preferred): protobuf to add #to_hash as the default internally consistent method, but also expose #to_h for direct use (whether this differs is a discussion more broadly if there is need for additional fields visible internalls).
  2. (alternative): protobuf adds #to_hash as an alias of #to_h.

Digging into this further, https://github.com/protocolbuffers/protobuf/blob/0302c4c43821ac893e8f1071576f80edef5c6398/ruby/lib/google/protobuf/ffi/internal/convert.rb#L190-L210 there's already a non-private #to_h implementation (that #to_h sometimes (?) directly calls), however the conventions of ruby are not being adopted, causing issues elsewhere there is an expectation of convention.

haberman commented 2 months ago

Please note that the ffi/ directory is an experimental implementation that is not currently the default. The default implementation is written in C: https://github.com/protocolbuffers/protobuf/blob/8d8db9cea8eb442a57286f33ec1e802c5d6c3149/ruby/ext/google/protobuf_c/message.c#L831-L840

I don't quite understand how options (1) and (2) are different. What's an example of how we might want #to_hash and #to_h to differ?

jufemaiz commented 2 months ago

Ah fair enough on the ffi/ directory.

I am not sure why, but there may be reasons that the core team want a, by convention, internal function to differ from the external function. If there's no reason, aliasing one way or another would resolve the issue – and given the lack of implementation of #to_hash I cannot imagine there would be a major issue in doing so.

haberman commented 2 months ago

Introducing #to_hash as an alias of #to_h seems reasonable to me.

jufemaiz commented 2 months ago

How would you like to proceed on this? Changes to the c code?