solariumphp / solarium

PHP Solr client library
Other
932 stars 301 forks source link

Different behaviours for XML and JSON formatted update queries: stringable objects as properties of `Update\Query\Document` #1126

Closed MalteWunsch closed 3 months ago

MalteWunsch commented 3 months ago

Solarium version(s) affected: probably 6.2.8, confirmed in 6.3.0 and 6.3.5 Solr version: 7.7.3 Solr mode: standalone

Description With XML formatted update requests, Solarium\QueryType\Update\Query\Document objects with stringable objects set as property values get indexed in Solr as if the stringable objects are cast to string.

But with JSON formatted update requests, the document in the Solr index does has a "name" field at all (not even with a blank value).

How to reproduce In the following example, the Address class is stringable:

class Address {
  public function __toString() { return 'My address'; }
}

class UpdateSolrTask {
  public function execute() {
    $client = new Solarium\Client(...);
    $updateQuery = $client->createUpdate();

    $document = $updateQuery->createDocument();
    $document->address = new Address();

    $updateQuery->addDocument($document);
    $updateQuery->addCommit();

    $client->update($update);
  }
}

With the code above, as of solarium/solarium 6.3.0 (having JSON formatted update requests as default), the address field won't show up in the document in the Solr index at all.

If you add $updateQuery->setRequestFormat($updateQuery::REQUEST_FORMAT_XML);, it will show up with the value My address.

Possible Solution Rough ideas:

thomascorthals commented 3 months ago

Great catch. And an easy fix.

What do you think we should do if an object is both Stringable and JsonSerializable? The current implementation implicitly 'prefers' jsonSerialize() (by virtue of ignoring __toString()). So I honoured that in my PR. But maybe it makes more sense to prefer __toString() in this case because it's consistent with XML and any future request formats (e.g. CBOR).

MalteWunsch commented 3 months ago

Wow Thomas, thank you very much for your swift response and fix!

The preference seems to be a tricky question. In the above, I've argued for consistency. Some minutes ago I thought "if you choose JSON formatted update queries, you're likely to expect jsonSerialize() to be used intuitively". But now I think that using JSON probably is no conscious decision (as it is the default), so there might be no intuitive expectation to meet.

My thoughts end up with a slight tendency for preferring __toString() consistently in the long run. But I don't think this is enough to justify changing the current behaviour for now. So I suggest to keep your PR as it is, and maybe keep that question at the back of your mind for when you're about to implement other request formats or think about breaking changes for a next major version.

Thanks again and have a great week!

thomascorthals commented 3 months ago

If you really want __toString() to take precedence, you could of course always do:

$document->address = (string) (new Address());

On the other hand, the reverse also holds:

$document->address = (new Address())->jsonSerialize();

The current behaviour were jsonSerialize() takes precedence is a BC break if you were used to __toString() kicking in with XML.

What do you think, @mkalkbrenner? Might be relevant to consider that I have been working on CBOR requests already so the consideration of "other request formats" is not merely academical.

thomascorthals commented 3 months ago

I went ahead and committed a version that always prefers Stringable, with a note in the documentation about calling jsonSerialize() explicitly and a unit test to ensure this actually works. Did it as a separate commit so it can be reverted before merging the PR if the other way is deemed better.

MalteWunsch commented 3 months ago

Thank you very much, @thomascorthals and @mkalkbrenner 😃