neos / neos-development-collection

The unified repository containing the Neos core packages, used for Neos development.
https://www.neos.io/
GNU General Public License v3.0
260 stars 221 forks source link

Discussing serialization formats #5072

Closed kitsunet closed 4 months ago

kitsunet commented 4 months ago

Originally about this: The dimensionspacepoint hash is a unique identifier and points to one exact dimension combination, currently there are no methods to find a DSP by hash but we could make that part of the API and have a way easier time serializing addresses by only ever using the hash instead of a JSONified dictionary with the actual dimension values.

Now discussing Serialization

bwaidelich commented 4 months ago

I don't understand why we should open up that can of worms? What's the major issue with some string representation that can be converted back (e.g. base64encoded JSON)? Introducing a non-reversable hash is a bit like adding an identity to this VO.

BTW: I'm fine with introducing some internal optimization that makes it faster/easier to look up database records, but we should be aware that this adds another level of abstraction/complexity with a source of potential bugs. So we should really measure the benefits

kitsunet commented 4 months ago

we already use it as that anyways as our equals operation compares the two hashes. I think the hash is way easier to handle than JSON (and I still don't know why we would base64 encode it, which was originally created to represent binary data, without knowing where you want to use the serialized format, I would say base64 is premature encoding). Also while longish the hash is a fixed length, and we can always generate it back from all known dimensionspacepoints (which we have) in case there is no easier way to get to it.

mhsdesign commented 4 months ago

I thought about this as well and discussed this shortly with @grebaldi for the Ui but we didnt like that due to not being able to use a pure static call to instantiate / decode it from string. That is a huuuge plus imo and allows us to simplify the mechanism a lot and thus the dx:

$nodeAddress = NodeAddress::fromUriString($encoded);

I compared the current state with all the proposals made:

Dimension Array Neos 8.3 legacy NodeAddress Proposal 1 Proposal 2 Proposal 3 Proposal 4
empty: [] | `W10=` | | ` |d751713988987e9331980363e24189ce|%5B%5D`
['language'=>'de'] language%3Dde eyJsYW5ndWFnZSI6ImRlIn0= bGFuZ3VhZ2U9ZGU~ language%3Dde 1041cc1fe1030c1a82ac24346f8c69a7 %7B%22language%22%3A%22de%22%7D
['language'=>'en_US','audience'=>'nice people'] language%3Den_US%26audience%3Dnice+people eyJsYW5ndWFnZSI6ImVuX1VTIiwiYXVkaWVuY2UiOiJuaWNlIHBlb3BsZSJ9 bGFuZ3VhZ2U9ZW5fVVMmYXVkaWVuY2U9bmljZStwZW9wbGU~ language%3Den_US%26audience%3Dnice%2Bpeople 16cfd41de0dac5d8a66367ff326539d2 %7B%22language%22%3A%22en_US%22%2C%22audience%22%3A%22nice+people%22%7D

My conclusion: Id prefer option 1 as its a nice portable encoding and compared to the md5 for most usecases even smaller.

kitsunet commented 4 months ago

Alright, BUT please lets not do the str_replace thing, I don't want these custom made "encoding" schemes. It's not the problem of the nodeaddress to avoid url characters. Either just http_build_query or just json_encode

mhsdesign commented 4 months ago

I mean the method currently is named DimensionSpacePoint::fromUriRepresentation and thus id say it has business of doing so. And while i know the cr should be agnostic but php is a web language and we would just ease a lot of usecases by providing a reliable serialisation format. And id say no one on the client should need to decode it manually ... and even if someone does ...

But maybe we can get rid of the my added str_replace('=', '~' hack on your proposal 1. Wdyt @grebaldi?

The goal would be that its absolutely safe as ?node={nodeAddress}&foo=bar part.

kitsunet commented 4 months ago

I think as @bwaidelich commented on some PR, level of abstraction is a problem, why should it have a specific UriRepresentation method. to/from Json is just fine, it works, if you want to use it in a URI you need to uriencode it and I would say you have no busines to just string concat a Uri, an dif you use the Uri objects from PSR it will be correctly encoded. Having a single serialization format (JSON) is graet, because you do not need to think long about what to do.

mhsdesign commented 4 months ago

I included Neos 8.3 and your json + urlencode as Proposal 4 in the list above but its definitely the one with the most bloat sofar.

grebaldi commented 4 months ago

My debugging heart tells me that I'd rather not have any obscurity in the URI representation for a node address. So instead of:

?node={nodeAddress}

I'd rather have something like:

?node[id]=0a796b2e-168b-11ef-bbe0-032add0587f5&node[dsp][language]=en_UK&node[dsp][audience]=party-people

And then an easy way to deserialize from array. Yet, as you see from my example, I'm very tempted to shorten the query parameters as much as possible. If we'd choose to do that, we'd end up with a URI-concern leaking into NodeAddress anyway. This could be moved to a dedicated factory though, that would still be able to build a NodeAddress statically - without early access to a CR.

Following the argumentation of @kitsunet, I agree that url_encode(json_encode(...))ing the NodeAddress would be the most straight-forward way of achieving painless serialization/deserialization. There'd be no need of any extra factory and no question about which format to use in any particular case. The format definitely hurts my sense of aesthetics, but I suppose that is not all that relevant :D. I believe, debugging-wise, url_encode(json_encode(...)) would be an okay compromise.

mhsdesign commented 4 months ago

yes i have thought as well of encoding the NodeAddress into the url as array and let the framework handle it. That should work pretty good.

But i think we should still have a simple compact string based format. At least when we @grebaldi rediscussed the serialisation format in the HH sprint (https://github.com/neos/neos-development-collection/issues/4564#issuecomment-1999514362) we also considered that loading, as we do currently, all serialised Nodes into the redux store (in memory) should as well be done carefully and rather compact than bloated json. Though we have not measured performance or heap size and this was just a gut feeling ;)

url_encode(json_encode(...)) is also certainly not my favourite and its super bloated. I do like a nice syntax like:

?node=default/user-mh/$dimensionSpacePointProblem/79e69d1c-b079-4535-8c8a-37e76736c445

But it might be a good start to continue my initial implementation at #4892 and we can still change it but i need an intermediate agreement on one thing and that seems to be common ground now.

mhsdesign commented 4 months ago

Regarding what you @bwaidelich wrote at https://github.com/neos/neos-development-collection/pull/4892#issuecomment-2126556880

Currently u use it in the EventSourcedFrontendNodeRoutePartHandler: return new MatchResult($nodeAddress->toUriString(), ... but it's specifically not about the URI representation in that case but rather the "contextPath 2.0"

In that case we could even return the NodeAddress directly as object match result. Or maybe not because this must be a primitive to be cacheable?

edit: Jip answered that myself:

InvalidRoutePartValueException: RoutePart::getValue() must only return simple types after calling RoutePart::match()

for DimensionSpacePoint and NodeAddress: See example

The idea of base64 decoding the json encode result does struck me a bit too much ^^

JSON: {"contentRepositoryId":"default","workspaceName":"live","dimensionSpacePoint":{"language":"es","country":"ar"},"aggregateId":"node-1"}
serialized: eyJjb250ZW50UmVwb3NpdG9yeUlkIjoiZGVmYXVsdCIsIndvcmtzcGFjZU5hbWUiOiJsaXZlIiwiZGltZW5zaW9uU3BhY2VQb2ludCI6eyJsYW5ndWFnZSI6ImVzIiwiY291bnRyeSI6ImFyIn0sImFnZ3JlZ2F0ZUlkIjoibm9kZS0xIn0=

A single method for (de)serialization might not be sufficient because depending on the use case some of the parts are already defined. E.g. in the Neos backend we already know contentRepositoryId and workspaceName most of the time – I think we discussed this before and agreed to duplicate that information in the query param but I can't recall for certain

We definitely want to duplicate information here. We discussed that falling back to resolving the cr by current host and checking which site it is and then reading the content repository id from configuration (eg what the SiteDetectionResult does) is a bit of a stretch and hard to explain to our users. Its way simpler to have that information also serialised.

and if we do that we should think about the behaviour if SiteDetectionResult and serialised NodeAddress don't match

thats similar to editing the user workspace name to someones other personal one. Url editing is not directly a problem and if this should not be allowed the security framework or some other aspect must prohibit this on another level.

bwaidelich commented 4 months ago

The idea of base64 decoding the json encode result does struck me a bit too much ^^

What do you mean by "struck me too much"? I took that idea from your commit c8246040d2ca54b1a925a4e7ca80cc3f38e4204b – in general I'm not too concerned about what serialization format we use – as long as it's not a hash :)

We definitely want to duplicate information here

Yep, agreed.

thats similar to editing the user workspace name to someones other personal one

Yes, well kinda.. But you're right. If there's a mismatch we can still decide whether to throw, redirect or ignore the case – it's not exactly related to the serialization as long as we decide for a unified format (which I think makes sense)

mhsdesign commented 4 months ago

What do you mean by "struck me too much"?

I base64'd only the dsp part. You encoded everything with base64. The reason for using base64 in the dsp is that it might contain invalid uri characters. All other things like crId or wsName or nodeId are uri safe by constraint :)

kitsunet commented 4 months ago

Few notes here:

serialization != encoding

JSON = serialization http_build_query = serialization (of sorts) phps serialize = serialization something custom ala "contentrepo/wokrspace/nodeId/dimensions also serialization

BUT

base64 urlencode and probably 50 more things all encoding

I think having a serialization format in here is nice but it should not come with an encoding, as THAT is better decided wherever the serialized version is used.

That said some more things:

https://jsonurl.org/ not well adopted but we could look at this? 🤷

I could imagine just letting the routing deal with both serializing and encoding if we wish for more "nice" formats. Akin to the RoutePartHandler we could provide a bit of glue to convert NodeAddress to something we would like for URLs as well (and the opposite).

I am specfiically careful with base64 as there are incompatibilities between implementations despite them all being called base64, I prefer tto avoid it as much as possible. BUT obviosuly it is used if we want to use it I would go bastis route though and encode everything not just select parts.

bwaidelich commented 4 months ago

@kitsunet I am not sure whether the differentiation between serialization and encoding helps our situation Wikipedia (and other sources) explicitly mention URL applications for Base64. But I agree to your concerns and I think @mhsdesign and me just came up with a good "compromise": We add the toJson() method for convenience and because that's a very common representation and we decide in each case what the most suitable "serialization" format could be. For example: the internal contextPath replacement for MatchResult or client-side JS representation could use the JSON format. For the representation within the "Neos Preview/Backend URLs" I like @grebaldi's suggestion (https://github.com/neos/neos-development-collection/issues/5072#issuecomment-2120086565) the best. But IMO the conversion from NodeAddress to that, vice versa, should not be a concern of the NodeAddress nor a CR core concept in general.

WDYT?

kitsunet commented 4 months ago

Yep, sounds all in line with my toughts as well <3

mhsdesign commented 4 months ago

Today @kitsunet @bwaidelich and me discussed that we indeed want to use the fully qualified json format at every place in Neos:

{"contentRepositoryId":"default","workspaceName":"live","dimensionSpacePoint":{"language":"es","country":"ar"},"aggregateId":"node-1"} in url:

?node=%7B%22contentRepositoryId%22%3A%22default%22%2C%22workspaceName%22%3A%22live%22%2C%22dimensionSpacePoint%22%3A%7B%22language%22%3A%22es%22%2C%22country%22%3A%22ar%22%7D%2C%22aggregateId%22%3A%22node-1%22%7D

We discussed the possibility of of leaving out the workspaceName and contentRepositoryId for the neos/content route as these things are known (user ws and content repository by host) but we decided against that as well:

Further we discussed that using a resource like pattern (which is legit seeing nodes as resource) like:

localhost:8080/neos/content/default/live/language=en_US&audience=intelligent+people/995c9174-ddd6-4d5c-cfc0-1ffc82184677

has the downside that it doesnt scale very well with the flow routing framework. We already moved from similar routing away to simply nodes as query string to not stress the router and its cache to much. See https://github.com/neos/neos-development-collection/issues/2653

At last character saving was also discussed and we considered using simple indexes like c for contentRepositoryId and w for workspace, but that should not be part of the core NodeAddress as for example the fromArray behaviour would have to be defined differently to work with the full names ... or not? Either way well just go ahead with this and hope that the Neos Ui doesnt explode. (could possibly be tested with heap size?)

Further i stressed that its important in my perspective to use only one common node serialization format (eg node context path 2.0) and for b/c that should be a string and used absolutely everywhere. That way people when writing say neos ui extension use the same format for their backend controllers than when using a react app with custom endpoint.

Thanks for the extensive discussion to each and everyone. Ill implement the NodeAddress::toJson in here: https://github.com/neos/neos-development-collection/pull/4892

mhsdesign commented 3 months ago

Btw as you @Sebobo and @ahaeslich are about to overhaul the workspace module i just wanted to loop you in here that i would be great if the workspace module could also operate on the above mentioned json node address and not as its currently done with the legacy __ notation. I started to refactor a bit but noticed that this is a bad timing as you might be already adjusting things.

The json format will be available with the merge of https://github.com/neos/neos-development-collection/pull/4892

Sebobo commented 3 months ago

Feel free to adjust whatever now, we will anyway split the conversion of the specific actions into separate PRs and right now I just work on the workspace list.