Closed kitsunet closed 4 months ago
I don't understand why we should open up that can of worms? What's the major issue with some string representation that can be converted back (e.g. base64encoded JSON)? Introducing a non-reversable hash is a bit like adding an identity to this VO.
BTW: I'm fine with introducing some internal optimization that makes it faster/easier to look up database records, but we should be aware that this adds another level of abstraction/complexity with a source of potential bugs. So we should really measure the benefits
we already use it as that anyways as our equals operation compares the two hashes. I think the hash is way easier to handle than JSON (and I still don't know why we would base64 encode it, which was originally created to represent binary data, without knowing where you want to use the serialized format, I would say base64 is premature encoding). Also while longish the hash is a fixed length, and we can always generate it back from all known dimensionspacepoints (which we have) in case there is no easier way to get to it.
I thought about this as well and discussed this shortly with @grebaldi for the Ui but we didnt like that due to not being able to use a pure static call to instantiate / decode it from string. That is a huuuge plus imo and allows us to simplify the mechanism a lot and thus the dx:
$nodeAddress = NodeAddress::fromUriString($encoded);
I compared the current state with all the proposals made:
Dimension Array | Neos 8.3 | legacy NodeAddress | Proposal 1 | Proposal 2 | Proposal 3 | Proposal 4 |
---|---|---|---|---|---|---|
empty: [] |
| `W10=` | | ` | d751713988987e9331980363e24189ce| %5B%5D` |
|||||
['language'=>'de'] |
language%3Dde |
eyJsYW5ndWFnZSI6ImRlIn0= |
bGFuZ3VhZ2U9ZGU~ |
language%3Dde |
1041cc1fe1030c1a82ac24346f8c69a7 |
%7B%22language%22%3A%22de%22%7D |
['language'=>'en_US','audience'=>'nice people'] |
language%3Den_US%26audience%3Dnice+people |
eyJsYW5ndWFnZSI6ImVuX1VTIiwiYXVkaWVuY2UiOiJuaWNlIHBlb3BsZSJ9 |
bGFuZ3VhZ2U9ZW5fVVMmYXVkaWVuY2U9bmljZStwZW9wbGU~ |
language%3Den_US%26audience%3Dnice%2Bpeople |
16cfd41de0dac5d8a66367ff326539d2 |
%7B%22language%22%3A%22en_US%22%2C%22audience%22%3A%22nice+people%22%7D |
legacy NodeAddress base64_encode(json_encode($coordinates))
-> current state, but more inefficient due to the use of json
-> 'default' empty case confusing? W10=
Proposal 1) str_replace('=', '~', base64_encode(http_build_query($coordinates)));
From @grebaldi and me see https://github.com/neos/neos-development-collection/issues/4564#issuecomment-1999514362 and implemented at https://github.com/neos/neos-development-collection/pull/5067
-> goal make the current legacy NodeAddress
smaller, currently space inefficient.
-> discussion points, how to handle the base64 added =
padding (same problem with current nodeAddress)
Proposal 2) urlencode(http_build_query($coordinates));
From @kitsunet https://github.com/neos/neos-development-collection/issues/4564#issuecomment-1999579266 altered version from 1 using urlencode
-> problem double encoding can be confusing
Proposal 3) md5(json_encode($coordinates));
From @kitsunet using hash
Proposal 4) urlencode(json_encode($coordinates));
From @kitsunet using json and letting the frame work auto encode it :)
My conclusion: Id prefer option 1 as its a nice portable encoding and compared to the md5 for most usecases even smaller.
Alright, BUT please lets not do the str_replace thing, I don't want these custom made "encoding" schemes. It's not the problem of the nodeaddress to avoid url characters. Either just http_build_query or just json_encode
I mean the method currently is named DimensionSpacePoint::fromUriRepresentation
and thus id say it has business of doing so. And while i know the cr should be agnostic but php is a web language and we would just ease a lot of usecases by providing a reliable serialisation format. And id say no one on the client should need to decode it manually ... and even if someone does ...
But maybe we can get rid of the my added str_replace('=', '~'
hack on your proposal 1. Wdyt @grebaldi?
The goal would be that its absolutely safe as ?node={nodeAddress}&foo=bar
part.
I think as @bwaidelich commented on some PR, level of abstraction is a problem, why should it have a specific UriRepresentation method. to/from Json is just fine, it works, if you want to use it in a URI you need to uriencode it and I would say you have no busines to just string concat a Uri, an dif you use the Uri objects from PSR it will be correctly encoded. Having a single serialization format (JSON) is graet, because you do not need to think long about what to do.
I included Neos 8.3 and your json + urlencode as Proposal 4 in the list above but its definitely the one with the most bloat sofar.
My debugging heart tells me that I'd rather not have any obscurity in the URI representation for a node address. So instead of:
?node={nodeAddress}
I'd rather have something like:
?node[id]=0a796b2e-168b-11ef-bbe0-032add0587f5&node[dsp][language]=en_UK&node[dsp][audience]=party-people
And then an easy way to deserialize from array. Yet, as you see from my example, I'm very tempted to shorten the query parameters as much as possible. If we'd choose to do that, we'd end up with a URI-concern leaking into NodeAddress
anyway. This could be moved to a dedicated factory though, that would still be able to build a NodeAddress
statically - without early access to a CR.
Following the argumentation of @kitsunet, I agree that url_encode(json_encode(...))
ing the NodeAddress
would be the most straight-forward way of achieving painless serialization/deserialization. There'd be no need of any extra factory and no question about which format to use in any particular case. The format definitely hurts my sense of aesthetics, but I suppose that is not all that relevant :D. I believe, debugging-wise, url_encode(json_encode(...))
would be an okay compromise.
yes i have thought as well of encoding the NodeAddress
into the url as array and let the framework handle it. That should work pretty good.
But i think we should still have a simple compact string based format. At least when we @grebaldi rediscussed the serialisation format in the HH sprint (https://github.com/neos/neos-development-collection/issues/4564#issuecomment-1999514362) we also considered that loading, as we do currently, all serialised Nodes into the redux store (in memory) should as well be done carefully and rather compact than bloated json. Though we have not measured performance or heap size and this was just a gut feeling ;)
url_encode(json_encode(...))
is also certainly not my favourite and its super bloated. I do like a nice syntax like:
?node=default/user-mh/$dimensionSpacePointProblem/79e69d1c-b079-4535-8c8a-37e76736c445
But it might be a good start to continue my initial implementation at #4892 and we can still change it but i need an intermediate agreement on one thing and that seems to be common ground now.
Regarding what you @bwaidelich wrote at https://github.com/neos/neos-development-collection/pull/4892#issuecomment-2126556880
Currently u use it in the
EventSourcedFrontendNodeRoutePartHandler
:return new MatchResult($nodeAddress->toUriString(), ...
but it's specifically not about the URI representation in that case but rather the "contextPath 2.0"
In that case we could even return the NodeAddress
directly as object match result. Or maybe not because this must be a primitive to be cacheable?
edit: Jip answered that myself:
InvalidRoutePartValueException: RoutePart::getValue() must only return simple types after calling RoutePart::match()
for
DimensionSpacePoint
andNodeAddress
: See example
The idea of base64 decoding the json encode result does struck me a bit too much ^^
JSON: {"contentRepositoryId":"default","workspaceName":"live","dimensionSpacePoint":{"language":"es","country":"ar"},"aggregateId":"node-1"}
serialized: eyJjb250ZW50UmVwb3NpdG9yeUlkIjoiZGVmYXVsdCIsIndvcmtzcGFjZU5hbWUiOiJsaXZlIiwiZGltZW5zaW9uU3BhY2VQb2ludCI6eyJsYW5ndWFnZSI6ImVzIiwiY291bnRyeSI6ImFyIn0sImFnZ3JlZ2F0ZUlkIjoibm9kZS0xIn0=
A single method for (de)serialization might not be sufficient because depending on the use case some of the parts are already defined. E.g. in the Neos backend we already know
contentRepositoryId
andworkspaceName
most of the time – I think we discussed this before and agreed to duplicate that information in the query param but I can't recall for certain
We definitely want to duplicate information here. We discussed that falling back to resolving the cr by current host and checking which site it is and then reading the content repository id from configuration (eg what the SiteDetectionResult
does) is a bit of a stretch and hard to explain to our users. Its way simpler to have that information also serialised.
and if we do that we should think about the behaviour if
SiteDetectionResult
and serialised NodeAddress don't match
thats similar to editing the user workspace name to someones other personal one. Url editing is not directly a problem and if this should not be allowed the security framework or some other aspect must prohibit this on another level.
The idea of base64 decoding the json encode result does struck me a bit too much ^^
What do you mean by "struck me too much"? I took that idea from your commit c8246040d2ca54b1a925a4e7ca80cc3f38e4204b – in general I'm not too concerned about what serialization format we use – as long as it's not a hash :)
We definitely want to duplicate information here
Yep, agreed.
thats similar to editing the user workspace name to someones other personal one
Yes, well kinda.. But you're right. If there's a mismatch we can still decide whether to throw, redirect or ignore the case – it's not exactly related to the serialization as long as we decide for a unified format (which I think makes sense)
What do you mean by "struck me too much"?
I base64'd only the dsp part. You encoded everything with base64. The reason for using base64 in the dsp is that it might contain invalid uri characters. All other things like crId or wsName or nodeId are uri safe by constraint :)
Few notes here:
serialization != encoding
JSON = serialization http_build_query = serialization (of sorts) phps serialize = serialization something custom ala "contentrepo/wokrspace/nodeId/dimensions also serialization
BUT
base64 urlencode and probably 50 more things all encoding
I think having a serialization format in here is nice but it should not come with an encoding, as THAT is better decided wherever the serialized version is used.
That said some more things:
https://jsonurl.org/ not well adopted but we could look at this? 🤷
I could imagine just letting the routing deal with both serializing and encoding if we wish for more "nice" formats. Akin to the RoutePartHandler we could provide a bit of glue to convert NodeAddress to something we would like for URLs as well (and the opposite).
I am specfiically careful with base64 as there are incompatibilities between implementations despite them all being called base64, I prefer tto avoid it as much as possible. BUT obviosuly it is used if we want to use it I would go bastis route though and encode everything not just select parts.
@kitsunet I am not sure whether the differentiation between serialization and encoding helps our situation
Wikipedia (and other sources) explicitly mention URL applications for Base64.
But I agree to your concerns and I think @mhsdesign and me just came up with a good "compromise":
We add the toJson() method for convenience and because that's a very common representation and we decide in each case what the most suitable "serialization" format could be.
For example: the internal contextPath
replacement for MatchResult
or client-side JS representation could use the JSON format.
For the representation within the "Neos Preview/Backend URLs" I like @grebaldi's suggestion (https://github.com/neos/neos-development-collection/issues/5072#issuecomment-2120086565) the best. But IMO the conversion from NodeAddress
to that, vice versa, should not be a concern of the NodeAddress
nor a CR core concept in general.
WDYT?
Yep, sounds all in line with my toughts as well <3
Today @kitsunet @bwaidelich and me discussed that we indeed want to use the fully qualified json format at every place in Neos:
{"contentRepositoryId":"default","workspaceName":"live","dimensionSpacePoint":{"language":"es","country":"ar"},"aggregateId":"node-1"}
in url:?node=%7B%22contentRepositoryId%22%3A%22default%22%2C%22workspaceName%22%3A%22live%22%2C%22dimensionSpacePoint%22%3A%7B%22language%22%3A%22es%22%2C%22country%22%3A%22ar%22%7D%2C%22aggregateId%22%3A%22node-1%22%7D
We discussed the possibility of of leaving out the workspaceName and contentRepositoryId for the neos/content
route as these things are known (user ws and content repository by host) but we decided against that as well:
Further we discussed that using a resource like pattern (which is legit seeing nodes as resource) like:
localhost:8080/neos/content/default/live/language=en_US&audience=intelligent+people/995c9174-ddd6-4d5c-cfc0-1ffc82184677
has the downside that it doesnt scale very well with the flow routing framework. We already moved from similar routing away to simply nodes as query string to not stress the router and its cache to much. See https://github.com/neos/neos-development-collection/issues/2653
At last character saving was also discussed and we considered using simple indexes like c for contentRepositoryId and w for workspace, but that should not be part of the core NodeAddress
as for example the fromArray behaviour would have to be defined differently to work with the full names ... or not?
Either way well just go ahead with this and hope that the Neos Ui doesnt explode. (could possibly be tested with heap size?)
Further i stressed that its important in my perspective to use only one common node serialization format (eg node context path 2.0) and for b/c that should be a string and used absolutely everywhere. That way people when writing say neos ui extension use the same format for their backend controllers than when using a react app with custom endpoint.
Thanks for the extensive discussion to each and everyone.
Ill implement the NodeAddress::toJson
in here: https://github.com/neos/neos-development-collection/pull/4892
Btw as you @Sebobo and @ahaeslich are about to overhaul the workspace module i just wanted to loop you in here that i would be great if the workspace module could also operate on the above mentioned json node address and not as its currently done with the legacy __ notation. I started to refactor a bit but noticed that this is a bad timing as you might be already adjusting things.
The json format will be available with the merge of https://github.com/neos/neos-development-collection/pull/4892
Feel free to adjust whatever now, we will anyway split the conversion of the specific actions into separate PRs and right now I just work on the workspace list.
Originally about this: The dimensionspacepoint hash is a unique identifier and points to one exact dimension combination, currently there are no methods to find a DSP by hash but we could make that part of the API and have a way easier time serializing addresses by only ever using the hash instead of a JSONified dictionary with the actual dimension values.
Now discussing Serialization