netgen / ezplatform-site-api

Netgen's Site API is site building productivity layer for eZ/Ibexa Platform
https://docs.netgen.io/projects/site-api
GNU General Public License v2.0
37 stars 8 forks source link

[3.7] NGSTACK-449: implement configurable cross-siteaccess router #185

Closed pspanja closed 1 year ago

pspanja commented 4 years ago

This implements per-siteaccess configurable cross-siteaccess UrlAliasRouter.

The router can be configured whether cross-siteaccess routing will be used with ng_cross_siteaccess_routing (showing defaults):

ezpublish:
    system:
        frontend:
            ng_cross_siteaccess_routing: true
            ng_cross_siteaccess_routing_prefer_translation_siteaccess: false
            ng_cross_siteaccess_routing_external_subtree_roots: []

If cross-siteaccess routing is enabled, as a first step a siteaccess will be chosen from all siteaccesses that have the given Location under it's tree root, configured with content.tree_root.location_id option. If the current siteaccess is among those, it will be used for matching, otherwise the best matching siteaccess will be resolved from the current siteaccess' prioritized languages and available siteaccesses prioritized languages. If none matches, the siteaccesses will be checked if they allow the given Location in regard to the available languages and always available flag. They will be checked as they are listed in the configuration, meaning their order is significant.

If ng_cross_siteaccess_routing_prefer_translation_siteaccess options is set to true, for the matched siteaccess the end siteaccess will be matched from the translation_siteaccesses configuration. Again, current siteaccess prioritized languages will be used to determine best matching translation siteaccess, otherwise they will be checked in the order they are configured. Note that translation siteaccess will be checked only for it's most prioritized language.

In case none of the above matches, current siteaccess is used as a fallback.

External (for the current siteaccess tree root) subtreess, configured for the matching side with "exclude path prefix" option, are here on the generating side handled by ng_cross_siteaccess_routing_external_subtree_roots option. Location that is found to be under the configured external subtree root will be always handled in the context of the current siteaccess.

If the end result of resolving a siteaccess is a non-current siteaccess, the router will generate an absolute path, including the host. That means such path will be returned from both url and path Twig functions. However, the resulting URL will be checked if it matches the current request host and is not generated as absolute. In that case, the host part will be removed, and the absolute URL will be generated only when that's absolutely needed.

Since UrlAliasRouter now has to handle Content and Location objects as well, previously separate ContentUrlAliasRouter and LocationUrlAliasRouter have been merged into it.

Note

The SiteaccessResolver service is aggresively cached in-memory, since it must be performant when handling hundreds and even thousands of links. For that reason the code there is not very pretty and might be a bit dense to read.

To do

hknezevic commented 4 years ago

@pspanja as discussed internally, let's include the translation_siteaccess logic into the router as well. Example:

Let's look at the following siteaccess configuration snippet:

Screenshot 2020-10-18 at 15 26 08

All three siteaccesses point to the same root location ID, and we have the translation_siteaccess list defined where we cover all languages.

Let's say we are on the german website. If we render the URL to a content that only has english translation, and there is a translation siteaccess for an english language, the URL should point to the translation siteaccess instead of the current one. If no translation siteaccesses are defined, we stay in the context of the current siteaccess (default behavior)

pspanja commented 4 years ago

OK, to do added 👍🏼

pspanja commented 4 years ago

Implemented handling of translation siteaccesses and updated description, please re-review.

While reviewing do check the tests cases, they were written to mimic the siteaccess config and describe the behaviour in much more detail than can be documented here in short.

Test failures are caused by Travis updating composer to 2.0, this seems to be something that should be fixed in eZ kernel.

pspanja commented 4 years ago

Also, cross-siteaccess routing is now enabled by default.

pspanja commented 4 years ago

Do not review yet, this needs an update for choosing the siteaccess by the best matching translation, following the discussion with @vjeran and @ivrdoljak.

@hknezevic (not related to the above):

Let's say we are on the german website. If we render the URL to a content that only has english translation, and there is a translation siteaccess for an english language, the URL should point to the translation siteaccess instead of the current one. If no translation siteaccesses are defined, we stay in the context of the current siteaccess (default behavior)

In the example you gave, this would change the basic language fallback behaviour; even if the Location URL could be generated on ger, we would generate it on eng. While it can be controlled by translation_siteaccesses config, I'm not sure it will not interfere with its use for the language switcher. Do we really want such switching between siteaccess in all cases?

pspanja commented 4 years ago

Here are some use cases for which I would like your feedback.

Question: In the siteaccess three, for the Location in subtree 100 and the language ita-IT, for which siteaccess URL should be generated for and why?

Case 1

one:
    content:
        tree_root:
            location_id: 100
    languages:
        - fre-FR
        - ita-IT
        - cro-HR
two:
    content:
        tree_root:
            location_id: 100
    languages:
        - cro-HR
        - fre-FR
        - ita-IT
three:
    content:
        tree_root:
            location_id: 200
    languages:
        - eng-GB
        - ger-DE

Case 2

one:
    content:
        tree_root:
            location_id: 100
    languages:
        - ita-IT
        - eng-GB
two:
    content:
        tree_root:
            location_id: 100
    languages:
        - eng-GB
        - ita-IT
three:
    content:
        tree_root:
            location_id: 200
    languages:
        - eng-GB
        - ger-DE

Case 3

one:
    content:
        tree_root:
            location_id: 100
    languages:
        - nor-NO
        - fre-FR
        - ger-DE
two:
    content:
        tree_root:
            location_id: 100
    languages:
        - fre-FR
        - nor-NO
        - eng-GB
three:
    content:
        tree_root:
            location_id: 200
    languages:
        - eng-GB
        - ger-DE
hknezevic commented 4 years ago

Do not review yet, this needs an update for choosing the siteaccess by the best matching translation, following the discussion with @vjeran and @ivrdoljak.

@hknezevic (not related to the above):

Let's say we are on the german website. If we render the URL to a content that only has english translation, and there is a translation siteaccess for an english language, the URL should point to the translation siteaccess instead of the current one. If no translation siteaccesses are defined, we stay in the context of the current siteaccess (default behavior)

In the example you gave, this would change the basic language fallback behaviour; even if the Location URL could be generated on ger, we would generate it on eng. While it can be controlled by translation_siteaccesses config, I'm not sure it will not interfere with its use for the language switcher. Do we really want such switching between siteaccess in all cases?

I don't see why it would interfere with anything.

This was the showcase of the situation when we look at the content in the context of current subtree and current siteaccess. Translation siteaccess list has a sole purpose: if the content is multi-language, to provide means of generating URLs for different translations in different siteaccess context. If it is not defined for the current matching siteaccess or siteaccess group, then the prioritized language list is applied and we will stay in the context of the current siteaccess.

Same goes if we have a content translation that matches none of the primary languages in any of the translation siteaccesses; in this case the URL is generated in the current siteaccess context, given that the language is available in the prioritized language list of the current siteaccess.

pspanja commented 4 years ago

This was the showcase of the situation when we look at the content in the context of current subtree and current siteaccess. Translation siteaccess list has a sole purpose: if the content is multi-language, to provide means of generating URLs for different translations in different siteaccess context. If it is not defined for the current matching siteaccess or siteaccess group, then the prioritized language list is applied and we will stay in the context of the current siteaccess.

My concern is the following: URL for Location with eng-GB on ger siteaccess will be generated for eng siteaccess even if eng-GB is allowed on ger. So the siteaccess will be switched, and from my POV user wants to view this Location on ger in eng-GB as an exception, and stay on ger for the rest of the content as that's his primary language.

But completely OK with me if that is desired and we can control it as needed.

hknezevic commented 4 years ago

This was the showcase of the situation when we look at the content in the context of current subtree and current siteaccess. Translation siteaccess list has a sole purpose: if the content is multi-language, to provide means of generating URLs for different translations in different siteaccess context. If it is not defined for the current matching siteaccess or siteaccess group, then the prioritized language list is applied and we will stay in the context of the current siteaccess.

My concern is the following: URL for Location with eng-GB on ger siteaccess will be generated for eng siteaccess even if eng-GB is allowed on ger. So the siteaccess will be switched, and from my POV user want to view this Location on ger in eng-GB as an exception, and stay on ger for the rest of the content as that's his primary language.

But completely OK with me if that is desired and we can control it as needed.

I agree, but the moment you define the translation siteaccess with the matching language, you already make decision that some particular language has its own siteaccess context. The fallback language can be used the purposes other than just translating the content, for example, when you want to be able to load some media assets where the language doesn't matter (like images), and in cases where for some reason you don't want to apply the "always available" flag on the content.

The situation where we have two siteaccesses pointing to the exact same subtree and with the same language lists (with only the order of the languages inverted) is highly unlikely to happen, unless they are connected as a translation siteaccesses. And even then, we always pick one language and one siteaccess as a general fallback if nothing matches, and all the other siteaccesses have their own prioritized language.

I am also looking at this from the SEO perspective: we usually utilise the translation siteaccess list for another purpose, which is to generate meta alternate/hreflang URLs, describing that the same content exists in different languages on the defined URLs for the search robots. You either show the multi-language content in the same siteaccess, following the translation fallback, and do not have the distinct language variant URLs, or you have translation siteacceses defined and provide distinct different language variant URLs.

pspanja commented 4 years ago

OK, clear 👍

hknezevic commented 4 years ago

@pspanja comment on the use cases you provided above:

For the sake of simplicity, I would go with the first siteaccess that matches the content subtree and which the non-system URL reference can be generated for.

For example, have a look what at this code snippet, where we generate a list of available content translations for the language switcher:

https://github.com/netgen/media-site/blob/1.9/src/AppBundle/Resources/views/themes/app/pagelayout_variables.html.twig#L51

If you can generate the non-empty route reference in the context of another siteaccess, and if it is possible to generate the URL for this route which is not a generic system URL (%/view/content/%), then use this siteaccess for generating the URL.

All of the listed examples are edge cases which complicate the matter (for example, we do not know if the content has "always available" flag set or not). But, if we wanted to fine-tune the system to cover all possible cases and scenarios, we could solve this through further configuration options - selectively enabling/disabling cross-linking for particular siteaccess or siteaccess group, or restricting the system in a way that it only allows one-to-one siteaccess for the cross-links.

For example, we always cross-link to a single siteaccess (first matched or manually defined through the configuration) with subtree location 100, if we are generating link outside of that particular subtree. Then, we can go even deeper, and if the matched siteaccess by subtree has a translation siteaccesses list, we can try to find the correct language siteaccess using the logic that you already implemented for this PR :)

Oh, and case 3 makes no sense at all :). None of the siteaccesses have ita-IT as available language, so it wouldn't really matter which one we pick - either the content is always available so it will be available in any of them, or it is not marked as always available, in which case we would not even be able to load the location/content in the context of the siteaccess "three", correct?

pspanja commented 4 years ago

The case are a bit on the edge site, but they describe some important behaviour, for example in case 1 we need to decide how to select best matching siteaccess with regard to the Location's translations (is the language list important aside from the most prioritized language?), and in case 2 we need to decide whether to link to a siteaccess one that best matches the available Location translation, or to a siteaccess two where this language can be shown and it's not the most prioritized language, but the siteaccess has the same most prioritized language as the source siteaccess. In case 3 we can select the best matching siteaccess from the source siteaccess prioritized languages (again not only on the most prioritized one).

Always available also needs to be handled, but that's the fallback case as it can be generated for any siteaccess matching the subtree. I mostly already solved the mechanics, just want to confirm that my reasoning about this is intuitive.

pspanja commented 4 years ago

For example, have a look what at this code snippet, where we generate a list of available content translations for the language switcher:

Thanks for this, I never looked much into those parts.

pspanja commented 4 years ago

This is again ready for the review, please check the test cases if the code is too dense to read: https://github.com/netgen/ezplatform-site-api/blob/ecad431e9cfbb9ee3c3c951a3d9c128267894d2a/tests/bundle/Routing/SiteaccessResolverTest.php#L27

pspanja commented 4 years ago

(Description updated)

pspanja commented 4 years ago

Added documentation.

pspanja commented 1 year ago

This feature will be implemented in version 5.1 (https://github.com/netgen/ibexa-site-api).