joomla / joomla-cms

Home of the Joomla! Content Management System
https://www.joomla.org
GNU General Public License v2.0

Support for case insensitive url #38257

Open Fedik opened 2 years ago

Fedik commented 2 years ago

Is your feature request related to a problem? Please describe.

Please read more comments in related closed PRs #38145 #38249

A possible use case: a URL may be published somewhere as EXAMPLE.COM/FOOBAR/PAGE instead of example.com/foobar/page (PDFs, flyers, promotions, etc.).

Describe the solution you'd like

There are 2 possible solutions:

1. Accept both URLs, and add rel="canonical" pointing to example.com/foobar/page on the page. Kind of:

$doc->setCanonical($componentRouter->getCanonical());

2. Always redirect to example.com/foobar/page. Kind of:

if ($requestUri !== $componentRouter->getCanonical()) {
  $app->redirect($componentRouter->getCanonical(), 301);
}

Webdongle commented 2 years ago

Get another PR so we can test please

weeblr commented 2 years ago

@Fedik I suggest doing a proper redirect, attached to the router just like the other parse rules. I have done a POC locally, but I'm not able to do a PR with the new PSR-12 style and stuff, at least right now. Here is the code:

in libraries/src/Router/SiteRouter.php, in constructor:

$this->attachParseRule(array($this, 'parseInit'), self::PROCESS_BEFORE);
$this->attachParseRule(array($this, 'redirectForCase'), self::PROCESS_BEFORE);

Then add:

    /**
     * Redirect to lowercased version of path.
     *
     * @param   Router  &$router  Router object
     * @param   Uri     &$uri     URI object to process
     *
     * @return  void
     *
     * @since   __DEPLOY_VERSION__
     */
    public function redirectForCase(&$router, &$uri)
    {
        // Only redirect plain GET requests.
        if ($this->app->input->getMethod() !== 'GET') {
            return;
        }

        // Skip AJAX requests, identified by the X-Requested-With header.
        if (
            !empty($_SERVER['HTTP_X_REQUESTED_WITH'])
            && strtolower($_SERVER['HTTP_X_REQUESTED_WITH']) === 'xmlhttprequest'
        ) {
            return;
        }

        $lowercasePath = $uri->getPath() ? StringHelper::strtolower($uri->getPath()) : false;
        if (!empty($lowercasePath) && $lowercasePath !== $uri->getPath()) {
            $uri->setPath($lowercasePath);
            $this->app->redirect($uri, 301);
        }
    }

Also requires adding use clause at the top:

use Joomla\String\StringHelper;

weeblr commented 2 years ago

@Fedik Note that I suppressed the redirect for AJAX requests, as I usually do for general redirects. In fact this may not be needed here: the only condition for redirecting is a case difference, so it should not cause any issue and the check could be removed.

Fedik commented 2 years ago

Hmm, I think if we do a redirect, then it should be outside of the Router, but near it (before or after, at the App routing stage). Someone may use $router->parse('/foo/bar') in their code or in a test. Maybe adding a canonical is also not a bad idea; we just need to decide which URL is canonical :)
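One way to picture this separation is a side-effect-free helper that only reports whether a redirect is needed, leaving the redirect itself to the caller at the application's routing stage. This is a minimal sketch, not code from Joomla or from the PRs: the function name `lowercaseRedirectTarget` is hypothetical, and it assumes the mbstring extension is available.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the lowercase path to redirect to,
 * or null when no redirect is needed. It has no side effects, so a
 * direct call to a router's parse() would never trigger a redirect.
 */
function lowercaseRedirectTarget(string $method, string $path): ?string
{
    // Only GET requests are candidates for a redirect.
    if (strtoupper($method) !== 'GET') {
        return null;
    }

    // mb_strtolower also handles non-ASCII characters in UTF-8 paths.
    $lower = mb_strtolower($path, 'UTF-8');

    return $lower !== $path ? $lower : null;
}

// Example caller, standing in for the application's routing stage.
$target = lowercaseRedirectTarget('GET', '/FOOBAR/Page');

if ($target !== null) {
    // In Joomla, this is where $app->redirect($target, 301) could be called.
    echo "301 -> {$target}\n";
}
```

Because the decision and the redirect are separate, code or tests that call the router directly only ever see the parsing behaviour.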

weeblr commented 2 years ago

Hi Fedik,

I think the router is definitely the best place as it was designed for that. For instance, the parseRule system in the router is what Joomla uses to redirect to admin or frontend SSL when configured to.

Maybe adding a canonical is also not a bad idea; we just need to decide which URL is canonical :)

Doing a canonical or a redirect is the exact same in terms of target URL. Simply, from an SEO standpoint, the signal to search engines is much stronger with a redirect than it is with a canonical. Google will sometimes not take into account a canonical, if it goes counter to other signals (internal and external linking for instance, or sitemap presence), while an actual redirect is usually taken into account.

TLWebdesign commented 2 years ago

Hi, I was just wondering what the status is on this? Would be nice to see this in 4.2.4 perhaps? 😇

brianteeman commented 2 years ago

This is hitting me really hard today with a site that began as WordPress 10 years ago before being migrated to Joomla 1.x and then upgraded successfully to every version since. Now, with J4 changing the behaviour, all of the internal links are wrong :(

Fedik commented 2 years ago

For now, as a workaround, you can make your own plugin with the code from PR #38249.

brianteeman commented 1 year ago

Hit again by this bug. Can't believe I am the only one.

tbbjr commented 1 year ago

@brianteeman I am a bit late to this party but you my friend are not alone.

We are a small development company with a few hundred J3 sites that are slowly migrating to J4, and this issue is a nightmare for previously working rules. I have been developing websites since the mid 90's, and the convention for URL structures has always been to force lowercase for SEO and structure...

The argument listed above (i.e. that mysite.com/My-File/link.html should/could navigate to a different URL than mysite.com/my-file/link.html) is complete nonsense IMHO.

That's like saying MySiteToday.com should direct to a completely different URL than mysitetoday.com... Try that with your registrar and see what they say.

This is a bug that needs to be addressed... Either by fixing the SEF or by making canonicals work by default in J4 (like other CMSs do). I am confident that Joomla can do better than the WP world... right?

sitecode commented 1 year ago

Unbelievable to allow this massive breaking change without an alternative native fix. What is the non-hack fix to revert to J3 URL behavior?

rhellyer commented 11 months ago

We have also hit this backward-compatibility (b/c) issue on our site, and it affects thousands of existing URLs that contain upper case in the path. These worked fine in J3 but not at all in J4.

We can force a 301 redirect, but we believe this would surely have a negative effect on page rank (some of our pages are top ranked in popular searches).

It would be acceptable for us if an option were provided that would allow us to configure the J4 router to accept upper case and force it to lowercase during parsing, so that it finds the matching menu alias without forcing a redirect or a canonical.

We noted the option of a custom plugin and the code provided in a prior git issue, but it is our understanding that this would force an external redirect, which we are loth to do.

rhellyer commented 11 months ago

I would strongly support the first of the two options presented by Fedik - "1. Accept both URL, and add rel='canonical' example.com/foobar/page on the page"

With respect to the second alternative: I've been following the discussions on this issue closely and appreciate all the insights shared. However, I'm still unclear on a few points regarding the proposed solution of using a 301 redirect. I want to ensure our team fully understands the implications, particularly concerning potential disruptions to our page ranking.

Case Insensitive Match: Why not simply modify the parseSefRoute() method to perform a case-insensitive match? This would avoid the need for redirection or canonicalization.

Reverting to Joomla 3 Behavior: How can we adjust Joomla 4's behavior on our site to match the case-insensitive URL handling of Joomla 3? Our URLs are in the form example.com/Essays/foobar due to backward compatibility with a previous site version.
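The case-insensitive match idea can be sketched with a plain lookup table. This does not reproduce the actual parseSefRoute() implementation; the function `findItemIdIgnoringCase` and the `$routes` array (stored route to menu item ID) are hypothetical stand-ins, and the mbstring extension is assumed.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical lookup: find the menu item whose stored route matches
 * $path, ignoring case. $routes maps a stored route to an item ID.
 */
function findItemIdIgnoringCase(array $routes, string $path): ?int
{
    // Normalise the requested path: no surrounding slashes, lowercase.
    $needle = mb_strtolower(trim($path, '/'), 'UTF-8');

    foreach ($routes as $route => $itemId) {
        // Cast the key, because PHP turns numeric string keys into integers.
        if (mb_strtolower((string) $route, 'UTF-8') === $needle) {
            return $itemId;
        }
    }

    return null;
}

// Illustrative data only.
$routes = ['essays' => 101, 'essays/foobar' => 102];

var_dump(findItemIdIgnoringCase($routes, '/Essays/foobar'));
```

In this sketch, if two stored routes differ only in case, the first one found wins, which is the ambiguity a redirect or canonical approach avoids having to resolve.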

I understand that the URL specification allows for example.com/Essays/foo and example.com/essays/foo to serve different content. However, Joomla has historically treated these as the same. I also understand that serving the same content from different URLs is ideally resolved through a permanent redirect or a canonical tag. But isn't this already a common scenario with Joomla URLs? Shouldn't the choice of how to handle this be left to the user?

I'm concerned about the potential impact of 301 redirects on the page ranking of longstanding URLs. Some comments suggest this might not be an issue, but we're worried about the discrepancy between the new target URL and existing links.

As we prepare to transition to Joomla 4, we're trying to find the best solution. It seems our options are limited to modifying the parseSefRoute() method or implementing redirects (either permanent or temporary) from /Essay/foo paths to /essay/foo. We could do this via the .htaccess file or a system plugin. Are there any other alternatives for us and other users facing the same issue, ideally ones that avoid a redirect?

rhellyer commented 11 months ago

With respect to @Fedik's proposal and @weeblr's comment: would a user-friendly option be to adopt recommendation 1 (rel='canonical') but provide a config option for 2 (301 redirect)?
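A rough sketch of how such a switch might behave is shown below. Everything here is an assumption for illustration: the function `resolveCaseMismatch`, the `$mode` values 'canonical' and 'redirect', and the returned array shape are hypothetical, and no such configuration option exists in Joomla as far as this thread shows.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical decision helper. $mode stands in for a configuration
 * option: 'canonical' (the default) or 'redirect'.
 */
function resolveCaseMismatch(string $baseUrl, string $requestPath, string $mode = 'canonical'): array
{
    $lowerPath = mb_strtolower($requestPath, 'UTF-8');
    $url       = rtrim($baseUrl, '/') . $lowerPath;

    // The request already uses the lowercase form: nothing to do.
    if ($lowerPath === $requestPath) {
        return ['action' => 'none', 'url' => $url];
    }

    // Option 2: permanent redirect to the lowercase URL.
    if ($mode === 'redirect') {
        return ['action' => 'redirect', 'status' => 301, 'url' => $url];
    }

    // Option 1: serve the page and point search engines at the lowercase URL.
    $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    return [
        'action' => 'canonical',
        'url'    => $url,
        'tag'    => '<link rel="canonical" href="' . $href . '">',
    ];
}

print_r(resolveCaseMismatch('https://example.com', '/FOOBAR/Page'));
print_r(resolveCaseMismatch('https://example.com', '/FOOBAR/Page', 'redirect'));
```

With the default mode, existing mixed-case links keep working without a redirect, while sites that prefer the stronger signal can opt into the 301.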