webdevops / TYPO3-metaseo

TYPO3 MetaSEO Extension
https://typo3.org/extensions/repository/view/metaseo
GNU General Public License v3.0

URLs shortened in Sitemap #357

Closed MichiBeck closed 7 years ago

MichiBeck commented 7 years ago

Hi there,

In the sitemap, the page names are cut off in a strange way (see image). Can anyone reproduce this issue? Is it a RealURL configuration problem?

[Screenshot: metaseo sitemap problem]

Thank you very much.

MetaSEO version: 2.0.3

TYPO3 version: 7.6.16

PHP version: 5.6.21

RealUrl version: 2.1.9

thomaszbz commented 7 years ago

Could you please give us an example which illustrates what exactly is wrong here?

MichiBeck commented 7 years ago

Sure: it crops the page titles as shown above. Here is the page tree in the TYPO3 backend: [Screenshot: sitemap cropped] If you need further information, please let me know.

thomaszbz commented 7 years ago

I've never seen that before. Maybe the slashes (/) in the page names play a role here.

Could you please drop one of the entries in metaseo's sitemap tool and see if and how it is regenerated?

MichiBeck commented 7 years ago

What do you mean by:

drop one of the entries in metaseo's sitemap tool

Sorry.

thomaszbz commented 7 years ago

Delete one of the entries. It will be regenerated after visiting the page (when logged out from the backend). You've already been at the right place (your first screen shot).

MichiBeck commented 7 years ago

Ah! Done, but it's still the same problem:

[Screenshot: regenerate sitemap]

I tried it with the page "Vordächer / Carports" (PID 34).

thomaszbz commented 7 years ago

Please try to delete all entries with PID=34. Clear caches afterwards and report back what happened after regeneration of the sitemap entry/entries.

If the problem still persists, please give us a list of all extensions which were activated on top of the core extensions that are enabled by default (fresh installation). That should include manually activated core extensions and all third-party extensions.

Did you upgrade from RealURL 1.x to 2.1? If so, you may need to clean up your realurl tables (I did that a few times using SQL directly).

For the moment I can say that this problem is completely new to me with respect to metaseo 2.0.3 (which was released months ago). I would expect a lot of users to have complained if this were a general problem in metaseo.

I guess something is somehow "special" with your system. Maybe we can break it down and fix it. Maybe the problem originates from somewhere else (other extension, old speaking URL data in database or caches, etc.).

MichiBeck commented 7 years ago

Please try to delete all entries with PID=34. Clear caches afterwards and report back what happened after regeneration of the sitemap entry/entries.

I'm sorry, that changed nothing. It is generated the same way, and I tried a few times.

Did you upgrade from RealURL 1.x to 2.1? Eventually, you need to clean up your realurl tables (I did that a few times using SQL directly).

No, it was a fresh installation with the latest RealURL version from the beginning. And before going live, I cleared the realurl tables using SQL to generate new URLs.

Used Extensions:

fluid_styled_content
flux
fluidcontent
fluidpages
vhs
layerslider
news
powermail
powermailextended
powermailrecaptcha
realurl_404_multilingual
sf_event_mgt
skip_page_is_being_generated
t3adminer
and our own template extension to generate FCEs.

I hope this helps a bit. Sorry, but perhaps it helps for future versions.

thomaszbz commented 7 years ago

Regression test: What happens if you comment out these lines (the green ones): https://github.com/mblaschke/TYPO3-metaseo/commit/45b405799fe75da499fafb8b1276e91a399dd8b5

Please make sure to clear all caches (including the caches and the opcode cache in the install tool).

MichiBeck commented 7 years ago

No, still the same problem. And it always creates EXACTLY the same wrong names as in the last screenshot; perhaps this is important.

Is it possible that I have a wrong RealUrl config?

ghost commented 7 years ago

I can confirm that such entries appear in the sitemap. Mostly it is 1, 2 or 3 characters. They seem to come from the URL, not just the page title. I tried several times to figure out where this comes from, but in the end I just blacklisted them.

thomaszbz commented 7 years ago

@bla-kw Thanks for confirming. We need to track this down to get an idea what's wrong here. Do you also use fluidpages?

@bla-kw or @MichiBeck Could you please try out the regression test (Regression test: What happens if you comment out these lines (the green ones): 45b4057 )?

MichiBeck commented 7 years ago

I ran the regression test: still the same for me.

ghost commented 7 years ago

I don't have the extension fluidpages.

additional_reports
authcode
cal
dce
direct_mail
flexslider
formhandler
formhandler_subscription
gridelements
metaseo
my_redirects
news
newsslider
ods_osm
ods_osm_tt_address
ods_plaintext
piwik
pm_rendercontent
realurl
smile_iframe
sr_language_menu
static_info_tables
static_info_tables_de
t3jquery
tt_address

thomaszbz commented 7 years ago

@bla-kw None of the extensions match. The regression test is negative, and now we know you don't use fluidpages. I'll try to reproduce it with default settings, although I don't expect to be able to reproduce it with just those. If you can somehow track it down, please report back.

ghost commented 7 years ago

Not sure if this helps, but I just created a new page id=323 with title "kw-test", renamed it to "kw test2" then "kw / test3".

realurl module shows me:

en/kw-test/ L=1&id=323  –   2
kw-test/    id=323  15.04.2017 14:34    2
kw-test/    L=0&id=323  15.04.2017 14:34    2
kw-test/?baaa=2&cHash=0355628f4cabeab8f4fca59292b7958a  L=0&baaa=2&cHash=0355628f4cabeab8f4fca59292b7958a&id=323    15.04.2017 14:34    2
kw-test/?foo=bar&cHash=15e9f207c27699ef291049c6eb6857b9 L=0&cHash=15e9f207c27699ef291049c6eb6857b9&foo=bar&id=323   15.04.2017 14:33    2
kw-test/?no_cache=1 L=0&id=323&no_cache=1   15.04.2017 14:33    2
kw-test2/   id=323  15.04.2017 14:36    2
kw-test2/   L=0&id=323  15.04.2017 14:36    2
kw-test3/   id=323  –   2
kw-test3/   L=0&id=323  –   2
nl/kw-test/ L=2&id=323  –   2

In the sitemap module I see:

PID URL
323 3/
323 2/
323 8a
323 kw-test/
thomaszbz commented 7 years ago

@bla-kw Which versions are you using, especially which version of realurl?

ghost commented 7 years ago

Right now I have PHP 5.6.21, TYPO3 7.6.16, realurl 2.1.9 and metaseo 2.0.3. But I already saw this behaviour several months ago, with older versions of TYPO3 and realurl, too. Unfortunately I can no longer tell when I first saw it.

fdrewes commented 7 years ago

In my tests it happens when you configure absRefPrefix = /:

config.absRefPrefix = /

With a complete domain, like the one in baseURL, everything is fine.

ghost commented 7 years ago

The absRefPrefix finding is interesting. Unfortunately I cannot test this on the production site; I will take a look at a dev system tomorrow. In my example today there was a bit of URL cache poisoning, which I tracked down to a misconfiguration of sr_language_menu and fixed. But that was not the cause of the 2/ and 3/ entries in the sitemap; they still appear. Also, the renamed entries kw-test2 and kw-test3 seem to be missing from the sitemap.

thomaszbz commented 7 years ago

@bla-kw @fdrewes @MichiBeck could you please try this patch (taken from #354)?

https://github.com/mblaschke/TYPO3-metaseo/pull/353/commits/7699bffb1c90e7e67e48da80cd13812bf68d1f5c

If that solves it, then this issue is a duplicate of #354.

MichiBeck commented 7 years ago

@thomaszbz I patched the file, cleared all caches and logged out of the backend, but it still generates wrong URLs.

MichiBeck commented 7 years ago

After checking whether I use config.absRefPrefix = /, I saw that I had also set config.baseURL = {$config.protocol}://{$config.baseUrl}/. Disabling config.absRefPrefix solved the problem for me.

With config.baseURL = {$config.protocol}://{$config.baseUrl}/ disabled and config.absRefPrefix = / active, it is still broken.

Using config.absRefPrefix = {$config.protocol}://{$config.baseUrl}/ also works.
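In light of the processLinkUrl() analysis later in this thread, this observation has a plausible explanation: the branch that decrements the static offset only fires when a link's parsed path equals the whole absRefPrefix. With absRefPrefix = / a link to the root page matches; with a full URL, parse_url() yields the path / which never equals the full prefix. The helper below is a hypothetical extraction of just that condition (its name is made up), assuming links are rendered with the absRefPrefix in front; www.example.com stands in for the real domain.

```php
<?php
// Hypothetical helper: only the condition from processLinkUrl() that
// decrements $absRefPrefixLength, extracted for illustration.
function triggersPrefixDecrement($linkUrl, $absRefPrefix)
{
    if (strpos($linkUrl, $absRefPrefix) !== 0) {
        return false; // link does not start with the prefix at all
    }
    $parsedUrl = parse_url($linkUrl);
    return $parsedUrl !== false
        && isset($parsedUrl['path'])
        && $parsedUrl['path'] === $absRefPrefix
        && substr($absRefPrefix, -1) === '/';
}

// config.absRefPrefix = / : a link to the root page hits the branch
var_dump(triggersPrefixDecrement('/', '/'));  // bool(true)
// full URL as prefix: parse_url() returns path '/', never the full prefix
var_dump(triggersPrefixDecrement('http://www.example.com/', 'http://www.example.com/'));  // bool(false)
```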

ghost commented 7 years ago

I found an old dev VM of the project with TYPO3 7.6.10, metaseo 2.0.0 and realurl 2.0.14. I updated everything to the same versions as on the production server, but unfortunately I am not able to reproduce this behaviour. :( At least absRefPrefix and baseURL are identical in the production and dev configurations. But I remember making some modifications to the realurl config for news, and some other small adjustments in production, in the meantime. It seems to be something in my configuration that leads to this behaviour.

thomaszbz commented 7 years ago

@bla-kw could you please compare extensions between old and new? Maybe that narrows it to an extension or a version of an extension.

pdanzinger commented 7 years ago

I think I have managed to reproduce the bug with a fresh TYPO3 7.6.16 installation with realurl 2.1.9. I tested two metaseo versions: 2.0.3 and the current git develop commit #1b46634.

It seems to occur on link indexing when:

Here is a simple TS root template for reproduction. It only requires realurl and metaseo, and the root page having id 1: example.txt

The problem seems to originate in https://github.com/mblaschke/TYPO3-metaseo/blob/develop/Classes/Hook/SitemapIndexHook.php in processLinkUrl().

Here's the function with some annotations explaining the error:

protected static function processLinkUrl($linkUrl)
{
    static $absRefPrefix = null;
    static $absRefPrefixLength = 0;
    $ret = $linkUrl;
    $tsfe = self::getTsfe();

    if ($absRefPrefix === null) {
        // [...]
    }

    if ($absRefPrefix !== false && strpos($ret, $absRefPrefix) === 0) {
        $parsedUrl = parse_url($linkUrl);
        if ($parsedUrl !== false
            && $parsedUrl['path'] === $absRefPrefix
            && substr($absRefPrefix, -1) === '/'
        ) {
            // !!! when the URL matches the absRefPrefix, absRefPrefixLength is reduced
            // since absRefPrefixLength is static, it keeps shrinking whenever this code executes
            $absRefPrefixLength--;
        }
        // when $absRefPrefixLength is negative, substr returns the last characters of the URL
        $ret = substr($ret, $absRefPrefixLength);
    }
    return $ret;
}

Removing the static modifier from $absRefPrefixLength fixes the error for me.
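The effect of the static decrement can be reproduced outside TYPO3. The sketch below is a simplified, hypothetical copy of processLinkUrl() (renamed processLinkUrlBuggy()) with the elided TSFE lookup replaced by a hard-coded absRefPrefix of /, which is an assumption. Static variables keep their value for the rest of the PHP request, so every link to the root page lowers the offset for all links processed after it. Fed with URLs from the kw-test experiment above, it produces the same kind of truncated entries (3/, 2/, 8a) that showed up in the sitemap module.

```php
<?php
// Hypothetical, simplified copy of processLinkUrl() as quoted above.
// Assumption: config.absRefPrefix = / (hard-coded instead of read from TSFE).
function processLinkUrlBuggy($linkUrl)
{
    static $absRefPrefix = null;
    static $absRefPrefixLength = 0;
    $ret = $linkUrl;

    if ($absRefPrefix === null) {
        $absRefPrefix = '/';                          // assumed prefix
        $absRefPrefixLength = strlen($absRefPrefix);  // 1
    }

    if ($absRefPrefix !== false && strpos($ret, $absRefPrefix) === 0) {
        $parsedUrl = parse_url($linkUrl);
        if ($parsedUrl !== false
            && $parsedUrl['path'] === $absRefPrefix
            && substr($absRefPrefix, -1) === '/'
        ) {
            // Bug: this writes back to the static variable, so the offset
            // keeps shrinking for every later call in the same request.
            $absRefPrefixLength--;
        }
        // A negative offset makes substr() return the LAST characters.
        $ret = substr($ret, $absRefPrefixLength);
    }
    return $ret;
}

$links = [
    '/kw-test/',
    '/', '/', '/',  // hypothetical: three links to the root page on one page
    '/kw-test3/',
    '/kw-test2/',
    '/kw-test/?baaa=2&cHash=0355628f4cabeab8f4fca59292b7958a',
];
$results = array_map('processLinkUrlBuggy', $links);
echo implode(' | ', $results), "\n";
// kw-test/ | / | / | / | 3/ | 2/ | 8a
```

Three root links are enough to push the offset to -2, at which point substr() keeps only the last two characters of each following URL.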

MichiBeck commented 7 years ago

This works for me! Great!

Settings: config.baseURL = http://www.example.com/ and config.absRefPrefix = /

thomaszbz commented 7 years ago

@pdanzinger Thanks for the patch and for tracking this down.

This issue is a regression, introduced in metaseo 2.0.2 by #190 (commit 236d6cd1580f1ed0a1f48129831a7e2302b64a6f). Blame me for overlooking the static.

However: I think the static should remain. Instead, decreasing via $absRefPrefixLength-- should happen in a separate variable, so that it does not write back to the static variable.

Reason: $absRefPrefixLength originally was written only once (when $absRefPrefix is null). If $absRefPrefixLength is not static, it is reset to 0 on the next call of the function. And since $absRefPrefix is static and therefore no longer null, $absRefPrefixLength will not be recomputed.

Affected versions: MetaSEO 2.0.2 and 2.0.3.
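A minimal sketch of the approach described above: keep both variables static, so they are still computed only once, but apply the decrement to a per-call local variable. It is not the literal patch from #366; the function name processLinkUrlFixed(), the local $cutLength and the hard-coded absRefPrefix of / (standing in for the elided TSFE lookup) are assumptions for illustration.

```php
<?php
// Hypothetical sketch of the proposed fix (not the actual patch).
// Assumption: config.absRefPrefix = / (hard-coded instead of read from TSFE).
function processLinkUrlFixed($linkUrl)
{
    static $absRefPrefix = null;
    static $absRefPrefixLength = 0;
    $ret = $linkUrl;

    if ($absRefPrefix === null) {
        $absRefPrefix = '/';                          // assumed prefix
        $absRefPrefixLength = strlen($absRefPrefix);  // computed once
    }

    if ($absRefPrefix !== false && strpos($ret, $absRefPrefix) === 0) {
        $cutLength = $absRefPrefixLength;  // local copy, fresh on every call
        $parsedUrl = parse_url($linkUrl);
        if ($parsedUrl !== false
            && isset($parsedUrl['path'])   // defensive: path may be absent
            && $parsedUrl['path'] === $absRefPrefix
            && substr($absRefPrefix, -1) === '/'
        ) {
            $cutLength--;  // keep the trailing '/' for root links only
        }
        $ret = substr($ret, $cutLength);
    }
    return $ret;
}

$results = array_map(
    'processLinkUrlFixed',
    ['/kw-test/', '/', '/', '/', '/kw-test3/', '/kw-test2/']
);
echo implode(' | ', $results), "\n";
// kw-test/ | / | / | / | kw-test3/ | kw-test2/
```

Because the static $absRefPrefixLength is never written after initialization, any number of root links leaves later links untouched.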

thomaszbz commented 7 years ago

@MichiBeck, @pdanzinger, @bla-kw, @fdrewes Could you please test the patch 192a46e in #366?

I did not test it myself yet.

pdanzinger commented 7 years ago

@thomaszbz You are right, in my patch I overlooked that $absRefPrefixLength is only set when $absRefPrefix is null.

As expected, your patch worked fine on my test server.
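The oversight is easy to demonstrate with a hypothetical variant (name made up, absRefPrefix hard-coded to / as an assumption) in which only $absRefPrefix stays static, as in the first suggested patch. The prefix is stripped on the first call only; on every later call $absRefPrefixLength starts at 0, so the leading slash is kept and the output becomes inconsistent.

```php
<?php
// Hypothetical variant: only $absRefPrefix is static, the length is not.
// Assumption: config.absRefPrefix = / (hard-coded instead of read from TSFE).
function processLinkUrlLengthNotStatic($linkUrl)
{
    static $absRefPrefix = null;
    $absRefPrefixLength = 0;       // reset to 0 on every call
    $ret = $linkUrl;

    if ($absRefPrefix === null) {  // true only on the very first call
        $absRefPrefix = '/';       // assumed prefix
        $absRefPrefixLength = strlen($absRefPrefix);
    }

    if ($absRefPrefix !== false && strpos($ret, $absRefPrefix) === 0) {
        $parsedUrl = parse_url($linkUrl);
        if ($parsedUrl !== false
            && $parsedUrl['path'] === $absRefPrefix
            && substr($absRefPrefix, -1) === '/'
        ) {
            $absRefPrefixLength--;
        }
        $ret = substr($ret, $absRefPrefixLength);
    }
    return $ret;
}

$results = array_map(
    'processLinkUrlLengthNotStatic',
    ['/kw-test/', '/kw-test2/', '/kw-test3/']
);
echo implode(' | ', $results), "\n";
// kw-test/ | /kw-test2/ | /kw-test3/
```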

ghost commented 7 years ago

I applied the patch this morning and deleted the blacklisted corrupted sitemap entries. So far I haven't seen any new occurrences of this error. Thank you very much for tracking this down.

MichiBeck commented 7 years ago

Works perfectly for me too! Thanks for the quick reaction!

thomaszbz commented 7 years ago

Related: #178