notacoder-ui opened this issue.
I also got this in my TYPO3 error log:
Mon, 21 Dec 2020 05:56:00 +0000 [ERROR] request="e909aaa824fb3" component="INM.InmGooglesitemap.Generators.SitemapGenerator": Extension inm_googlesitemap: Error Code: 5 --- Reason: Socket-stream timed out (timeout set to 5 sec).
This error made the site return a 503 error; restarting the php-fpm service brought the site back.
Please check this too.
Hi @notacoder-ui, other rules may be overriding the ones in your robots.txt. Please provide more information: TYPO3 version, PHP version, etc. The settings you made in the Scheduler task are especially important.
Okay.
TYPO3 version: 9.5.19
PHP version: 7.2.34
Settings in the Scheduler:
Okay, well, adding mailto to regexDirectoryExclude will not help you: it would exclude something like https://foo.tld/mailto/something.
But you can add news-letter there instead of mailto to exclude that path.
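To illustrate the difference, here is a minimal Python sketch that assumes the exclude pattern is tested against each directory segment of the URL path. That is only one plausible reading of the option name; the extension's actual matching logic is not shown in this thread, and the helper names directory_segments and is_excluded are hypothetical.

```python
import re
from urllib.parse import urlsplit


def directory_segments(url: str) -> list[str]:
    """Return the directory parts of a URL path, dropping the final segment."""
    segments = urlsplit(url).path.split("/")
    # segments[0] is "" because the path starts with "/"; segments[-1] is the last part
    return [s for s in segments[1:-1] if s]


def is_excluded(url: str, regex_directory_exclude: str) -> bool:
    # Assumption (not confirmed for inm_googlesitemap): the regex is
    # searched in every directory segment of the URL path.
    pattern = re.compile(regex_directory_exclude)
    return any(pattern.search(seg) for seg in directory_segments(url))


print(is_excluded("https://foo.tld/mailto/something", "mailto"))        # True
print(is_excluded("https://foo.tld/news-letter/unsub", "news-letter"))  # True
print(is_excluded("https://www.xyz.de/mailto:%20info", "mailto"))       # False
```

Under that assumption, news-letter catches /news-letter/unsub, while mailto only catches a real /mailto/ directory and not a URL whose last path segment starts with mailto:.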
You can also shorten linkExtractionTags: update the field so that it contains only href.
I know, mailto links are also in an href. But let me know if it's better now. If not, I will have to check why mailto links are not omitted by default.
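The idea behind restricting link extraction to href, and behind skipping mailto links, can be sketched as a small Python link filter. This is not the extension's own code, only an illustration using the standard library: collect href values from a tags, resolve them against the page URL, and keep only http and https links. HrefCollector, crawlable_links and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag (only href, nothing else)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def crawlable_links(html: str, page_url: str) -> list[str]:
    """Resolve hrefs against page_url and keep only http(s) URLs."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        # mailto:, javascript: etc. keep their own scheme and are dropped here
        if urlsplit(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


page = (
    '<a href="/news-letter/unsub">Unsubscribe</a>'
    '<a href="mailto:info@example.com">Mail us</a>'
    '<a href="https://www.xyz.de/about/">About</a>'
)
print(crawlable_links(page, "https://www.xyz.de/"))
```

Filtering on the resolved scheme, rather than on the raw href text, means any non-HTTP link is skipped without listing each scheme by hand.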
Hi @merzilla
I updated the settings as you suggested and ran the cron job. The site went into 503 mode again, and I saw this in the error log:
Tue, 22 Dec 2020 05:05:01 +0000 [ERROR] request="08db8edc7ac5b" component="INM.InmGooglesitemap.Generators.SitemapGenerator": Extension inm_googlesitemap: Response Header not correct. Got HTTP Status Code 302 for URL https://www.xyz.de/mailto:%20%69n%66%6f%40%72eise%6cinie%2e%64e --- Complete Response Header: HTTP/1.1 302 Found Date: Tue, 22 Dec 2020 05:05:01 GMT Server: Apache X-Powered-By: PHP/7.2.34 location: /404fehler X-Powered-By: PleskLin X-UA-Compatible: IE=edge X-Content-Type-Options: nosniff Cache-Control: public, no-transform, must-revalidate Last-modified: Mon, 14 Dec 2020 10:10:10 GMT Content-Length: 0 Connection: close Content-Type: text/html; charset=UTF-8
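The flattened response header in this log line shows why the URL was rejected: the server answered 302 Found and redirected to /404fehler instead of returning a page. A small diagnostic sketch for pulling the status code and redirect target out of such a log line; status_and_location is a hypothetical helper that reads only the log text and does not reproduce the extension's own check.

```python
import re
from typing import Optional, Tuple


def status_and_location(raw_header: str) -> Tuple[int, Optional[str]]:
    """Extract the HTTP status code and Location target from a flattened header."""
    status = re.search(r"HTTP/\d(?:\.\d)?\s+(\d{3})", raw_header)
    if status is None:
        raise ValueError("no HTTP status line found")
    location = re.search(r"(?i)\blocation:\s*(\S+)", raw_header)
    return int(status.group(1)), location.group(1) if location else None


logged = (
    "HTTP/1.1 302 Found Date: Tue, 22 Dec 2020 05:05:01 GMT Server: Apache "
    "X-Powered-By: PHP/7.2.34 location: /404fehler X-Powered-By: PleskLin"
)
print(status_and_location(logged))  # (302, '/404fehler')
```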
Hi @merzilla
I need to exclude some links when the sitemap.xml file is generated. Is there a setting I can use to skip such URLs?
Alternatively, obeying robots.txt would also be fine for me, so that I can list URLs there with Disallow and they are not indexed when a new sitemap.xml is generated.
Hi,
I have set these rules in robots.txt:
Disallow: /mailto:%20iasdf%66o%40r%65%69asdfdf%2ede
Disallow: /news-letter/unsub
Then I started the cron job, but the job still indexed both URLs above.
How can I keep certain URLs out of the sitemap?
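Whether inm_googlesitemap evaluates robots.txt at all is not answered in this thread. For comparison, a generator that did honour robots.txt could filter candidate URLs roughly as in this Python sketch built on the standard library's urllib.robotparser; filter_by_robots and the example URLs are hypothetical. Note that urllib.robotparser ignores Disallow lines that are not preceded by a User-agent line, so the rules need to sit inside a User-agent group.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /news-letter/unsub
"""


def filter_by_robots(urls, robots_txt, user_agent="*"):
    """Return only the URLs that robots_txt allows user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "https://www.xyz.de/about/",
    "https://www.xyz.de/news-letter/unsub",
    "https://www.xyz.de/news-letter/unsub?id=1",
]
print(filter_by_robots(candidates, ROBOTS_TXT))  # ['https://www.xyz.de/about/']
```

Disallow matches by path prefix, so /news-letter/unsub also covers the same path with a query string.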