webdevops / TYPO3-metaseo

TYPO3 MetaSEO Extension
https://typo3.org/extensions/repository/view/metaseo
GNU General Public License v3.0

Sitemap only contains few pages #55

Open papillon6 opened 9 years ago

papillon6 commented 9 years ago

I used the scheduler task to generate the sitemap, but the sitemap (or rather the sitemap group) only contains 10 of the ~150 pages that should be indexed. Pages are cached correctly (table cf_cache_table is filled), so I don't think it's a caching issue.

The root page is set, domain records are set, and I followed the instructions in your manual... Did I miss something?

Version 1.0.8 (TYPO3 6.2)

By the way: with 'tq_seo' 6.0.1 it works like a charm...

mblaschke commented 9 years ago

Is the table definition up to date? Please check the installer.

Are more pages available in the Sitemap backend module? The scheduler task doesn't index any pages; they are indexed by frontend access. The scheduler task only generates static txt and xml files based on the pages in table tx_metaseo_sitemap.
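To illustrate that split, here is a rough Python sketch of the two phases. It is not MetaSEO's code: the table `sitemap_index` and both function names are hypothetical stand-ins, and the real tx_metaseo_sitemap table has a different schema. The point is only that the export step serialises whatever the frontend hits have already recorded.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for tx_metaseo_sitemap.
def create_index(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS sitemap_index (url TEXT PRIMARY KEY)")

def index_on_frontend_hit(conn, url):
    """Phase 1: a frontend request records the page it rendered."""
    conn.execute("INSERT OR IGNORE INTO sitemap_index (url) VALUES (?)", (url,))

def export_sitemap(conn):
    """Phase 2: the export task only serialises rows that already exist."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for (url,) in conn.execute("SELECT url FROM sitemap_index ORDER BY url"):
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

conn = sqlite3.connect(":memory:")
create_index(conn)
index_on_frontend_hit(conn, "https://example.org/")
index_on_frontend_hit(conn, "https://example.org/about")
# A page that was never requested in the frontend never reaches the export.
sitemap_xml = export_sitemap(conn)
```

Under this model, a sitemap with too few pages points at phase 1 (indexing), not at the scheduler task.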

papillon6 commented 9 years ago

Table definition is fine. The sitemap in the backend module is the same as shown in the frontend (&type=841132).

Yes, I'm aware of that. But your scheduler task only adds those pages to the sitemap which were cached correctly. If I clear my FE cache and access some pages via the frontend, the table cf_cache_pages is filled with each call, so caching works properly. Hm… strange. Right now I'm using an older version (tq_seo), which works great. Did you change something in the scheduler task/command? I haven't dug into your code yet.
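To see exactly which pages are missing, the frontend sitemap output can be compared with a list of URLs that should be indexed. The following is a minimal Python sketch, assuming the XML output follows the sitemaps.org schema; the sample document, the URLs and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the set of <loc> values found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }

def missing_pages(expected_urls, xml_text):
    """Return the expected URLs that the sitemap does not list, sorted."""
    return sorted(set(expected_urls) - sitemap_urls(xml_text))

# Illustrative data only; in practice xml_text would be the saved sitemap output.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.org/</loc></url>"
    "<url><loc>https://example.org/news</loc></url>"
    "</urlset>"
)
expected = [
    "https://example.org/",
    "https://example.org/news",
    "https://example.org/imprint",
]
missing = missing_pages(expected, sample)
```

Here `missing` contains only the imprint URL, which is the kind of list that helps to spot a pattern among the pages that are not indexed.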

From: Markus Blaschke [mailto:notifications@github.com] Sent: Wednesday, 6 May 2015 18:11 To: mblaschke/TYPO3-metaseo Cc: papillon6 Subject: Re: [TYPO3-metaseo] Sitemap only contains few pages (#55)


mblaschke commented 9 years ago

Are you using the MetaSEO Garbage Collection?

mblaschke commented 9 years ago

If you're using composer please try MetaSEO 2.0 (develop-branch or dev-develop in composer.json). I've fixed some issues with the sitemap indexing.

NiklasLazinbee commented 9 years ago

Hello, I have the same problem: only a few pages are indexed. It seems that frontend access doesn't index pages, or not all pages.

I use the Sitemap garbage collection (metaseo) scheduler task.

I tested MetaSEO 2.0; it doesn't work with TYPO3 6.2 (Core: Error handler (FE): PHP Warning: require_once(/xxx/typo3conf/ext/metaseo/Classes/Hook/SitemapIndexHook.php): failed to open stream: No such file or directory in /xxx/typo3_src-6.2.9/typo3/sysext/core/Classes/Core/ClassLoader.php line 184. Same for HttpHook.php)

Is there any way to automate indexing? I have many sites.

Best regards, Niklas

mblaschke commented 9 years ago

What's inside /xxx/typo3conf/ext/metaseo/Classes/Hook/? These files should exist, and they also exist in Git: https://github.com/mblaschke/TYPO3-metaseo/tree/develop/Classes/Hook

Did you use composer? Or just a git clone?

NiklasLazinbee commented 9 years ago

Oh, I just copied the files to my server. I'm not familiar with Composer, but I'll try it out. I'm using this for composer.json:

    {
        "repositories": [
            { "type": "composer", "url": "http://composer.typo3.org/" },
            { "type": "vcs", "url": "https://github.com/mblaschke/TYPO3-metaseo.git" },
        ],
        "require": {
            "typo3/cms": "6.2.*",
            "mblaschke/metaseo": "dev-master"
        }
    }

but I only get version 1.0.8 of the extension?

NiklasLazinbee commented 9 years ago

Okay, this is the correct configuration for composer.json:

    {
        "repositories": [
            { "type": "composer", "url": "http://composer.typo3.org/" },
            { "type": "vcs", "url": "https://github.com/mblaschke/TYPO3-metaseo.git" }
        ],
        "require": {
            "typo3/cms": "6.2.*",
            "mblaschke/metaseo": "dev-develop"
        }
    }
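Besides the branch name (dev-master versus dev-develop), the first attempt also has a comma after the last entry in "repositories", which strict JSON does not allow. The thread does not say how Composer reacted to it, so the following is only a quick way to check a composer.json with a strict parser, here Python's standard `json` module; the helper name `parse_error` is made up for this sketch.

```python
import json

# First attempt from the thread: note the comma after the last repository.
first_attempt = """
{ "repositories": [
    { "type": "composer", "url": "http://composer.typo3.org/" },
    { "type": "vcs", "url": "https://github.com/mblaschke/TYPO3-metaseo.git" },
  ],
  "require": { "typo3/cms": "6.2.*", "mblaschke/metaseo": "dev-master" } }
"""

# Corrected version: trailing comma removed, develop branch required.
corrected = """
{ "repositories": [
    { "type": "composer", "url": "http://composer.typo3.org/" },
    { "type": "vcs", "url": "https://github.com/mblaschke/TYPO3-metaseo.git" }
  ],
  "require": { "typo3/cms": "6.2.*", "mblaschke/metaseo": "dev-develop" } }
"""

def parse_error(text):
    """Return None if text is strict JSON, otherwise a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return f"line {error.lineno}, column {error.colno}: {error.msg}"
    return None

first_error = parse_error(first_attempt)       # a description of the syntax error
corrected_error = parse_error(corrected)       # None, the file parses
```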

I think there is a conflict with some other extensions. I have to copy the live system and test by deactivating extensions. This takes a little bit of time.

Apart from this issue, there is the question: is it possible to automate indexing? With 500 pages, it takes a lot of time to visit every page in the frontend for indexing.

thomaszbz commented 9 years ago

@NiklasLazinbee Have you tried the orange "flush system caches" button? This button needs to be activated via the Install Tool. Sometimes this helps when classes conflict.

I think you don't need to go to every page. MetaSEO indexes the current page and all the links on the current page. Basically you'd just have to click all menu entries, for example.
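If clicking through the menus by hand is too tedious, the same visits could be scripted. The following Python sketch follows internal links breadth-first from a start URL and requests each page once. It rests on an assumption that this thread does not confirm: that plain anonymous HTTP requests trigger MetaSEO's frontend indexing in the same way as a browser visit. All names in it are illustrative.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags in a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def fetch(url):
    """Download a page; this request is what would act as the frontend access."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def crawl(start_url, fetch_page=fetch, limit=1000):
    """Visit start_url and every same-host page reachable from it, once each."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            continue
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            target = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

# Example call (performs real HTTP requests):
# crawl("https://www.example.org/")
```

The `fetch_page` parameter exists so the crawl logic can be tried against canned HTML before pointing it at a live site.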

thomaszbz commented 9 years ago

@NiklasLazinbee @papillon6 : Is everyone happy?

As this issue became a "general purpose support" issue, I'd like to close it if there are not any problems left.

NiklasLazinbee commented 9 years ago

Sorry for the late answer. The problem with indexing only a few pages persists and I can't find the reason. For this specific project I switched to another extension.

thomaszbz commented 9 years ago

@NiklasLazinbee I can't reproduce it myself. As soon as I click around in the frontend, the index and sitemap get filled (using the develop branch).

I think your problem had to do with the missing "Classes/Hook/SitemapIndexHook.php". And I'm quite sure that this problem is solved or never existed, because otherwise everybody would be affected by it. If it existed for some (few) users, it could still be a caching issue, which I could always solve by clicking all three cache-clear buttons in the backend.

If the bug had to do with namespaces: there was some cleanup yesterday (in develop) which hopefully did not change anything in behaviour. However, even such a change could theoretically have some effect (but should not) when it comes to autoloading issues.

We would be happy if an active user could report the problem to us in a reproducible way. Otherwise we may close the issue some day, at the risk that a reported bug is never fixed.

sinasita commented 9 years ago

I've got the same problem. I'm using dev-version 2.0.

All of a sudden, I had some entries for news articles in the table, but no pages at all. I don't know how they appeared, and it's by far not all of them. Clicking different pages still doesn't change anything; the table remains pretty empty.

Then I ran a test: I emptied the tx_metaseo_sitemap table and reran the tasks, but now it remains completely empty, and not even the article entries from before appear.

Caching is activated and works. I tried setting plugin.metaseo.sitemap.changeFrequency to always, but it didn't help either... I cleared the cache in the Install Tool multiple times, with no effect...

Maybe I missed some important step. The task just generates the file; that works, because it always has a new creation date. Is there a special step needed to fill the table?

I'm using TYPO3 6.2.14

mblaschke commented 9 years ago

Should we implement a debug mode with logging why a page cannot be indexed?

thomaszbz commented 9 years ago

@sinasita Would you please try to

We also should know more about your scenario:

If possible, we should have a minimal test case based on a fresh install of TYPO3 CMS. With a description which steps it exactly takes to reproduce this issue.

sinasita commented 9 years ago

OK, I tried all the steps you wrote. What I found out is that the articles inserted into the table are articles from the RSS feed (the feed.rss page); the table gets filled with those articles... Articles not appearing in the feed do not have any entry in the DB.

All the default pages are still missing in the table.

As a description of the project: the site mostly consists of tx_news articles on different "channels"; there are only a few pages with default page content. The root page is the parent of all pages: the first level holds the default pages and sysfolders, with one sysfolder for the service pages like the imprint, another sysfolder for the feeds, and various sysfolders for the news entries and sys_categories. No separator, no mounts...

I use realurl and it works fine. Installed extensions are powermail, devlog, fluidcontent, solr, flux, vhs, news, falsearch and socialshareprivacy. I don't use xdebug or APC; the PHP version is 5.6.9.

When exactly is the moment the table gets filled? Which part of the code is responsible for that?

And another piece of information that's probably important, sorry that I missed it before: the table tx_metaseo_sitemap was full, with almost 200,000 entries, but also with lots of old entries from non-existent pages and articles... So the sitemap was wrong, with old entries, and it was no longer valid in the Google Search Console. That is why my intention was to refill the table...

Is there some time variable I missed that has to be unset (like a marker that the page is already in the table until a certain day, and only after that does it get indexed again)?

[Screenshot, 2015-09-23 at 12:15:04]

And logging information would be great...
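For the stale entries mentioned above (rows for pages and articles that no longer exist), a targeted cleanup might be an alternative to emptying the whole table. The sketch below uses an in-memory SQLite table with a hypothetical schema (`sitemap_index`, `page_uid`, `url`); it is not the layout of tx_metaseo_sitemap and not what the MetaSEO garbage collection task does, only an illustration of the idea.

```python
import sqlite3

def purge_stale(conn, live_page_uids):
    """Delete index rows whose page no longer exists; return the number of rows removed."""
    live = set(live_page_uids)
    # Read all distinct page ids first, so the SELECT is finished before deleting.
    stale = [
        (uid,)
        for (uid,) in conn.execute("SELECT DISTINCT page_uid FROM sitemap_index")
        if uid not in live
    ]
    before = conn.total_changes
    conn.executemany("DELETE FROM sitemap_index WHERE page_uid = ?", stale)
    return conn.total_changes - before

# Illustrative data: page 2 has been deleted from the site but is still indexed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sitemap_index (page_uid INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO sitemap_index VALUES (?, ?)",
    [(1, "/"), (2, "/old"), (2, "/old?page=2"), (3, "/news")],
)
removed = purge_stale(conn, live_page_uids=[1, 3])
remaining = [row[0] for row in conn.execute("SELECT url FROM sitemap_index ORDER BY page_uid")]
```

This keeps the valid rows in place, so the sitemap does not have to be rebuilt from frontend visits afterwards.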

thomaszbz commented 9 years ago

@sinasita Thank you for reporting this. Please stand by until we have a first test case to reproduce this - this can take some time. It is very likely that we will have further questions to finally track this down, especially if we can't reproduce this in the first place. I still would like to see your typoscript config for metaseo.

@sinasita The sitemap table should get filled as soon as you click on a page. This should trigger indexing of that particular page as well as the pages linked from the page you clicked on. With respect to the scheduler tasks, we are aware of some issues that we must at least retest before we release metaseo 2.0.0.

@mblaschke I guess we should try to reproduce this with the info @sinasita has provided. Personally, I think we should try to reproduce and fix the problem instead of polluting the code with debug code, at the risk of getting complex or even unusable debug logs from users (which then turn out to be caching issues etc.). I'd rather have more tests of the code base, at least for the units of code we can run isolated tests against. Plus, we should use exceptions wherever possible. Exceptions potentially raise the likelihood that users report them, and we can track down such problems much more easily using stack traces. I already put a lot of effort into reasonable exception handling for the Ajax handlers. We really should extend this, e.g. to indexing, at least to generate some log entries via exception handlers when things go wrong.

sinasita commented 9 years ago

Oh, I think I know the reason it wasn't working anymore...

What I changed was that I included a partial with render uncache:

It was included on all pages except the pages for the feeds; because of that, only those worked. I didn't know this could lead to problems, because the pages were still cached, except for the part inserted with this snippet.

Thanks for your great willingness to help! So is it right, then, that pages with only a small part being re-rendered won't be cached? Because in general, I think the caching of the page worked fine: the caching tables got filled and a second call of the page was a lot faster than the first one...

thomaszbz commented 7 years ago

This issue duplicates #271. I'll leave it open until we find a fix for #271. That should fix this issue as well.