tomasnorre / crawler

Libraries and scripts for crawling the TYPO3 page tree. Used for re-caching, re-indexing, publishing applications etc.
GNU General Public License v3.0

BUG: Ensure only pages listed in pidsOnly are visited if used #1096

Open tomasnorre opened 1 month ago

tomasnorre commented 1 month ago

Bug Report

This is extracted from #816 to ease the implementation and keep the issue size to a minimum.

Current Behavior: When using pidsOnly in the PageTS Crawler configuration, the "complete" page tree to that page is still calculated, and the crawler tries to visit all of it.

Expected behavior/output: Only pages in the pidsOnly list should be visited.
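As an illustration only, the expected filtering could be sketched like this. It is written in Python rather than the extension's PHP, and the page tree, the pids other than 3, and the function name `pages_to_visit` are all hypothetical; it is not how the crawler is actually implemented.

```python
from collections import deque

# Hypothetical page tree (parent pid -> child pids); not taken from a real site.
PAGE_TREE = {3: [10, 11], 10: [12], 11: [], 12: []}


def pages_to_visit(start_pid, depth, pids_only=None):
    """Collect pids from start_pid down to `depth` levels, then apply pidsOnly."""
    found = []
    queue = deque([(start_pid, 0)])
    while queue:
        pid, level = queue.popleft()
        found.append(pid)
        if level < depth:
            for child in PAGE_TREE.get(pid, []):
                queue.append((child, level + 1))
    if pids_only is None:
        return found
    # With pidsOnly set, nothing outside the list should remain.
    allowed = set(pids_only)
    return [pid for pid in found if pid in allowed]


print(pages_to_visit(3, 99))       # [3, 10, 11, 12]
print(pages_to_visit(3, 99, [3]))  # [3]
```

With `pidsOnly = 3`, only page 3 remains, no matter how deep the tree below it is.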

Steps to reproduce: I use the following configuration (3 is the start page):

tx_crawler.crawlerCfg.paramSets {
    deployment = &L=[0-3]
    deployment {
        pidsOnly = 3
    }
}

With this configuration, I run:

crawler:buildQueue 3 deployment --depth 99 --mode exec

So it should only generate the start page in four languages. Instead, I also get output for all the other pages (Page-x is a placeholder for the real page name):

Page-1:  (Because page is hidden)
Page-2:  (Because page is hidden)
Page-3:  (Because doktype is not allowed)

This is not very helpful, as only one page, in its different languages, should be generated.
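To make the expected result concrete, here is a small Python sketch of the four queue entries I would expect from the configuration above. It only handles the simple numeric range form `[0-3]`, the `?id=...` URL shape is an assumption for illustration, and `expand_param` and `expected_queue` are hypothetical helpers, not part of the crawler extension.

```python
import re


def expand_param(expr):
    """Expand a simple numeric range such as '&L=[0-3]' into one string per value."""
    match = re.fullmatch(r"(&\w+=)\[(\d+)-(\d+)\]", expr)
    if match is None:
        # Anything that is not a plain numeric range is returned unchanged.
        return [expr]
    prefix = match.group(1)
    low, high = int(match.group(2)), int(match.group(3))
    return [f"{prefix}{value}" for value in range(low, high + 1)]


def expected_queue(pids_only, param_expr):
    """Build one illustrative URL per allowed pid and expanded parameter value."""
    return [
        f"?id={pid}{param}"
        for pid in pids_only
        for param in expand_param(param_expr)
    ]


print(expected_queue([3], "&L=[0-3]"))
# ['?id=3&L=0', '?id=3&L=1', '?id=3&L=2', '?id=3&L=3']
```

That is four entries for page 3 and nothing for any other page.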

Environment