tomasnorre / crawler

Libraries and scripts for crawling the TYPO3 page tree. Used for re-caching, re-indexing, publishing applications etc.
GNU General Public License v3.0
54 stars 85 forks source link

Bug and resulting Php Warnings with not translated mount points #1059

Open hannasophieungar opened 4 months ago

hannasophieungar commented 4 months ago

Bug Report

Current Behavior We get with every run of crawler task "buildQueue" a lot of errors releating to mound points. E.g. Core: Error handler (BE): PHP Warning: Trying to access array offset on value of type null in tomasnorre/crawler/Classes/Controller/CrawlerController.php line 725 Core: Error handler (BE): PHP Warning: Undefined array key 0 in tomasnorre/crawler/Classes/Controller/CrawlerController.php line 725 We get this errors for line 729 and 740 too. I think this error also concers line 741. I think mount pages are not indexed correctly with this bug in method getPageTreeAndUrls.

Expected behavior/output No undefined array key error and correct indexing of mount pages

Steps to reproduce Start task "buildQueue" in scheduler for root page which including not translated mount pages.

Environment

Possible Solution In method getPageTreeAndUrls for affected lines it needs to be checked if variable $mountpage has array key 0. I think the reason is that mount pages have no language overlays and with this no array key 0. In our case $mountpage are not translated and variable $mountpage is array with all necessary data like mount_pid etc.. Therefore $mountpage at key 0 not exists and resulting value is null. With e.g. $mountpage['mount_pid'] I think we will get expected result for getPageTreeAndUrls. This have to be fixed / checked for other lines too.

github-actions[bot] commented 4 months ago

Hi there, thank you for taking your time to create your first issue. Please give us a bit of time to review it.

tomasnorre commented 4 months ago

Thanks for your bug report, if you could provide a test that proves that the code is currently wrong that would be great, and even better if a fix could be provided too.

hannasophieungar commented 4 months ago

Thank you for your fast response. I'm sorry to write a unit test on method "getPageTreeAndUrls" for this special constellation with mocks for pages is difficult for me at the moment.

You could test this in backend if you "mount" any page (if the source page has translation doesn't matter in my tests). When you then start scheduler task with command "crawler:buildQueue: Create entries in the queue that can be processed at once" on the root page (see also attached configurations in screenshot) we get the errors above. In config.yaml we use fallbackType: strict

Here is a fix for method "getPageTreeAndUrls" for concerning code lines starting with comment "recognize mount points":

 // recognize mount points
            if ($data['row']['doktype'] === PageRepository::DOKTYPE_MOUNTPOINT) {
                $mountpage = $this->pageRepository->getPage($data['row']['uid']);

                // fetch mounted pages
                $this->MP = $mountpage[0]['mount_pid'] . '-' . $data['row']['uid'];

                $mountTree = GeneralUtility::makeInstance(PageTreeView::class);
                $mountTree->init(empty($perms_clause) ? '' : ('AND ' . $perms_clause));
                $mountTree->getTree($mountpage[0]['mount_pid'], $depth);

                foreach ($mountTree->tree as $mountData) {
                    $queueRows = array_merge($queueRows, $this->drawURLs_addRowsForPage(
                        $mountData['row'],
                        BackendUtility::getRecordTitle('pages', $mountData['row'], true),
                        (string) $data['HTML']
                    ));
                }

                // replace page when mount_pid_ol is enabled
                if ($mountpage[0]['mount_pid_ol']) {
                    $data['row']['uid'] = $mountpage[0]['mount_pid'];
                } else {
                    // if the mount_pid_ol is not set the MP must not be used for the mountpoint page
                    $this->MP = null;
                }
            }
Bildschirmfoto 2024-05-16 um 12 50 04

I hope you could use this information to reproduce our problem and maybe you could use this fix, if you think that makes sense.

tomasnorre commented 4 months ago

Thanks a lot, will look into it.