tomasnorre / crawler

Libraries and scripts for crawling the TYPO3 page tree. Used for re-caching, re-indexing, publishing applications etc.
GNU General Public License v3.0

[FEATURE] categories in indexer are not handled #339

Open medarob opened 5 years ago

medarob commented 5 years ago

Hi,

I use crawler 6.2 and TYPO3 8.7.24.

I think I found an issue when using categories, for example to index only news with certain categories.

I have a multisite setup with all news in one sysfolder (id=58). Which news are shown depends on the website: news with category A are shown only on website A, news with category AB are shown on websites A and B...

If I use this configuration, all news are indexed, no matter what their categories are: &L=1&tx_news_pi1[news]=[_TABLE:tx_news_domain_model_news;_PID:58] -> 170 URLs submitted.

Hidden news records are excluded, which works: &L=1&tx_news_pi1[news]=[_TABLE:tx_news_domain_model_news;_PID:58;_WHERE: and hidden=0] -> 160 URLs submitted.

BUT if I add the category to the condition, no news are indexed: &L=1&tx_news_pi1[news]=[_TABLE:tx_news_domain_model_news;_PID:58;_WHERE: and hidden=0 and categories=102] -> 0 URLs submitted.

The problem is that the category id=102 is not stored in the column 'categories' of the table 'tx_news_domain_model_news'. That column only holds the total number of categories assigned to the news record. For example, if I use '2' instead of '102' I get 55 results, but '2' is only the number of categories of each matching news entry.

I think that to make it work you have to join this table with another table, sys_category_record_mm.

But I'm not sure how to add a join with one or multiple tables to filter by category (if that would be the solution).
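To illustrate the join described above, here is a minimal sketch in plain PHP. It is not something the crawler's _TABLE configuration is shown to support in this thread. Assumptions: an in-memory SQLite database stands in for the TYPO3 database, the sample rows are invented, and the function names createDemoDatabase(), findNewsUidsByCategoriesColumn() and findNewsUidsByCategory() are hypothetical. The column names uid_local (category uid), uid_foreign (record uid), tablenames and fieldname follow the TYPO3 core schema of sys_category_record_mm.

```php
<?php

declare(strict_types=1);

/**
 * Builds a throwaway in-memory SQLite database with invented sample rows.
 * Only the columns relevant to the issue are modelled.
 */
function createDemoDatabase(): PDO
{
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $pdo->exec('CREATE TABLE tx_news_domain_model_news (uid INTEGER PRIMARY KEY, pid INTEGER, hidden INTEGER, categories INTEGER)');
    $pdo->exec('CREATE TABLE sys_category_record_mm (uid_local INTEGER, uid_foreign INTEGER, tablenames TEXT, fieldname TEXT)');

    // "categories" holds the NUMBER of assigned categories, not their uids.
    $pdo->exec('INSERT INTO tx_news_domain_model_news (uid, pid, hidden, categories) VALUES (1, 58, 0, 1), (2, 58, 0, 2), (3, 58, 0, 1), (4, 58, 1, 1)');

    // uid_local = category uid, uid_foreign = news uid.
    $pdo->exec("INSERT INTO sys_category_record_mm (uid_local, uid_foreign, tablenames, fieldname) VALUES
        (102, 1, 'tx_news_domain_model_news', 'categories'),
        (101, 2, 'tx_news_domain_model_news', 'categories'),
        (102, 2, 'tx_news_domain_model_news', 'categories'),
        (101, 3, 'tx_news_domain_model_news', 'categories'),
        (102, 4, 'tx_news_domain_model_news', 'categories')");

    return $pdo;
}

/** Naive filter from the issue: compares the counter column with a value. */
function findNewsUidsByCategoriesColumn(PDO $pdo, int $pid, int $value): array
{
    $statement = $pdo->prepare(
        'SELECT uid FROM tx_news_domain_model_news
         WHERE pid = :pid AND hidden = 0 AND categories = :value
         ORDER BY uid'
    );
    $statement->bindValue('pid', $pid, PDO::PARAM_INT);
    $statement->bindValue('value', $value, PDO::PARAM_INT);
    $statement->execute();

    return array_map('intval', $statement->fetchAll(PDO::FETCH_COLUMN));
}

/** Filter through the MM table: visible news on $pid assigned to $categoryUid. */
function findNewsUidsByCategory(PDO $pdo, int $pid, int $categoryUid): array
{
    $statement = $pdo->prepare(
        "SELECT n.uid
         FROM tx_news_domain_model_news n
         INNER JOIN sys_category_record_mm mm
             ON mm.uid_foreign = n.uid
            AND mm.tablenames = 'tx_news_domain_model_news'
            AND mm.fieldname = 'categories'
         WHERE n.pid = :pid AND n.hidden = 0 AND mm.uid_local = :category
         ORDER BY n.uid"
    );
    $statement->bindValue('pid', $pid, PDO::PARAM_INT);
    $statement->bindValue('category', $categoryUid, PDO::PARAM_INT);
    $statement->execute();

    return array_map('intval', $statement->fetchAll(PDO::FETCH_COLUMN));
}
```

With these sample rows, findNewsUidsByCategoriesColumn($pdo, 58, 102) finds nothing, because no news record has 102 categories. findNewsUidsByCategory($pdo, 58, 102) returns the two visible news records assigned to category 102.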

tomasnorre commented 5 years ago

Looking into the code, it doesn't look like this feature is supported at the moment.

I'll be happy to convert your bug report to a feature request and see how it can be implemented. I will also be happy to test a pull request if you provide one.

Currently the focus is more on the rewrite to support TYPO3 v9, but if you can provide me with a solution I will be happy to check and review it.

tomasnorre commented 4 years ago

This is related to #516

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tomasnorre commented 3 years ago

/remove staled

tomasnorre commented 3 years ago

I removed the TYPO3v9 label, as this will not be implemented for the Crawler version that supports TYPO3 9 LTS.

kpnielsen commented 2 years ago

For everyone else like me (who needs this almost exclusively for news), you can achieve this by adding a little bit of TS configuration and a signal slot for the News detail action.

Picture this scenario: There is a single sysfolder containing news. Every news record has at least one category (let's call them A and B as in the original issue). On the page "Show Action for news of category A" only news of category A shall be shown. The same holds for the page "Show Action for news of category B". By default, news of category B can still be shown on the page "Show Action for news of category A". It is just that there is usually no link on your site that leads to such a URL.

In order to prevent news from being shown on a page where they shouldn't be, add this TypoScript to the respective pages:

plugin.tx_news.settings {
    overrideFlexformSettingsIfEmpty = cropMaxCharacters,dateField,timeRestriction,archiveRestriction,orderBy,orderDirection,backPid,listPid,startingpoint,recursive,list.paginate.itemsPerPage,list.paginate.templatePath,categories,categoryConjunction
    categoryConjunction = AND
    categories = <ID of respective category>
    detail.errorHandling = pageNotFoundHandler
}

and register a signal slot:

// ext/ext_localconf.php
(function() {

    if (\TYPO3\CMS\Core\Utility\ExtensionManagementUtility::isLoaded('news') === true) {
        $signalSlotDispatcher = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(TYPO3\CMS\Extbase\SignalSlot\Dispatcher::class);
        $signalSlotDispatcher->connect(
            GeorgRinger\News\Controller\NewsController::class,
            'detailAction',
            Vendor\Ext\Slots\NewsDetailSlot::class,
            'detailActionSlot',
            true
        );
    }
})();

// ext/Classes/Slots/NewsDetailSlot.php
<?php

declare(strict_types=1);

namespace Vendor\Ext\Slots;

class NewsDetailSlot
{
    public function detailActionSlot($newsItem, $currentPage, $demand, $settings, $extendedVariables): array
    {
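        if (!\is_null($newsItem)) {
            // Categories requested for this page (set via plugin.tx_news.settings.categories)
            $demandedCategories = $demand->getCategories();
            $itemCategories = $newsItem->getCategories()->toArray();
            // Collect the category uids of the news item as strings for the comparison below
            $itemCategoryIds = \array_map(function($category) {
                return (string)$category->getUid();
            }, $itemCategories);
            // Both sides have categories but share none: drop the item so the detail action runs its error handling
            if (count($demandedCategories) > 0 && count($itemCategoryIds) > 0 && count(\array_intersect($demandedCategories, $itemCategoryIds)) === 0) {
                $newsItem = null;
            }
        }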
        return [
            'newsItem' => $newsItem,
            'currentPage' => $currentPage,
            'demand' => $demand,
            'settings' => $settings,
            'extendedVariables' => $extendedVariables,
        ];
    }
}

Basically this does the following: the TypoScript tells the news extension to add the respective categories to the Demand DTO. The signal slot then unsets the news item passed to the detail action if it does not match those categories, which makes the detail action, in conjunction with the TS setting detail.errorHandling, trigger a 404 error. As a result, news are only indexed on the pages where they can actually be found, because every other URL returns a 404.
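The matching rule in the slot can be tried in isolation. The following is a minimal sketch that mirrors the condition in detailActionSlot() without any TYPO3 or news classes. The function name newsMatchesDemand() is hypothetical, and the category uids are passed as plain arrays of strings.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper mirroring the check in detailActionSlot().
 * Returns true if the news item may be shown for the demanded categories.
 */
function newsMatchesDemand(array $demandedCategories, array $itemCategoryIds): bool
{
    // No categories configured for the page, or news without categories:
    // the slot does not filter in either case.
    if ($demandedCategories === [] || $itemCategoryIds === []) {
        return true;
    }

    // The item is kept if it shares at least one category uid with the demand.
    return array_intersect($demandedCategories, $itemCategoryIds) !== [];
}
```

For example, newsMatchesDemand(['102'], ['101', '102']) is true, while newsMatchesDemand(['102'], ['101']) is false, which is the case where the slot sets the news item to null. Note that news without any category pass the check on every page, so this approach relies on every news record having at least one category, as in the scenario above.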

tomasnorre commented 2 years ago

Thanks for adding the information.

Would you mind creating a section in the documentation for the news example? https://github.com/tomasnorre/crawler/blob/main/Documentation/Configuration/Examples/News/Index.rst