sypets / brofix

Check for broken links, forked from TYPO3 system extension linkvalidator

Add exclusion checking by uid in CLI #114

Closed gaumondp closed 2 years ago

gaumondp commented 2 years ago

Is your feature request related to a problem? Please describe. A website may have many pages (with subpages) that are kept for historical reasons. The high number of pages older than 3 years makes link checking take longer, and the 404s on the external sites they link to will never be fixed. Excluding those pages from link crawling can therefore speed up link checking a lot when you know they are not worth checking.

Describe the solution you'd like Using the CLI, we want to exclude pages by uid from link checking, with their subpages implicitly excluded as well.

Describe alternatives you've considered Add those "historical links" (hundreds of them) to the exclusion table (tx_brofix_exclude_link_target) by hand.

Additional context We plan to add this option to CLI:

-x=, --exclude-uid= Page ids, separated by commas, that will not be checked. The exclusion is recursive, so subpages are excluded as well.
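
The following is a minimal sketch of how the proposed option could behave, written in Python for illustration only (brofix itself is PHP). The page tree, the sample uids and the helper names `parse_uid_list`, `with_subpages` and `pages_to_check` are all hypothetical. Only the comma-separated format and the recursive exclusion come from the proposal above.

```python
from collections import deque

# Hypothetical page tree: parent uid -> list of child uids.
PAGE_TREE = {
    1: [2, 3],
    2: [4, 5],
    3: [6],
    4: [],
    5: [7],
    6: [],
    7: [],
}


def parse_uid_list(value):
    """Parse a comma-separated option value such as "2,6" into a set of uids."""
    return {int(part) for part in value.split(",") if part.strip()}


def with_subpages(uids, tree):
    """Return the given uids plus all of their subpages, at any depth."""
    result = set()
    queue = deque(uids)
    while queue:
        uid = queue.popleft()
        if uid in result:
            continue
        result.add(uid)
        queue.extend(tree.get(uid, []))
    return result


def pages_to_check(start_uid, exclude_option, tree):
    """Pages below start_uid, minus the excluded uids and their subpages."""
    excluded = with_subpages(parse_uid_list(exclude_option), tree)
    return sorted(with_subpages({start_uid}, tree) - excluded)
```

With this sample tree, `pages_to_check(1, "2,6", PAGE_TREE)` leaves only pages 1 and 3, because page 2 takes its subpages 4, 5 and 7 with it and page 6 is excluded directly.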

atigiti commented 2 years ago

This feature is added in the "Add_exlusion_checking_by_uid_cli" PR.

sypets commented 2 years ago

I am not sure if adding that in the command is the best idea. It might make more sense to make that more permanent, e.g. add a tx_brofix_do_not_check field to the page. I think that is the better solution because then this will also be considered when checking from the backend.

sbuerk commented 2 years ago

> I am not sure if adding that in the command is the best idea. It might make more sense to make that more permanent, e.g. add a tx_brofix_do_not_check field to the page. I think that is the better solution because then this will also be considered when checking from the backend.

I would vote for a solution in this direction, as this avoids the need to put exclude conditions as a value list / array into database queries, and can be added more easily with a single condition tx_brofix_do_not_check = 0. Additionally, it may eventually be possible to report which pages are excluded from checks.

This is in addition to the point you already mentioned, that it would avoid some weird UI implementation for selecting "exclude pages" on runs. I would also avoid any kind of array, siteConfig or TypoScript implementation, which would also boil down to the need to add additional value lists that may exceed limitations (query size, max_allowed_packet, ...) or further degrade handling if done after retrieving the list of page uids.
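
To illustrate the difference, here is a small sketch using Python's built-in sqlite3 module rather than the TYPO3 database layer. The tx_brofix_do_not_check column is the field proposed in this thread and does not exist yet. The sample rows and the function names `checkable_by_flag` and `checkable_by_value_list` are hypothetical.

```python
import sqlite3

# In-memory stand-in for the pages table, with the proposed (hypothetical) field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    "uid INTEGER PRIMARY KEY, pid INTEGER NOT NULL, "
    "tx_brofix_do_not_check INTEGER NOT NULL DEFAULT 0)"
)
# Pages 2 and 4 are flagged explicitly; how subpages would inherit the flag
# is not settled in this discussion.
conn.executemany(
    "INSERT INTO pages (uid, pid, tx_brofix_do_not_check) VALUES (?, ?, ?)",
    [(1, 0, 0), (2, 1, 1), (3, 1, 0), (4, 2, 1)],
)


def checkable_by_flag(connection):
    """Per-page flag: one fixed condition, no matter how many pages are excluded."""
    rows = connection.execute(
        "SELECT uid FROM pages WHERE tx_brofix_do_not_check = 0 ORDER BY uid"
    )
    return [row[0] for row in rows]


def checkable_by_value_list(connection, excluded_uids):
    """Value list: the statement grows with every excluded uid."""
    excluded_uids = list(excluded_uids)
    if not excluded_uids:
        rows = connection.execute("SELECT uid FROM pages ORDER BY uid")
        return [row[0] for row in rows]
    placeholders = ",".join("?" for _ in excluded_uids)
    sql = f"SELECT uid FROM pages WHERE uid NOT IN ({placeholders}) ORDER BY uid"
    return [row[0] for row in connection.execute(sql, excluded_uids)]
```

Both functions return the same pages for this sample data, but only the value-list variant has a statement size that depends on the number of excluded pages, which is where limits on query size come into play.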

Just my five cents on that.

sypets commented 2 years ago

> I would vote for a solution in this direction, as this avoids the need to put exclude conditions as a value list / array into database queries, and can be added more easily with a single condition tx_brofix_do_not_check = 0

I don't think we would need that - this could already be considered when traversing the page tree to fetch the pids (in the recursive function). But it does complicate things. Having an option per page would be easier to handle per DB query.
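
A rough sketch of that idea, in Python for illustration only: excluded pages are skipped while the tree is traversed, so no value list ever reaches the database query. The function name `collect_pids`, the sample tree and the assumption that an excluded page also hides its subpages are hypothetical and not taken from the brofix code.

```python
def collect_pids(uid, children, do_not_check):
    """Recursively collect uid and its subpages, pruning excluded subtrees.

    Assumption: excluding a page also excludes everything below it.
    """
    if uid in do_not_check:
        return []
    pids = [uid]
    for child in children.get(uid, []):
        pids.extend(collect_pids(child, children, do_not_check))
    return pids


# Hypothetical page tree: parent uid -> list of child uids.
CHILDREN = {1: [2, 3], 2: [4], 3: [5]}

# Page 2 is marked as "do not check", so page 4 below it is skipped as well.
PIDS = collect_pids(1, CHILDREN, do_not_check={2})
```

The pruning happens before any links are fetched, so the cost is one set lookup per visited page.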

Another thing: we already have the TSconfig, which could also be used, e.g. by adding another option such as enable=0.

I do think it would be a good idea to exclude specific pages, I am just not sure yet what the best solution is. What I would like to avoid is having them excluded in one check (e.g. via CLI) but not in another (e.g. via GUI), because that would result in partial checks, which usually lead to broken link records not getting rechecked and to stale information.

atigiti commented 2 years ago

I think adding the exclude option to the TSconfig of a page is a good idea. The user can then specify the excluded pages (e.g. a page tree), and the exclusion will be more permanent, which avoids rechecking excluded pages. I will go with this solution.