processwire / processwire-issues

ProcessWire issue reports.
44 stars 2 forks source link

PagePaths module breaks being able to implement `path()` on a Page Class #1906

Open artfulrobot opened 3 months ago

artfulrobot commented 3 months ago

Short description of the issue

If a Page class implements the public function path() :string method to return a different url for the page, the PagePaths module will, in certain situations, cause the page not to be loadable by its URL.

Expected behavior

Be able to load a page by visiting $page->url

Actual behavior

Get 404

Specifics

The PagePaths module has two ways to generate the URL.

This causes a really confusing situation: edit + save a page (that implements path()) and its URL is likely reachable. Edit + save a parent page and all its children go 404.

A possible fix

ReplaceupdatePagePathsChildren() with

protected function updatePagePathsChildren($pageId, array $paths) {
  $numUpdated = 0;
  foreach ($this->pages->findMany("parent=$pageId,include=all") as $childPage) {
    $numUpdated += $this->updatePagePaths($childPage);
  }
  return $numUpdated;
}

Steps to reproduce the issue

  1. Create a template called Blah. Configure family so this page can be used as a child to some other tpls.
  2. Create a Page Class file for Blah and implement
    public function path() {
       return '/some-path-does-not-match-parent/' . $this->id;
    }
  3. Create a template file for the page, can just echo "hi".
  4. Create a child page of some existing page using this template.
  5. Save +View the page: it should show the /some-path-does-not-match-parent/123 path and work ok
  6. Make a change that would cause PagePaths to regenerate paths for child pages. Alternatively just do modules('PagePaths')->rebuild();
  7. Now try to visit the page at its URL: 404

Setup/Environment

Server Details

Software Version
ProcessWire 3.0.229
PHP 8.1.23
Webserver nginx/1.22.1
MySQL Server 10.11.6-MariaDB-0+deb12u1
MySQL Client mysqlnd 8.1.23
Server Settings |Parameter|Value| |------:|:-------| |allow_url_fopen|1| |max_execution_time|5 (changeable)| |max_input_nesting_level|64| |max_input_time|60| |max_input_vars|1000| |memory_limit|128M| |post_max_size|50M| |upload_max_filesize|20M| |xdebug|| |xdebug.max_nesting_level|256| |mod_rewrite|| |mod_security|| |EXIF Support|1| |FreeType|1|
GD Settings |Parameter|Value| |------:|:-------| |Version|2.3.3| |GIF|1| |JPG|1| |PNG|1| |WebP|1|
iMagick Settings |Parameter|Value| |------:|:-------| |Version|6.9.11| |GIF|1| |JPG|1| |PNG|1| |SVG|1| |PDF|1| |WebP|1|
Module Details |Module ClassName|Version| |------:|:-------| |AddImageUrls|0.3.0| |EditablePublishedDate|0.1.0| |FieldtypeDynamicOptions|0.1.7| |FieldtypePageTableNext|2.0.3| |FileValidatorSvgSanitizer|0.0.5| |InputfieldPageTableNext|2.0.3| |MarkupPnpNav|0.1.0| |MarkupSocialLinks|0.0.1| |PageListImageLabel|2.0.1| |PageTableNext|2.0.3| |ProcessMenuCreator|0.1.0| |ProcessPnp|0.0.1| |ProcessSwitchUser|0.0.6| |ProcessTagHerder|0.0.1| |ProcessWirePublishDate|1.2.0| |ProcessWireUpgrade|0.1.1| |ProcessWireUpgradeCheck|0.0.9| |Sifter|0.1.0| |TextformatterInlay|1.0.0| |TextformatterTweetEmbed|0.1.0| |TextformatterVideoEmbed|2.0.2| |TextformatterVideoEmbedOptions|0.4.0| |TracyDebugger|4.25.11|
ryancramerdesign commented 2 months ago

@artfulrobot I think what's missing here is a handler for your custom URLs. You can override the path method (and you should override __path() not path()) but that is just overriding the URL at runtime, so ProcessWire isn't going to know that /hello/world/ should load /some/page/ unless you have something to detect /hello/world/ and then tell ProcessWire to serve /some/page/. So technically /hello/world/ shouldn't work at all in this situation. My best guess is that maybe it got cached in PagePaths and appeared to work temporarily. But PagePaths is essentially a cache, and not the source of URLs. So in addition to your custom ___path() method, you would need this in your /site/init.php file:

$wire->addHook('/hello/world/', function($event) {
  return $event->pages->get('/some/page/'); 
}); 

That's a really simple example. See the dedicated URL/path hooks post for more details.

An alternative to URL/path hooks is to enable URL segments on your 'home' template. Then you can capture those URL segments like this by editing your /site/templates/home.php template and adding a handler at the top:

if($input->urlSegmentStr === 'hello/world') {
  // url segment string that we recognize 
  $page = $pages->get('/some/page/');
  echo $page->render();
  return $this->halt();

} else if($input->urlSegmentStr) {
  // some other URL segments that we do not recognize
  wire404();
}

// serve regular homepage content here
artfulrobot commented 2 months ago

@ryancramerdesign thanks for your detailed response.

I might have missed something - I've been round the houses on this one, it's been a long journey

The problem with the URL hook (which is what I tried first), is that the URL picker doesn't (can't) understand it. So the wrong (i.e. internal) links get generated while the URLs we want cannot be found. Now, the internal generated links could work with a 302 redirect, but that generates loads of 302s, which is not great for anyone.

The problem with url segments is that the link QA doesn't work - it sees that the tpl handles segments and it gives up checking, assuming it's fine.

It seems there's some places where pagepaths is used and some where it's not.

I've found the above fix to PagePaths to work ok: the cache is populated with the correct (desired) URLs; the pages give the desired URLs from $page->url; the URLs work to load the page; the URLs can be searched; the link QA works.

I'll try swapping paths() for ___paths() though.

ryancramerdesign commented 2 months ago

@artfulrobot What do you mean by the URL picker that doesn't work? I'm not sure I know specifically what you are talking about.

This ProcessWire powered site uses custom URLs like described above: https://www.biketours.com ... pretty much none of the URLs on the site match their page tree in the admin. This was so that the URLs could be maintained from the former site, which wasn't running ProcessWire. But we wanted better organization in the admin side, even if not reflected in the URLs. I just checked and looks like we are using MarkupQA on some fields and not others. Though we have PagePathHistory installed so I'm not sure it matters.

Regarding the updatePagePathsChildren() method, my mind isn't in that code enough to grasp it at the moment, but if I recall correctly, this was a case where it needed something that went straight to the database to be practical with the quantities it could potentially deal with. There are some installations out there with hundreds of thousands or millions of pages, where even findMany() might not be practical. But your version sure is nicer to look at. Maybe I could have it count ahead of time and decide what to use. Or could make the method hookable, so that you could easily replace the logic of the method with your own, and have it continue to work through version upgrades.

With PagePaths not intended to be the authoritative source of URLs, I'm not sure exactly how it's all working in your case, or how fragile it might be across different situations. But if it all seems to work without issue, then that's good to hear, perhaps it's a safer solution than I originally guessed.

artfulrobot commented 2 months ago

What do you mean by the URL picker ...

I am referring to this widget that lets you search pages by URL

image

With my set up:

All this is good!

Implementing a URL/path hook would mean that it would need another source of desired-urls-to-pages lookup; Perhaps on a hidden field on all pages, or in a special db table. And I don't think URL hooks are consulted on tools like the MarkupQA checker (e.g. you code calls halt() - wouldn't want that while checking a URL!): so it (a) won't offer the desired URLs and (b) will complain that the URL doesn't exist when it does.

But I hear your concern (I'm hearing "you're making it work a different way to how it was designed to work!").

Re your bikers website, it's hard to tell from the front end what's supposed to be what. e.g. I note that when on
https://www.biketours.com/how-we-help/our-team/
I might expect the liv-gray page to be:
https://www.biketours.com/how-we-help/our-team/liv-gray/
but it's actually at
https://www.biketours.com/info/how-we-help/our-team/liv-gray/

I realise this may be deliberate.