picocms / Pico

Pico is a stupidly simple, blazing fast, flat file CMS.
http://picocms.org/
MIT License
3.81k stars, 616 forks

Redirecting pages by means other than YAML front matter #618

Closed: notakoder closed this issue 2 years ago

notakoder commented 2 years ago

I am currently using the PicoRedirect plugin to redirect pages. This requires adding a Redirect: header to the YAML front matter of each page. However, when you need to redirect every page in a directory, that quickly becomes tedious. Is there another way to redirect all pages in a particular directory to another path?

PhrozenByte commented 2 years ago

You'll have to write a custom plugin that hooks into Pico's onRequestUrl event to implement something like that. A likely simpler solution is to use your webserver: if you're running Apache, use mod_rewrite.

notakoder commented 2 years ago

If I am to use the Rewrite rule, what is the url format I should use? https://example.com/content/page-name.md or https://example.com/page-name?

The official documentation uses the absolute url with the extension, which I believe does not exist in Pico.

mayamcdougall commented 2 years ago

It's been a while since I've done any web server configuration myself, so I don't know what the proper config would look like off the top of my head.

But what you're going to be looking to do is use a RegExp-style string to match and replace a particular part of the URL.

For example: rewrite all example.com/blog/* to example.com/news/*.

(The asterisk there is just to illustrate the point. That's not necessarily how it will be written.)

You're rewriting the URL from the perspective of the end user though, so it would be example.com/page-name in that case. You're not telling Pico to look for a different content file, you're just redirecting the user to a new URL before they get to Pico.
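A minimal Python sketch of that match-and-replace idea. It is a model for illustration, not Apache itself. The pattern is applied to the path the visitor requested, such as blog/my-post, and never to a file under content/. The blog/news names follow the example above. rewrite_path is a hypothetical helper, and Apache's $1 back-reference is expanded by hand.

```python
import re

# Apache-style rule: pattern plus substitution, where $1 means "first capture group".
PATTERN = r"^blog/(.*)"
SUBSTITUTION = "news/$1"


def rewrite_path(path: str) -> str:
    """Return the rewritten URL path, or the path unchanged if the pattern doesn't match."""
    match = re.search(PATTERN, path)
    if match is None:
        return path
    # Swap each $N in the substitution for the text captured by group N.
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))) or "", SUBSTITUTION)


print(rewrite_path("blog/my-post"))  # news/my-post
print(rewrite_path("about"))         # about
```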

I can do a little digging for you if you'd like. Just let me know what web server you're using and give me an example of what URLs you're trying to rewrite to what. 😉

notakoder commented 2 years ago

You're rewriting the URL from the perspective of the end user though, so it would be example.com/page-name in that case. You're not telling Pico to look for a different content file, you're just redirecting the user to a new URL before they get to Pico.

This is what I wanted to confirm. Thank you.

So, here's what I have done so far. On my local machine, I edited my /etc/apache2/apache2.conf file and added

RewriteEngine On
RewriteRule "^project_folder/blog/(.*)" "project_folder/new/$1"  [R]

inside

<Directory /var/www/>
Options Indexes FollowSymLinks
AllowOverride All
Require all granted
RewriteEngine On
RewriteRule "^project_folder/blog/(.*)" "project_folder/new/$1"  [R]
</Directory>

I restarted the Apache2 service, but the redirection didn't work. I also tried full web URLs, as in RewriteRule "^http://localhost/project_folder/blog/(.*)" "http://localhost/project_folder/new/$1", but that didn't work either.

What am I doing wrong?

mayamcdougall commented 2 years ago

@PhrozenByte, do you have any input on this? I don't have an Apache test environment set up at the moment, and I'm not that familiar with Apache configuration these days anyway.

As far as I can tell, the rewrite rule looks good. An online tester I found even confirms that it should result in the desired rewrite, so I'd have to assume the issue is something else configuration related. 🤔

One possibility I could see is that Pico's .htaccess rules are flagged [L] or "Last", so that once they're applied, no further rewrites occur. As far as I can tell, Apache applies Rewrite Rules starting from the deepest directory and working its way up. This might mean that the rules in .htaccess are making it so that the new rule in apache2.conf just isn't being used.

@notakoder, this is purely speculation on my part, but it might be worth a quick test. 🤷🏻‍♀️

You could try either removing the [L]'s, or putting your own rule above it in Pico's .htaccess to see if that makes any difference.
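The [L] theory can be made concrete with a toy Python model of a single rule set. It is a simplification, not Apache's real algorithm. It ignores RewriteCond lines such as Pico's !-f/!-d checks, and it ignores the R flag entirely. Rules are tried top to bottom, a matching rule replaces the path, and a rule flagged L ends processing. The rule list and function names are illustrative assumptions.

```python
import re
from typing import List, Tuple

# (pattern, substitution, flag letters); "RL" stands for [R,L], "L" for [L].
Rule = Tuple[str, str, str]


def expand(substitution: str, match: "re.Match") -> str:
    """Replace Apache-style $N back-references with the text captured by group N."""
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))) or "", substitution)


def run_rules(path: str, rules: List[Rule]) -> str:
    """Try rules in order; a matching rule flagged L stops all later rules (R is ignored)."""
    for pattern, substitution, flags in rules:
        match = re.search(pattern, path)
        if match is None:
            continue
        path = expand(substitution, match)
        if "L" in flags:
            break
    return path


pico_catch_all: Rule = (r"^", "index.php", "L")  # stand-in for Pico's catch-all rule
own_redirect: Rule = (r"^blog/(.*)", "new/$1", "RL")

# Catch-all first: it matches every path, so the redirect rule is never reached.
print(run_rules("blog/post", [pico_catch_all, own_redirect]))  # index.php
# Own rule first: blog/ URLs are handled before Pico's rule gets a chance.
print(run_rules("blog/post", [own_redirect, pico_catch_all]))  # new/post
```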

notakoder commented 2 years ago

@mayamcdougall My .htaccess does not have any redirection set in it, because I was told that Pico does not have real page links, only virtual ones: example.com/folder/page really is example.com/content/folder/page. Moreover, I think that only the rules are overridden by the 'highest priority' config file, not the whole file. So everything inside the apache2.conf file that is not overridden by .htaccess must be honoured. Again, this is just my common sense talking. I could be wrong.
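That mapping can be sketched roughly in Python. This is a simplified model, not Pico's actual code. It assumes the default content directory and .md extension, and it ignores Pico's handling of index.md files inside subfolders. content_file_for is a hypothetical name. The sketch shows why rewrite rules should target public URLs like folder/page rather than content/folder/page.md.

```python
from pathlib import PurePosixPath


def content_file_for(url_path: str, content_dir: str = "content", ext: str = ".md") -> str:
    """Map a public page URL to the Markdown file Pico would read (simplified model)."""
    page = url_path.strip("/") or "index"  # the site root is served from index.md
    return str(PurePosixPath(content_dir) / (page + ext))


print(content_file_for("folder/page"))  # content/folder/page.md
print(content_file_for("/"))            # content/index.md
```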

Anyway, here is my .htaccess file.

<IfModule mod_rewrite.c>
    RewriteEngine On
    # May be required to access sub directories
    #RewriteBase /

    # Deny access to internal dirs and files by passing the URL to Pico
    RewriteRule ^(config|content|vendor|CHANGELOG\.md|composer\.(json|lock|phar))(/|$) index.php [L]
    RewriteRule (^\.|/\.)(?!well-known(/|$)) index.php [L]

    # Enable URL rewriting
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^ index.php [L]

    <IfModule mod_env.c>
        # Let Pico know about available URL rewriting
        SetEnv PICO_URL_REWRITING 1
    </IfModule>
</IfModule>

# Prevent file browsing
Options -Indexes -MultiViews

# Redirect to non-www
RewriteCond %{HTTP_HOST} ^www\.(.+\.[^\.]+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [R=301,L]
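For reference, the Pico rules above (apart from the non-www redirect) can be modelled in a few lines of Python. This is a simplified sketch, not Apache. It works on the per-directory path with no leading slash, and a set of known paths stands in for the -f/-d checks. The two regular expressions are copied from the file. The function name and example paths are hypothetical.

```python
import re
from typing import Set

# Copied from Pico's .htaccess: internal files/dirs and dotfiles (except .well-known).
INTERNAL = re.compile(r"^(config|content|vendor|CHANGELOG\.md|composer\.(json|lock|phar))(/|$)")
DOTFILE = re.compile(r"(^\.|/\.)(?!well-known(/|$))")


def htaccess_target(path: str, on_disk: Set[str]) -> str:
    """Return what the rules hand the request to: 'index.php' (Pico) or the path itself."""
    if INTERNAL.search(path) or DOTFILE.search(path):
        return "index.php"  # denied: passed to Pico instead of being served directly
    if path not in on_disk:
        return "index.php"  # not a real file or directory: a virtual Pico page
    return path  # real file or directory, e.g. a theme asset, served as-is


on_disk = {"index.php", "themes", "themes/default/style.css", ".well-known/security.txt"}
print(htaccess_target("content/about.md", on_disk))          # index.php
print(htaccess_target(".git/config", on_disk))               # index.php
print(htaccess_target("themes/default/style.css", on_disk))  # themes/default/style.css
print(htaccess_target(".well-known/security.txt", on_disk))  # .well-known/security.txt
print(htaccess_target("blog/my-post", on_disk))              # index.php
```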

mayamcdougall commented 2 years ago

Alright, so, again, I have to preface this by saying I'm not that knowledgeable about Apache config. I haven't tested anything, I'm just going off of what I've read.

What I'm saying is not that .htaccess is overriding your apache2.conf, simply that its RewriteRules are being applied first. Under normal circumstances, it would then go through and apply any rewrites in your apache2.conf, however, the [L] sounds like it tells Apache, "If we match and rewrite based on this rule, don't read any further RewriteRules for this url, we're done."

Since Pico's included RewriteRules try to rewrite just about everything to either 404 or index.php, I'm wondering if that's what's interfering. It's already been matched once, and the [L] told it not to try anymore.

For testing, does it make a difference adding your own rule to .htaccess just above Pico's?

RewriteRule "^project_folder/blog/(.*)" "project_folder/new/$1"  [R]

# Deny access to internal dirs and files by passing the URL to Pico
RewriteRule ^(config|content|vendor|CHANGELOG\.md|composer\.(json|lock|phar))(/|$) index.php [L]
RewriteRule (^\.|/\.)(?!well-known(/|$)) index.php [L]

notakoder commented 2 years ago

...simply that its RewriteRules are being applied first. Under normal circumstances, it would then go through and apply any rewrites in your apache2.conf,

Oh, that's what you meant. I had assumed the opposite: that rewrite rules in apache2.conf are applied first and the rules in .htaccess afterwards. Anyway, I tested what you suggested, but there is still no redirection. I even tried the exact page name instead of a regex.

RewriteRule "^project_folder/blog/page" "project_folder/new/page"  [R]
RewriteRule "^project_folder/blog/(.*)" "project_folder/new/$1"  [R]
RewriteRule "^http://localhost/project_folder/blog/(.*)" "http://localhost/project_folder/new/$1"  [R]

So I guess I'll wait for more responses to this issue. Thanks for helping. 😄

mayamcdougall commented 2 years ago

That's too bad. I've got no idea then. 🤷🏻‍♀️

I was really hoping that would fix it.

Short of spinning up my own Apache install and seeing if I can recreate the situation (and stumble upon a solution), I don't have any other answers on this. 😓

@PhrozenByte If you've got any input on this one, it'd be appreciated, thanks. 😅

(I'm sure it's probably something stupid we both missed. 😜)

notakoder commented 2 years ago

That's alright. You've tried, and you've been helpful. Let's see if PhrozenByte has any suggestions.

PhrozenByte commented 2 years ago

Unfortunately I can't provide webserver support, just some quick notes: The first argument to RewriteRule is a regex matching a relative path, not a URL (i.e. it actually matters whether you add the RewriteRule to apache2.conf or to a .htaccess file). Rules are evaluated in order, i.e. if you want to create a redirection rule (option [R]), you usually want it to be evaluated first and to stop evaluation afterwards (i.e. it should be [R,L]). Remember to restart Apache if you're modifying apache2.conf, and don't forget to actually enable both the mod_rewrite module and the rewrite engine (i.e. RewriteEngine on).
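The "relative path" point can be sketched as a small Python model of what the pattern is matched against. It simplifies the documented mod_rewrite behaviour and is not Apache code. In server or VirtualHost context the pattern sees the URL-path, including its leading slash. In per-directory context (a Directory block or .htaccess), the directory's own prefix is stripped first. The DocumentRoot /var/www and the folder names mirror this thread. rewrite_subject is a hypothetical helper.

```python
import posixpath
from typing import Optional


def rewrite_subject(url_path: str, document_root: str, directory: Optional[str] = None) -> Optional[str]:
    """Return the string a RewriteRule pattern is matched against (simplified model).

    directory=None models server/VirtualHost context; otherwise it is the filesystem
    directory of the Directory block or .htaccess file (per-directory context).
    """
    if directory is None:
        return url_path  # leading slash included
    fs_path = posixpath.join(document_root, url_path.lstrip("/"))
    prefix = directory.rstrip("/") + "/"
    if not fs_path.startswith(prefix):
        return None  # request is outside this directory, so its rules never run
    return fs_path[len(prefix):]  # per-directory prefix removed: no leading slash


url = "/project_folder/blog/post"
print(rewrite_subject(url, "/var/www"))                             # /project_folder/blog/post
print(rewrite_subject(url, "/var/www", "/var/www"))                 # project_folder/blog/post
print(rewrite_subject(url, "/var/www", "/var/www/project_folder"))  # blog/post
```

Under this model, the matching patterns would be ^/project_folder/blog/(.*) in a VirtualHost, ^project_folder/blog/(.*) in a Directory /var/www/ block, and ^blog/(.*) in the project's .htaccess.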

mayamcdougall commented 2 years ago

The first argument to RewriteRule is a regex matching a relative

Okay, I wasn't even thinking about that.

So, assuming project_folder is your Pico folder, I probably should have actually had you test something like:

RewriteRule "^blog/(.*)" "new/$1"  [R]

# Deny access to internal dirs and files by passing the URL to Pico
RewriteRule ^(config|content|vendor|CHANGELOG\.md|composer\.(json|lock|phar))(/|$) index.php [L]
RewriteRule (^\.|/\.)(?!well-known(/|$)) index.php [L]

@PhrozenByte This is of course presuming I was actually even on the right track. Do you think that Apache matching an [L] rule in .htaccess would prevent it from matching @notakoder's new rule in apache2.conf altogether? I have no idea if I'm interpreting Apache's behavior right, I'm just trying to draw conclusions from random guides and documentation. 😓

Also, presumably we wouldn't want to use [R,L] for this one, because it still needs to be rewritten to index.php in the end?

PhrozenByte commented 2 years ago

apache2.conf is interpreted before any .htaccess file. R yields an HTTP redirect, i.e. the browser is redirected to said page and then sends another request. So no: if you do use R, you usually also want to use L, otherwise any later RewriteRule might overrule the redirection again (i.e. no redirection happens).
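Here is a toy Python model of this behaviour. It is simplified and is not Apache's implementation. With [R,L], the pass stops and the browser is sent a redirect. The browser then makes a brand-new request, which runs through the rules from the top. With a bare [R], a later matching rule (here a stand-in for Pico's catch-all) replaces the result, and no redirect is sent. The rules and helper names are hypothetical.

```python
import re
from typing import List, Tuple

# (pattern, substitution, flag letters), e.g. "RL" stands for [R,L].
Rule = Tuple[str, str, str]


def expand(substitution: str, match: "re.Match") -> str:
    """Replace Apache-style $N back-references with the text captured by group N."""
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))) or "", substitution)


def one_pass(path: str, rules: List[Rule]) -> Tuple[str, bool]:
    """Evaluate rules in order; return (result, whether it is sent as a redirect)."""
    redirect = False
    for pattern, substitution, flags in rules:
        match = re.search(pattern, path)
        if match is None:
            continue
        path = expand(substitution, match)
        redirect = "R" in flags  # a later matching rule without R overrules the redirect
        if "L" in flags:
            break
    return path, redirect


def browse(path: str, rules: List[Rule], max_redirects: int = 5) -> Tuple[List[str], str]:
    """Follow redirects like a browser; return (paths requested, internal target served)."""
    requested = [path]
    for _ in range(max_redirects + 1):
        target, redirect = one_pass(path, rules)
        if not redirect:
            return requested, target
        path = target  # the browser sends a brand-new request for this path
        requested.append(path)
    raise RuntimeError("too many redirects")


catch_all: Rule = (r"^", "index.php", "L")  # stands in for Pico's catch-all
print(browse("blog/post", [(r"^blog/(.*)", "new/$1", "RL"), catch_all]))
# (['blog/post', 'new/post'], 'index.php')
print(browse("blog/post", [(r"^blog/(.*)", "new/$1", "R"), catch_all]))
# (['blog/post'], 'index.php')
```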

mayamcdougall commented 2 years ago

apache2.conf is interpreted before any .htaccess file.

Yeah, that's what I would have thought. All I could find was that RewriteRules are applied from the "bottom up", starting from the deepest directory. Nothing on whether they actually got applied before apache2.conf or not.

R yields an HTTP redirect, i.e. the browser is redirected to said page and then sends another request.

Right, and then the matching process starts over, essentially, with a new request. 🤦🏻‍♀️

I had realized that at some point, but then forgot it somewhere along the way while troubleshooting. 😓

I've had this quiet thought in the back of my head for the whole thread, trying to point out the obvious difference between a rewrite and a redirect, but I couldn't quite remember what it was. 😅

Alright, thanks for the refresher, I'll keep at it with @notakoder and see if we can get it solved for them.


@notakoder It seems like .htaccess shouldn't be causing any issue, so go ahead and revert that to its original form.

Going back to apache2.conf, let's try adding both flags, [R,L], just in case. Double check that you've got the relative path correct in your RegEx (it seems like you do based on the other URL examples you provided, but still).

Make sure you've got mod_rewrite enabled as well. Do your Pico page URLs rewrite correctly, or do they have question marks in them (e.g. http://localhost/project_folder/?blog/post)?

Maybe there's some clues in your apache logs? You can usually "follow" them with tail -f file.log on the relevant log file (afaik, Apache logs are configured differently on different distros, webhosts, etc., so the actual file and whether it's logging anything that'll help here would vary).

Edit: I've got apache running in Docker atm, so let's see if I can't figure out what it takes to make this rewrite work properly... P=

Edit 2: I think I've got it, hang tight. 😉

mayamcdougall commented 2 years ago

Okay, so I'm 99% sure this is going to be the answer.

Now that I remember how Apache works... *drumroll*

Remove all your changes from apache2.conf and don't touch it again. 🤦🏻‍♀️

Go into /etc/apache2/sites-available and find your relevant site config. If you haven't changed this, it's probably called something like 000-default.conf.

Put your RewriteRule there. 🤦🏻‍♀️🤦🏻‍♀️

Breathe a sigh of relief, then cry in the corner with me for forgetting how Apache is usually configured. 😭

(I'm probably being too dramatic, which is going to jinx this. 👆🏻)

But I'm hopeful this time it's really going to be the fix. It took me FAR too long messing with apache2.conf before I realized... I don't think I've ever had to touch it before!

All my sites were always configured using the sites-available and sites-enabled (symlinks of sites-available) config folders. Even in Nginx, oddly enough, this paradigm is often used for site-level config.

So... yeah.

The Rewrites belong in your VirtualHost directive, NOT in the Directory directive. 😀

(Please let this be it. 🤞🏻😩)

notakoder commented 2 years ago

(Please let this be it. 🤞🏻😩)

I think that is it. 😄 Moving the lines to 000-default.conf worked on my system. However, I had trouble getting the correct path on the server, since its document root was /var/www/html instead of /var/www as on my system. Anyway, I figured out that I could do the same in my .htaccess file instead of the Apache configuration, which has the advantage of keeping the redirects portable inside the source code itself.

So, this is what I tried in my .htaccess file

RewriteRule "^old-folder/(.*)" "http://localhost/project-folder/new-folder/$1" [R=301,L]

and this works. But I am using a URL as the destination and this creates issues in the server since the folder names are different. I've been trying to use a relative path as below,

RewriteRule "^old-folder/(.*)" "new-folder/$1" [R=301,L]

This just doesn't work. The URLs are redirected to http://localhost/var/www/project-folder/new/page-name. I don't understand how /var/www/ ends up in the link.

According to Apache documentation, a 'web-path' to a resource should work and follows the same syntax I used. I am assuming that the document root for .htaccess is indeed the project folder itself since the first pattern mentioned in my syntax ^old-folder is being evaluated correctly. I do not think that the pattern and substitution will parse two different document roots.

mayamcdougall commented 2 years ago

Sorry, meant to respond yesterday, but I couldn't really word what I was trying to say about it. Time to try and salvage this reply... (It's not working... seems like I'm just dumping words on the page...)


You should probably just move your local project to /var/www/html to match the server, as this usually seems to be the standard.

I ran into this issue, odd redirect behavior and all, which was part of my confusion as well. It happened when I tried using /var/www instead of /var/www/html, not realizing that there was a sites-available/000-default.conf file setting /var/www/html as the DocumentRoot. It seemed like the mismatch was the cause. The /var/www/ directory set in apache2.conf was NOT the DocumentRoot.

With the files in /var/www/html and the RewriteRule in 000-default.conf, everything worked as expected.

Although apache2.conf uses /var/www in its default Directory directive, most distros override this with a VirtualHost directive for security. I found a lot of 2014-era posts explaining why Debian was making this change, and specifically going with /var/www/html to match other distros like Fedora.

You shouldn't need to do anything with the full URL.

The RewriteRule I used was RewriteRule "^/project_folder/blog/(.*)" "/project_folder/new/$1" [R,L]


I wouldn't use .htaccess for this unless you have to. So what's the actual problem here? No matter the DocumentRoot, it shouldn't affect your RewriteRule if the files are in the proper place. It sounds like you're having the same issue as me, thinking that /var/www was the DocumentRoot just because it was the Directory in apache2.conf, when really DocumentRoot was set to /var/www/html in the VirtualHost section of 000-default.conf.

I'm not really sure how to help you with it any more at the moment. 😕😔

Going down the .htaccess and full URL route sounds like the wrong direction. Does your server also use the sites-available config file paradigm? If so, you should use them to configure each site, and not apache2.conf or .htaccess.

mayamcdougall commented 2 years ago

Try with a leading slash on your .htaccess rule's replacement string.

RewriteRule "^old-folder/(.*)" "/new-folder/$1" [R=301,L]

Stumbled upon this when I was testing stuff. Seems to help. 🤔

notakoder commented 2 years ago

The DocumentRoot inside sites-available/000-default.conf is indeed /var/www/, not /var/www/html. What I meant was that, for the RewriteRule inside .htaccess, the DocumentRoot is /var/www/html/project-folder, which is why the pattern is matched. So DocumentRoot does not seem to be the problem. Here is what I tried:


Inside .htaccess, the RewriteRule "^old/(.*)" "new/$1" [R=301,L] rule gives me:

http://localhost/var/www/project-folder/new/page instead of http://localhost/project-folder/new/page in my system, and

https://example.com/var/www/project-folder/new/page instead of https://example.com/new/page in the production server.

Not sure why the string /var/www/project-folder is inserted in the link. I tried adding / to the pattern, like "^/old/(.*)", but it doesn't match the url at all.

Try with a leading slash on your .htaccess rule's replacement string.

I tried it on my system: RewriteRule "^old/(.*)" "/new/$1" [R=301,L], and the browser redirects to http://localhost/new/page. I think we are missing something tiny regarding the substitution part. The pattern "^old/(.*)" is fine; it's the substitution that's causing the trouble.
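Both results appear to fit documented mod_rewrite behaviour for .htaccess files. After a relative substitution, the per-directory prefix that was stripped for matching is put back. Without a RewriteBase, that prefix is the directory's filesystem path, which an [R] then sends to the browser as part of the URL. A substitution starting with / is instead used as a URL-path from the server root. The Python function below is only a simplified model of that step, with $1 already expanded. The paths mirror this thread. The RewriteBase /project-folder/ case (compare the commented-out #RewriteBase / line in Pico's .htaccess) is an untested assumption here.

```python
import posixpath
from typing import Optional


def redirect_location(substitution: str, htaccess_dir: str, host: str,
                      rewrite_base: Optional[str] = None) -> str:
    """Simplified model of the Location an [R] rule in .htaccess sends (after $1 expansion)."""
    if substitution.startswith(("http://", "https://")):
        return substitution  # full URL: used exactly as written
    if not substitution.startswith("/"):
        # Relative: the per-directory prefix is put back in front. Without RewriteBase
        # that prefix is the directory's filesystem path, which then leaks into the URL.
        prefix = rewrite_base if rewrite_base is not None else htaccess_dir
        substitution = posixpath.join(prefix, substitution)
    # At this point it is a URL-path from the server root; scheme and host are added.
    return host + substitution


folder, host = "/var/www/project-folder", "http://localhost"
print(redirect_location("new/page", folder, host))                      # http://localhost/var/www/project-folder/new/page
print(redirect_location("/new/page", folder, host))                     # http://localhost/new/page
print(redirect_location("new/page", folder, host, "/project-folder/"))  # http://localhost/project-folder/new/page
print(redirect_location("/project-folder/new/page", folder, host))      # http://localhost/project-folder/new/page
```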


In my local /sites-available/000-default.conf,

DocumentRoot /var/www/
RewriteEngine On
RewriteRule "^/project-folder/old/(.*)" "/project-folder/new/$1" [R=301,L]

This works perfectly. But the same code in my production /sites-available/000-default.conf,

DocumentRoot /var/www/
RewriteEngine On
RewriteRule "^/project-folder/old/(.*)" "/project-folder/new/$1" [R=301,L]

prevented Apache from restarting; it exited with an error code. Perhaps that's because I have configured virtual hosts for each site, so I went ahead and edited the virtual host configuration /etc/apache2/sites-available/project-folder.conf as follows:

DocumentRoot /var/www/project-folder

RewriteEngine On
RewriteRule "^/old/(.*)" "new/$1" [R=301,L]

I restarted the Apache service, but the pattern itself is not matched. I also tried the pattern without the leading /, "^old/(.*)", with the same result.


For now, I can use the full web address as the substitution, although it is slightly inconvenient. It isn't a single solution that works for both localhost and production, but it's the only alternative I have at this point.

Anyway, we've put quite a lot of time into this thread, and I think we should close this issue. Thank you so much for your help, @mayamcdougall, as always doing everything you can to help me. 😄 Thanks @PhrozenByte too.

mayamcdougall commented 2 years ago

Tried it on my system. RewriteRule "^old/(.*)" "/new/$1" [R=301,L] and the browser redirects to http://localhost/new/page.

Oh, wait. I guess I forgot to restore the project_folder subfolder for that attempt. Sorry, I've been bouncing around between a few different configurations. The / worked simply because it was replacing from the DocumentRoot in my case. I'd forgotten it was supposed to be in a subfolder. 🤦🏻‍♀️

I have no idea why the substitution is inserting the system path at the start.

The DocumentRoot inside sites-available/000-default.conf is indeed /var/www/; not /var/www/html.

In my previous comment I tried to explain (poorly) how I had run into that issue before when adding the rule to apache2.conf instead of 000-default.conf. But yeah, I saw the same behavior putting it in .htaccess too, just like you did.

Googling the specific problem itself (e.g. apache rewrite inserts system path /var/www) brought me to this StackOverflow Post, which might help a little.

Honestly, it kind of sounds like this is just something that happens when the path you're redirecting to isn't the DocumentRoot. In fact, it even sounds like using the whole URL is part of the recommended solution.

Though, they also mention that using RedirectMatch instead of RewriteRule might work better, but I haven't tried that at all yet.
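For comparison, here is a simplified Python model of RedirectMatch (mod_alias); it was not tried in this thread. Unlike a per-directory RewriteRule, its regex is matched against the full URL-path, leading slash and project folder included, whether it sits in a VirtualHost or a .htaccess file. A target that begins with / gets the current scheme and host added. The pattern and folder names are assumptions based on the examples above, and redirect_match is a hypothetical helper.

```python
import re
from typing import Optional


def redirect_match(url_path: str, regex: str, target: str, host: str) -> Optional[str]:
    """Simplified model of RedirectMatch: return the Location header, or None if no match."""
    match = re.search(regex, url_path)  # full URL-path, leading slash included
    if match is None:
        return None
    location = re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))) or "", target)
    if location.startswith("/"):
        location = host + location  # a URL-path target gets the current scheme and host
    return location


print(redirect_match("/project-folder/old/page", r"^/project-folder/old/(.*)$",
                     "/project-folder/new/$1", "http://localhost"))
# http://localhost/project-folder/new/page
```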

Might be worth a try.

Thank you so much for your help

No worries. Happy to help where I can. 😉

Mostly just bummed that I haven't had better answers on this one. 😓

I've been running Pico in the same Nginx Docker container for years now, so I don't really touch configuration that much. Heck, I've been meaning to switch those sites over to Caddy for a while now, because I've had a lot of luck using it as a reverse proxy on my server. But I've just been far too busy/lazy to deep dive into that. 😂

notakoder commented 2 years ago

I've been meaning to switch those sites over to Caddy for a while now

Will check it out too.

Mostly just bummed that I haven't had better answers on this one. 😓

That's alright. :) If I get an answer to this issue, I'll update here too.

mayamcdougall commented 2 years ago

Will check it out too.

I mean, I'd definitely recommend trying Caddy out. It does some amazing things, like HTTPS by default (with no effort on your part!).

However, just know that I can't really offer much for support on it. I haven't looked into writing any Pico config for it yet. At some point I plan to add a Caddy page to our Docs.

I'm pretty much only using it to Reverse Proxy my Docker containers right now. It's kind of like the traffic director that points everything to the correct container.

All my configuration has been pretty simple so far, like

plex.example.com {
    reverse_proxy plex:32400
}

plex:32400 in this example is the hostname:port of my Plex container, allowing me to reach it at a plex subdomain of my home server. I've taken to putting most of my web apps on their own subdomain for simplicity. 😉
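Conceptually, each site block maps a requested hostname to an upstream address. Below is a toy Python sketch of that routing decision. It illustrates the idea and is not how Caddy is implemented. plex.example.com and plex:32400 come from the example above; the second site is hypothetical.

```python
from typing import Dict, Optional

# Hostname -> upstream "host:port", mirroring `plex.example.com { reverse_proxy plex:32400 }`.
SITES: Dict[str, str] = {
    "plex.example.com": "plex:32400",
    "pico.example.com": "pico:80",  # hypothetical second container
}


def upstream_for(host_header: str) -> Optional[str]:
    """Pick the upstream for a request's Host header, ignoring case and any :port suffix."""
    hostname = host_header.split(":", 1)[0].lower()
    return SITES.get(hostname)


print(upstream_for("plex.example.com"))      # plex:32400
print(upstream_for("Plex.Example.com:443"))  # plex:32400
print(upstream_for("other.example.com"))     # None
```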

I haven't gone any deeper than I've needed to so far. But, that's the point with Caddy: It's designed to be easy to configure. So far I've used the reverse_proxy, basicauth, and php_fastcgi directives and that's about it.

I do have my testing instance of Pico running in it (using php_fastcgi php-fpm:9000 to use a separate PHP container)... I just don't have any of the basic Pico security RegEx stuff implemented on it (blocking access to config, content, etc.). It's not an issue, since it's just my testing instance, but I'd definitely want to figure out the rest of it before I use it in production. 😅

Still, I'd highly recommend Caddy as long as you're comfortable learning (which, you seem to be, despite our fumbling around an Apache solution here 😂).

notakoder commented 2 years ago

Good to see that you've used it and recommend it. I shall try it ☺️