:bug: Blog post / page folders starting with an underscore ignored

webknjaz commented 3 days ago

I saw that #423 / #265 are talking about assets. I was trying to create a blog post called __init__ so that the URL would be https://<host>/blog/__init__, which was intended as a pun for an intro post in the blog. I discovered, however, that Lektor failed to discover that folder, and blog/__init__/contents.lr was ignored completely. At first, I thought it might be because of a clash with dunder methods, so I dropped the trailing underscores. Then I deleted one of the leading ones. Finally, I confirmed that a single leading underscore is what's causing the problem. Replacing it with a - makes Lektor see the folder. Reverting it back to _ makes Lektor delete the output.

I don't see a technical reason why it would have to be ignored. Perhaps, the restriction could be lifted or made configurable?

dairiki commented 3 days ago

I'm fairly sure that was intentional, but I can't say why...

The work-around (untested, but I'm fairly certain this will work) is to name your page source something that gets recognized (e.g. blog/init/contents.lr), and then set the _slug system field for the page to __init__.

webknjaz commented 3 days ago

Oh, interesting... I'll try it right now. I figured that there must be a reason, but I didn't find where it's implemented in code so couldn't verify. If this works, the issue can be turned from bug into docs, I suppose.

webknjaz commented 3 days ago

So when I tried setting _slug: __init__, it did nothing — the URL kept being /init/. Setting _slug: hello changes it to /hello/. Changing it to _slug: -hi turns it into /hi/, meaning that there's some conversion happening as well. Evidently, the workaround didn't work and revealed more surprising behaviors.

webknjaz commented 3 days ago

_id: __init__ didn't do anything either.

dairiki commented 3 days ago

> So when I tried setting _slug: __init__, it did nothing; the URL kept being /init/. Setting _slug: hello changes it to /hello/. Changing it to _slug: -hi turns it into /hi/, meaning that there's some conversion happening as well. Evidently, the workaround didn't work and revealed more surprising behaviors.

Ugh. Apparently the value set for _slug gets run through slugify, which, as it stands now, strips any underscores.

Way back before #604, it appears that our custom slugify did preserve underscores.

I haven't thought this through very completely, but I think I'd be in favor of making slugify preserve underscores again.
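
To illustrate the behavior reported above, here is a minimal sketch of an underscore-stripping slugify next to an underscore-preserving variant. `toy_slugify` and its `keep_underscores` flag are hypothetical names invented for this illustration; this is not Lektor's actual slugify, only a stand-in that treats every character outside the allowed set as a separator.

```python
import re


def toy_slugify(value: str, keep_underscores: bool = False) -> str:
    """Hypothetical stand-in for a slugify function (not Lektor's code)."""
    # Assumption: only lowercase ASCII letters and digits are kept,
    # plus underscores when keep_underscores is set.
    allowed = "a-z0-9_" if keep_underscores else "a-z0-9"
    # Collapse each run of disallowed characters into a single hyphen.
    slug = re.sub(rf"[^{allowed}]+", "-", value.lower())
    # Drop separators left over at either end.
    return slug.strip("-")


for raw in ("__init__", "hello", "-hi"):
    print(repr(raw), "->", repr(toy_slugify(raw)),
          "| preserving underscores:", repr(toy_slugify(raw, keep_underscores=True)))
```

With the default, `__init__` collapses to `init` and `-hi` to `hi`, which matches the URLs observed earlier in the thread. With `keep_underscores=True`, `__init__` survives unchanged.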

dairiki commented 3 days ago

> _id: __init__ didn't do anything either.

I don't think you want to mess with the _id. (I suspect if you set an _id in contents.lr it just gets ignored.)

dairiki commented 3 days ago

@webknjaz Here's another work-around for your case:

In the .project file set

[project]
included_assets = _*

or, maybe, be more specific:

included_assets = __init__

That, I think, (tested, this time) should cause Lektor to recognize your page in blog/__init__/contents.lr (and, if no explicit _slug is set, build it to /blog/__init__/index.html). Apparently, the _id, when used as a slug, is not processed through slugify.
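
As a rough mental model of why both patterns work, the sketch below assumes that names with a leading underscore are ignored by default and that `included_assets` patterns are matched shell-glob style against the folder name. `is_ignored` is a hypothetical helper written for this illustration, not a function from Lektor.

```python
from fnmatch import fnmatchcase


def is_ignored(name: str, included_patterns: list[str]) -> bool:
    """Hypothetical model of the ignore rule discussed in this thread."""
    # Assumption: an explicit include pattern always wins.
    if any(fnmatchcase(name, pattern) for pattern in included_patterns):
        return False
    # Assumption: otherwise, a leading underscore means "ignore".
    return name.startswith("_")


print(is_ignored("__init__", []))            # no include patterns configured
print(is_ignored("__init__", ["_*"]))        # broad pattern
print(is_ignored("__init__", ["__init__"]))  # specific pattern
print(is_ignored("_drafts", ["__init__"]))   # other underscore folders stay ignored
```

The more specific pattern keeps every other underscore-prefixed folder ignored, which is the reason to prefer it over `_*`.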

dairiki commented 3 days ago

> I haven't thought this through very completely, but I think I'd be in favor of making slugify preserve underscores again.

An alternative fix would be to change things so that an explicit value for _slug set in a contents.lr file is not run through slugify at all. (If the user is setting explicit slugs, trust that they know what they're doing.) When deserializing contents.lr, leading spaces and tabs are already stripped from any values. Perhaps, that's sufficient.

webknjaz commented 2 days ago

Are you saying that the slugify() call could be dropped @ https://github.com/lektor/lektor/blob/60a54627c472039923597f3dc552b2ab747b6b64/lektor/types/special.py#L24? Does anything on the JS side also need to match this?

webknjaz commented 2 days ago

Also, thanks for the included_assets tip. This one worked.

dairiki commented 2 days ago

> Are you saying that the slugify() call could be dropped @ https://github.com/lektor/lektor/blob/60a54627c472039923597f3dc552b2ab747b6b64/lektor/types/special.py#L24? Does anything on the JS side also need to match this?

I've just tested it. Dropping the slugify allows one to set a page's _slug to __init__.

There are a few buglets that arise:

- It also allows one to set the _slug to %00, resulting in the page being rendered to /%00/index.html. That, as things stand, results in broken links in the admin UI (as well as perhaps other issues).
- The JavaScript admin UI doesn't allow one to enter spaces in the _slug field. (It converts them to -.) There is probably no reason not to allow them: if one manually edits contents.lr, internal spaces in the _slug seem to work.

A related buglet that is present regardless of whether the slugify is dropped or not:

- In the UI page edit view, hitting the "Save Page" button after changing the _slug does not update the eyeball ("Return to Website") link in the upper right corner of the page. (Reloading the page fixes things.)

There are probably other edge cases...

If we do drop the slugify, we probably still want to do enough sanitizing to ensure that the resulting slug is a valid directory name. (This is a trickier task on Windows than it is on Linux [^1].)
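
A minimal sketch of what such sanitizing might check, assuming the slug is validated one path segment at a time. `is_portable_dirname` is a hypothetical helper, not part of Lektor; the rules encoded are the well-known Windows restrictions (forbidden characters, reserved device names, no trailing dot or space), which are stricter than what Linux requires.

```python
import re

# Characters Windows rejects in names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves, regardless of case or extension.
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_portable_dirname(segment: str) -> bool:
    """Hypothetical check: is this segment a valid directory name everywhere?"""
    if not segment or segment in {".", ".."}:
        return False
    if _FORBIDDEN.search(segment):
        return False
    # Windows does not allow names ending in a space or a dot.
    if segment[-1] in " .":
        return False
    return segment.split(".")[0].upper() not in _RESERVED


for candidate in ("__init__", "hello world", "%00", "a:b", "nul", "trailing."):
    print(repr(candidate), is_portable_dirname(candidate))
```

Note that `%00` passes this check, since it is a perfectly valid directory name. The broken admin links mentioned above would need a separate rule (or a fix on the UI side).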

I'm still not quite sure whether this is the best option (compared to modifying slugify so that it doesn't strip underscores), though I do lean towards it, mostly because it seems least likely to break anyone's current site. The only possibility of breakage is if someone has set an explicit _slug and is relying on slugify modifying that value; people who are doing that get what they deserve.

[^1]: See, e.g., https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names

lektor / lektor

:bug: Blog post / page folders starting with an underscore ignored #1198