picocms / Pico

Pico is a stupidly simple, blazing fast, flat file CMS.
http://picocms.org/
MIT License

The pages array is always built for the whole site #490

Closed ohnonot closed 5 years ago

ohnonot commented 5 years ago

...and with a blog-type site with many posts, that can impact performance quite a lot.

I just tested with a busload of fake content and an index.twig that doesn't even iterate over the array. It takes a few seconds just to load an (almost) empty page, so internally Pico does look at every file in the content directory before showing the client anything.

But it isn't only a performance issue, it also affects the use of plugins that themselves rely on the pages array.
Real-life example: pagination. I can filter the unneeded pages out of the array, but by that time the pagination plugin has already counted the pages, so I might get fewer results per page.
Similar problems with a search plugin.
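
To make the pagination problem above concrete, here is a minimal sketch in Python (not Pico or Twig code). The page records, the `draft` flag and all function names are hypothetical, chosen only for illustration. It shows why slicing the full array first and filtering afterwards yields short pages, while filtering first keeps each page full.

```python
def paginate(pages, page_no, per_page):
    """Return the slice of `pages` for a 1-based page number."""
    start = (page_no - 1) * per_page
    return pages[start:start + per_page]


def paginate_then_filter(pages, keep, page_no, per_page):
    # Problematic order: the slice is cut from the unfiltered list,
    # so every filtered-out entry leaves a gap on the page.
    return [p for p in paginate(pages, page_no, per_page) if keep(p)]


def filter_then_paginate(pages, keep, page_no, per_page):
    # Preferred order: filter first, then count and slice.
    return paginate([p for p in pages if keep(p)], page_no, per_page)


# Hypothetical data: ten posts, every second one is a draft.
pages = [{"id": f"post-{i}", "draft": i % 2 == 0} for i in range(1, 11)]


def is_published(page):
    return not page["draft"]


short_page = paginate_then_filter(pages, is_published, 1, 4)
full_page = filter_then_paginate(pages, is_published, 1, 4)
```

With four results requested per page, `short_page` holds only two posts, while `full_page` holds four.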

I was hoping that a subdirectory's index would at least discard the parent directories' pages, but it doesn't. Or that I could exclude a folder from my contents by making it hidden (e.g. _drafts) - no luck either!

So, is there a way? This issue suggests that people are aware.

Greetings & Thanks for another great opensource project!

PhrozenByte commented 5 years ago

Pico works this way for historic reasons; it was originally designed to provide rather small websites, not blogs with hundreds or even thousands of articles (there's no definite limit, there are Pico installations with 100 pages and more running just fine; it all depends on your hardware and the load your website receives).

Anyway, you're completely right. That's why we're planning to overhaul the page discovery process and implement a directory-tree-based page discovery with Pico 3.0. We did some groundwork for this with Pico 2.0 by implementing Pico's page tree and allowing pages and directories to be hidden, but we didn't change anything about how Pico 2.0 discovers pages; this is planned for Pico 3.0. However, if you need to iterate over all pages (e.g. when a pagination plugin tries to determine the number of pages), Pico would still have to discover all pages. This is an inherent consequence of Pico being a flat-file CMS.
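
As a rough illustration of what a directory-tree-based discovery could look like, here is a minimal Python sketch. It is not Pico's implementation and says nothing about how Pico 3.0 will actually work. The function name `discover_pages`, the `.md` extension, and the rule that names starting with `_` are skipped (taken from the `_drafts` example earlier in this thread) are all assumptions.

```python
import os


def discover_pages(content_dir, subdir="", ext=".md"):
    """Return page IDs below `subdir`, relative to `content_dir`.

    Hypothetical rule: files and directories whose name starts
    with '_' are treated as hidden and skipped entirely.
    """
    root = os.path.join(content_dir, subdir)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk never enters them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("_"))
        for name in sorted(filenames):
            if name.endswith(ext) and not name.startswith("_"):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, content_dir)
                # Strip the extension and normalise path separators.
                found.append(rel[:-len(ext)].replace(os.sep, "/"))
    return found
```

Calling `discover_pages(content_dir, "blog")` only touches files below `blog/`, so a request for a blog index would not have to scan the rest of the site. A plugin that needs the total page count would still have to walk the whole tree.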

ohnonot commented 5 years ago

I see now that a "hidden" file or directory still shows up in the pages array, and can be queried e.g. for a page summary, but it cannot be accessed directly.

How much exactly is collected every time the pages array is built? Is it just a file listing, or does it already fill the array with all available data, metadata, parsed content...? (I'm not sure whether these are purely visions for the future, or whether you are comparing things to the current situation.)

Anyhow, thanks for the clarifications so far. At least I know where I'm at now.
I hope things improve as my blog grows.

PhrozenByte commented 5 years ago

Pico reads and parses the YAML Front Matter of all content files on startup. The contents of a page are parsed on demand. The only exception is the requested page, whose contents are parsed on startup.
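
The eager-metadata, lazy-content split described above can be sketched as follows. This is a simplified Python illustration, not Pico's code. The `Page` class, the naive `key: value` header parser (a stand-in for a real YAML parser) and the placeholder `render` function are all assumptions.

```python
def split_front_matter(raw):
    """Split a raw file into (header, body) at the '---' delimiters."""
    if raw.startswith("---\n"):
        header, sep, body = raw[4:].partition("\n---\n")
        if sep:
            return header, body
    return "", raw


def parse_meta(header):
    """Naive 'key: value' parser standing in for real YAML parsing."""
    meta = {}
    for line in header.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


def render(body):
    """Placeholder for the expensive Markdown-to-HTML step."""
    return "<p>" + body.strip() + "</p>"


class Page:
    def __init__(self, raw):
        header, body = split_front_matter(raw)
        self.meta = parse_meta(header)  # parsed eagerly, for every page
        self._body = body
        self._content = None            # parsed lazily, on first access
        self.render_count = 0

    @property
    def content(self):
        if self._content is None:
            self._content = render(self._body)
            self.render_count += 1
        return self._content


page = Page("---\nTitle: Hello\nDate: 2019-01-01\n---\nBody text\n")
```

Reading `page.meta` costs nothing extra after construction, while the body is only rendered the first time `page.content` is accessed and is reused afterwards.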

ohnonot commented 5 years ago

Thanks for the clarification.

Related question:

I built a search page that paginates itself by means of the url_param function. As I understand it, for each page, the search is executed again, and only the appropriate part is shown.

Is it possible to avoid that?

I saw some examples that made me think that this should be possible, but I'm hazy on the whole concept of POST/GET.

PhrozenByte commented 5 years ago

Unfortunately I'm not really sure what the actual question is; what do you mean by "for each page, the search is executed again" and "only the appropriate part is shown"?

ohnonot commented 5 years ago

have you seen the search.twig I linked?

so i enter /search?q=gnu (just an example)
and the twig template searches the pages array for the string "gnu", and puts pages that match in a results array.

but i also implemented pagination, so the twig template shows only the first, say, 10 results, and shows a link at the bottom /search?q=gnu&p=2
now the way i see it, clicking on that calls the search page and its template again, and the whole search is being executed again, only this time it shows results 11 - 20.
Is it possible to avoid that?

PhrozenByte commented 5 years ago

Ah, I see. Yeah, sure, every request is handled independently, i.e. the search is executed again. That's the reason why projects like Lucene exist: implementing a search in a feature-rich but still performant manner isn't trivial. Avoiding the repeated search isn't possible with pure Twig; you'll have to write a plugin. Instead of re-inventing the wheel, it's probably best to utilize projects like Lucene.

ohnonot commented 5 years ago

> That's the reason why projects like Lucene exist: implementing a search in a feature-rich but still performant manner isn't trivial. Avoiding the repeated search isn't possible with pure Twig; you'll have to write a plugin. Instead of re-inventing the wheel, it's probably best to utilize projects like Lucene.

Apache Lucene? Isn't that way too much? The search is trivial, and it already works in a satisfactory way. All I'm asking about is pagination, and not repeating something that's been done already...

Anyhow, my question is answered. Maybe one day I'll start making plugins.

And thanks for being so active here almost single-handedly!
