picocms / Pico

Pico is a stupidly simple, blazing fast, flat file CMS.
http://picocms.org/
MIT License
3.81k stars 616 forks source link

Visit counter #687

Closed digitalinferno closed 3 months ago

digitalinferno commented 5 months ago

I would like to create a plugin to keep track (approximately) of the number of visits to one or more pages. I have two approaches in mind: the first one involves creating a file mypage.txt for the mypage.md file, and the second one involves working directly with the meta tags of mypage.md.

Title: Home
hidden: false
Description: home page
Template: index
Counter: yes
visits: 234
---

With the first approach, aside from paying attention to concurrent visitors, I don't see any particular issues. However, I would appreciate some suggestions on working with meta tags. This allow the use of visits for sorting purposes.

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two days if no further activity occurs. Thank you for your contributions! :+1:

PhrozenByte commented 5 months ago

I see a third alternative as well:

  1. Add the visit counter to a page-specific statistics file (e.g. mypage.txt)
  2. Add it to a page's YAML Front Matter (e.g. mypage.md)
  3. Use a common statistics file (e.g. stats.md, possibly as inaccessible and hidden file, i.e. _stats.md)

The best approach depends on what you want to do with that information and how many concurrent page views you expect. Remember to use file locking: You can't write to a file concurrently, i.e. you must delay parallel requests when writing the updated visit count. If you don't expect tens of thousands accesses per day, I wouldn't expect noticeable slowdowns in practice, thus I'd recommend (3) - this way everything is at one place and you don't have to fiddle with pages, which is slow with large pages and bears the risk of accidentally overwriting contents. It also works well with whatever you plan to do with that information, since it's a regular page and the info about all pages is available in any request. If you expect many parallel requests, I'd rather go with (1), unless you need to access the stats of other pages when viewing a page - then I'd go with (2) since it spares a separate file operation.

digitalinferno commented 5 months ago

Thank you for the suggestion! How would you proceed with managing the data structure in the stats.md file? Something like pageid: visits?

---
index.md: 101
mypage.md: 33
mypage2.md: 58
---
PhrozenByte commented 5 months ago

Something like pageid: visits?

Exactly. Check whether YAML mapping keys have character restrictions and whether quoting helps, otherwise you must encode special chars. It might be a good idea to use a library to create the YAML (AFAIK Symfony YAML can also write YAML files), which might already take care of this.

digitalinferno commented 5 months ago

This is my twig:

{% for key, stat in pages["_stats"].meta|sort|reverse %}
{{ key }} - {{ stat }}
{% endfor %}    

and this is my _stat.md:

---
page1: 100
page2: 200
page3: 150
---

The output is:

page2 - 200

page3 - 150

page1 - 100

rssfeed -

filter -

tags -

social -

tagline -

assets/logo.svg -

hidden -

template -

robots -

time -

date_formatted -

date -

author -

description -

title -

What did I do wrong?

PhrozenByte commented 5 months ago

Pico as well as some plugins auto-create meta headers, thus you should probably move the stats into an sub-key, like

---
stats:
  page1: 100
  page2: 200
  page3: 150
---
digitalinferno commented 5 months ago

Perfect, now the goal is to read the MD file with Symfony YAML, convert the text into an array, manipulate it, and save the changes. I already have a working prototype in plain txt.

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two days if no further activity occurs. Thank you for your contributions! :+1:

digitalinferno commented 5 months ago

I found a strange behavior in the generation of the id.. this is my code (WIP):

class PicoVisits extends AbstractPicoPlugin 
{
    const API_VERSION = 3;
    protected $enabled = true;
    protected $dependsOn = array();

    private function getStats()
    {

        $md = './content/_stats.md'; // file name
        $counter = 0; // var init

        // check for md file
        if(!file_exists($md)) {
            $file = fopen($md, 'w');
            if(flock($file, LOCK_EX)) {
                fwrite($file, 'TODO');
                flock($file, LOCK_UN);
                fclose($file);
            }
        } else {      
            $file = fopen($md, 'r+');
            if (flock($file, LOCK_EX)) {
                $match = false;
                $data_array = [];

                while (!feof($file)) {
                    $line = fgets($file);
                    $values = explode (':' , $line, 2);
                    // double check
                    if (count($values) === 2) {
                        $id = trim($values[0]);
                        $value = trim($values[1]);
                        // find a match
                        if (strcmp(trim($this->id), $id) == 0) {
                            $value = intval($value) + 1;
                            $counter = $value;
                            $match = true;
                        }

                        $data_array[$id] = $value;
                    }
                }
            // Not match found
            if ($match == 0) { 
                $data_array[$this->id] = 1;
                $counter = 1;
            }

            // Update file
            fseek($file, 0);
            ftruncate($file, 0);
            foreach ($data_array as $id => $value) {
                fwrite($file, "$id:$value\n");
            }

            flock($file, LOCK_UN);
            fclose($file); // done

            } else {
                fclose($file);
                die;
            }       
        }

        return ($counter);
    }

    public function onPageRendering(&$templateName, array &$twigVariables)
    {
        $this->id = $this->pico->getCurrentPage()['id'];
        $twigVariables['PageStats']  = $this->getStats();
    }

}

Everything is ok, it returns something like this (as expected)

about:15
blog/2024/test-page-2:10
blog/2024/test-page-1:13

But, if into the tpl file there is somethig like that (I tried all the combinations, adding and removing the various {{ }} ):

{% if meta.author %}
    <img src="assets/header.jpg" class="img-fluid" alt="">
            <div class="infopost clearfix">
            <p class="float-start"><i class="bi bi-pencil-square"></i> {{ meta.author}} - {{ meta.date|date('d M Y') }}</p>                            
            <p class="float-end"> Tempo di lettura {{ ReadingTime }} min <i class="bi bi-clock"></i></p>
            </div>
{% endif %}

The script add an empity line, with incremental value :1

about:15
blog/2024/test-page-2:10
blog/2024/test-page-1:13
blog/2024/test-page-badtwig:2
blog/2024/test-page-badtwig2:5
:7

Any comment is apprecied. Thanks.

PhrozenByte commented 5 months ago

Don't write your own YAML dumper, better use a common library. Pico already ships Symfony YAML (as of version 2.8 - pretty old, but doesn't really matter, see https://symfony.com/doc/2.x/components/yaml.html#writing-yaml-files), so simply use that to create your YAML and write it to _stats.md (don't forget to add preliminary and trailing --- lines, otherwise it's not considered a YAML Front Matter).

Like the following (totally untested, needs verification, error handling, file locking, etc., just to show the basic approach):

$this->accessStats = $this->getPico()->getPages()["_stats"];
$this->accessStats[$currentPageId]++;
$yaml = Yaml::dump([ "stats" => $this->accessStats ]);
file_put_contents($fileName, "---\n" . $yaml . "---\n");

You then don't even need any custom code to read the YAML, just access the data via {{ pages["_stats"].meta.stats }}.

digitalinferno commented 5 months ago

I'm a noob... :cold_sweat: Thanks for the code, now I'm on the right way.

PS: The bug I was telling you about earlier also occurs with the new code...

In the end, I understood that the issue was this (from my tpl): <img src="assets/header.jpg" class="img-fluid" alt="image"> fixed with this: <img src="{{ blog.base_url }}/assets/header.jpg" class="img-fluid" alt="image">

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two days if no further activity occurs. Thank you for your contributions! :+1:

digitalinferno commented 4 months ago

Basically it works... just some issues with wrong links (404)

<?php

use Symfony\Component\Yaml\Yaml;

class PicoPageStats extends AbstractPicoPlugin
{
    const API_VERSION = 3;
    protected $enabled = true;
    protected $dependsOn = array();
    private $accessStats = array();

    public function onPageRendering(&$templateName, array &$twigVariables)
    {
        $md = './content/_stats.md'; // file name
        $currentPageId = $this->getPico()->getCurrentPage()['id'];
        if (!empty($currentPageId)) { 
            $this->loadStats($md);
            if (!isset($this->accessStats[$currentPageId])) {
                $this->accessStats[$currentPageId] = 0;
            }    
            $this->accessStats[$currentPageId]++;
            $this->saveStats($md);
        }
    }

    private function loadStats($md)
    {
        $file = fopen($md, 'c+');      
        if ($file) {
            if (flock($file, LOCK_SH)) {
                $pageContent = file_get_contents($md);
                $currentPageId = $this->pico->getCurrentPage()['id'];
                // front matter YAML
                preg_match('/^---(.*?)---/s', $pageContent, $matches);
                if (isset($matches[1])) {
                    $frontMatter = trim($matches[1]);
                    $frontMatterArray = Yaml::parse($frontMatter);
                    $this->accessStats = isset($frontMatterArray['stats']) ? $frontMatterArray['stats'] : array();
                } else {
                    $this->accessStats[$currentPageId] = isset($this->accessStats[$currentPageId]) ? $this->accessStats[$currentPageId] + 1 : 1;
            }
                flock($file, LOCK_UN);
            }
            fclose($file);
        }
    }

    private function saveStats($md)
    {
        $file = fopen($md, 'r+');    
        if ($file) {
            if (flock($file, LOCK_EX)) {
                // update the front matter
                $frontMatterArray = array('stats' => $this->accessStats);
                // $pageContent = file_get_contents($md);
                $pageContent = preg_replace('/^---(.*?)---/s', '', '');
                $yaml = "---\n" . Yaml::dump($frontMatterArray) . "---\n" . $pageContent;
                file_put_contents($md, $yaml);
                flock($file, LOCK_UN);
            }
            fclose($file);
        }
    }

}
?>

Hints? Maybe I will add an option to save only the last 7 days stats.

PhrozenByte commented 4 months ago

just some issues with wrong links (404)

What are the issues?

The only error I see is with $currentPageId = $this->getPico()->getCurrentPage()['id']; throwing a PHP error when getCurrentPage() returns null (which it might for non-existing or "virtual" pages created by plugins; use $currentPage = $this->getPico()->getCurrentPage(); if ($currentPage !== null) { … } instead).

Some more suggestions (some are rather academic, i.e. have no real practical advantage, i.e. totally up to you):

digitalinferno commented 4 months ago

What are the issues?

Solved, thanks. The issue occurs specifically when the getCurrentPage() returns null.

  • You might read & write the stats file in one go

In my mind, the shared lock for reading and the exclusive lock for writing seemed like a classy touch.. :smile:

  • You might hook into onPagesLoading

Despite your suggestions, I'm far from finding a suitable solution for this:

<?php

class PicoPageStats extends AbstractPicoPlugin
{
    const API_VERSION = 3;
    protected $enabled = true;
    protected $dependsOn = array();
    private $accessStats = array();

    public function onPageRendering(&$templateName, array &$twigVariables)
    {
        $md = './content/_stats.md'; // file name
        $currentPage = $this->getPico()->getCurrentPage();
        if ($currentPage !== null) {
            $file = fopen($md, 'c+');
            if (flock($file, LOCK_EX)) {
                $currentPageId = $this->getPico()->getCurrentPage()['id'];
                $this->loadStats($file);
                if (!isset($this->accessStats[$currentPageId])) {
                    $this->accessStats[$currentPageId] = 0;
                }    
                $this->accessStats[$currentPageId]++;
                $this->saveStats($file);
                flock($file, LOCK_UN);
            }
            fclose($file);
        }
    }

    private function loadStats($file)
    {
        $md = './content/_stats.md'; // file name
        $currentPageId = $this->pico->getCurrentPage()['id'];
        $headers = array();
        // front matter YAML
        $pageContent = filesize($md) > 0 ? fread($file, filesize($md))  : '';
        $frontMatterArray= $this->getPico()->parseFileMeta($pageContent, $headers);
        //print_r ($matches[1]);
        if (isset($frontMatterArray['stats'])) {
            $this->accessStats = isset($frontMatterArray['stats']) ? $frontMatterArray['stats'] : array();
            } else {
            $this->accessStats[$currentPageId] =  0;
        }
    }

    private function saveStats($file)
    {
        // update the front matter
        $frontMatterArray = array('stats' => $this->accessStats);
        $yaml = "---\nstats:\n";
        foreach ($frontMatterArray['stats'] as $key => $value) {
            $yaml .= "  " . $key . ": " . $value . "\n";
        }
        $yaml .= "---\n";
        rewind($file);
        fwrite($file, $yaml);
    } 
}
?>
PhrozenByte commented 4 months ago
digitalinferno commented 4 months ago
  • Any particular reason you dropped Yaml::dump()?

No, just some Frankenstein code...

  • You do $this->accessStats[$currentPageId] = 0; twice in your code

Yes... but in two different circumstances I had to patch things up. I try to be cleaner.

digitalinferno commented 4 months ago

I didn't fully understand this step:

You might hook into onPagesLoading (or earlier) to read & write the stats file - this way you can hook into onSinglePageLoading and prevent Pico from reading _stats.md (set $skipFile = true) and then add that page with the parsed YAML you've read earlier in onPagesDiscovered yourself; this saves Pico to read the already read file again and prevents concurrency issues when Pico calls file_get_contents() (namely: Pico's file_get_contents() will soft-fail on an exclusively locked file and return an empty string)


public function onPagesLoading()
{
// read & write the stats file
}
public function onSinglePageLoading($id, &$skipPage)
{
    $id = '_stats.md';
    $skipFile = true;
}

public function onPagesDiscovered(array &$pages)
{
    // add that page with the parsed YAML
}

Can you give me some further advice?
PhrozenByte commented 4 months ago

In onSinglePageLoading() you rather do if ($id === '_stats.md') { $skipFile = true; }. This tells Pico to simply ignore that page, as if the page never existed. You add that page in onPagesDiscovered() later.

To onPagesLoading() you move the same logic as you currently have in your onPageRendering() hook. The only difference is that instead of doing $this->accessStats = $frontMatterArray['stats'] ?? []; in loadStats(), you do $this->statsPageMeta = $frontMatterArray; (i.e. store the whole page's meta data) and use $this->statsPageMeta in your code.

In onPagesDiscovered you then just do $pages['_stats'] = [ 'title' => &$this->statsPageMeta['title'], /* more … */ 'meta' => &$this->statsPageMeta ]; (see here for the other array keys required, you simply copy that part and use $this->statsPageMeta instead of $meta)

digitalinferno commented 4 months ago

Thanks, very kind. But when I copy $currentPage = $this->getPico()->getCurrentPage(); to onPagesLoading() I get an empty value... I must hook on onCurrentPageDiscovered Am I doing something wrong?

digitalinferno commented 4 months ago

Two issues:

  1. Compatibility with other plugins: In my case, I have the tags plugin installed, and it stopped working. I can fix it with 'tags' => &$this->statsPageMeta['tags'].
  2. Empty page in pages(): So {% for page in pages() if not page.hidden %}…{% endfor %} returns an empty value, i can fix with 'hidden' => 'true'

This is my code:

<?php

use Symfony\Component\Yaml\Yaml;

class PicoPageStats extends AbstractPicoPlugin
{
    const API_VERSION = 3;
    protected $enabled = true;
    protected $dependsOn = array();
    protected $md = './content/_stats.md'; // stats here
    private $statsPageMeta = array();

    public function onSinglePageLoading($id, &$skipPage)
    {
        if ($id === '_stats') { $skipFile = true; }
    }    

    public function onCurrentPageDiscovered(
        array &$currentPage = null,
        array &$previousPage = null,
        array &$nextPage = null
    ) {
        $currentPage = $this->getPico()->getCurrentPage();
        if ($currentPage !== null) {
            $file = fopen($this->md, 'c+');
            if (flock($file, LOCK_EX)) {
                $currentPageId = $this->getPico()->getCurrentPage()['id'];
                $this->loadStats($file);
                $this->statsPageMeta['stats'][$currentPageId]++;
                $this->saveStats($file);
                flock($file, LOCK_UN);
            }
            fclose($file);
        }
    }

    public function onPagesDiscovered(array &$pages)
    {
        $pages['_stats'] = [
            'id' => &$this->statsPageMeta['id'], // or 'id' => '_stats', fix 2
            'url' => &$this->statsPageMeta['url'],
            'title' => &$this->statsPageMeta['title'],
            'description' => &$this->statsPageMeta['description'],
            'author' => &$this->statsPageMeta['author'],
            'time' => &$this->statsPageMeta['time'],
            'hidden' => 'true', // fix 2
            'tags' => &$this->statsPageMeta['tags'], // fix 1
            'date' => &$this->statsPageMeta['date'],
            'date_formatted' => &$this->statsPageMeta['date_formatted'],
            'raw_content' => &$rawContent,
            'meta' => &$this->statsPageMeta
        ];
    }

    private function loadStats($file)
    {
        $currentPageId = $this->pico->getCurrentPage()['id'];
        // front matter YAML
        $frontMatter = stream_get_contents($file);
        $frontMatterArray= $this->getPico()->parseFileMeta($frontMatter, []);        
        $this->statsPageMeta = $frontMatterArray;
        $this->statsPageMeta['stats'] = $this->statsPageMeta['stats'] ?? [];
        if (!array_key_exists($currentPageId, $this->statsPageMeta['stats'])) {
            $this->statsPageMeta['stats'][$currentPageId] = 0;
        }                
    }

    private function saveStats($file)
    {
        // update the front matter
        $frontMatterArray = array('stats' => $this->statsPageMeta['stats']);
        $yaml = "---\n" . Yaml::dump($frontMatterArray) . "---\n";
        file_put_contents($this->md, $yaml);
    } 
}
?>
PhrozenByte commented 4 months ago

But when I copy $currentPage = $this->getPico()->getCurrentPage(); to onPagesLoading() I get an empty value... I must hook on onCurrentPageDiscovered Am I doing something wrong?

No, my bad, you're right, you can't use getCurrentPage() in onPagesLoading because the current page wasn't discovered at this point. You can use onPagesLoading if you use the following instead to discover the current page: $currentPageId = $this->getPico()->getPageId($this->getPico()->getRequestFile() ?? ''); (this works because you really just need the page ID, not the actual page data).

Compatibility with other plugins: In my case, I have the tags plugin installed, and it stopped working.

I'm a bit speculating here, but I assume that's due to execution order. Note that onCurrentPageDiscovered is run after onPagesDiscovered, i.e. at the moment onPagesDiscovered is called, $this->statsPageMeta is an empty array. I assume that this confuses the tags plugin. By using onPagesLoading instead (see above) you restore the correct execution order and the tags plugin should work fine.

This brings me to another issue: Right now you never load the stats page if you couldn't discover $currentPageId. You must always load the stats file and set $this->statsPageMeta. The only two instructions that truly depend on $currentPageId are $this->statsPageMeta['stats'][$currentPageId]++; and $this->saveStats($file);. Everything else must always be run.

Empty page in pages(): So {% for page in pages() if not page.hidden %}…{% endfor %} returns an empty value, i can fix with 'hidden' => 'true'

I don't really understand what "empty page" means. However, you must add 'hidden' => true in any case (note that true shouldn't be quoted!)

'id' => &$this->statsPageMeta['id'], // or 'id' => '_stats', fix 2

This should be 'id' => '_stats' instead.

'raw_content' => &$rawContent,

This should be 'raw_content' => null instead.

file_put_contents($this->md, $yaml);

You already have an open $file handle, use that instead. Just use fwrite() and rewind() as you already did earlier.

digitalinferno commented 4 months ago

I'm a bit speculating here, but I assume that's due to execution order.

Nothing changed. From the Tags plugin onMetaParsed:

        foreach ($pages as $page) {
            $tags = PicoTags::parseTags($page['meta']['tags']);
            ...

I will try to port it to API v3 to see if there are any improvements.

This brings me to another issue: Right now you never load the stats page if you couldn't discover $currentPageId

And if I have a wrong link, the plugin will update the statistics for the non-existent ID.

I don't really understand what "empty page" means.

If I have 3 pages, the loop {% for page in pages() if not page.hidden %}…{% endfor %} (if not 'hidden' => true) returns:

  1. index
  2. page1
  3. page2

The first value is not expected at all (if I set 'url' => something the link poit to something). I thought that just the underscore in the filename would be enough.

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two days if no further activity occurs. Thank you for your contributions! :+1: