Improved resilience by outputting more data in the HTML

dracos commented 6 years ago

Working on a JKAN-based site, I was disappointed to discover that the site required JavaScript to show anything under /datasets/ and much of /organizations/ even though the pages consist of basic HTML, which is being generated by a static site generator that could therefore be generating actual HTML for all these pages. Yet the site is not using JavaScript to do things I would expect it to e.g. filtering the datasets page without a server-side request – each filter change causes a full page reload, which might as well then be including the requested data.

Behaving this way makes the site slower (instead of just showing me the dataset or list, it shows an empty page that has to wait for the JS to download and perhaps fetch more data in order to show anything), it makes it less available - https://kryogenix.org/code/browser/why-availability/ - and it makes it less resilient - https://kryogenix.org/code/browser/everyonehasjs.html . In some cases, the HTML includes the content, it's just hidden in CSS unless JavaScript is available, which didn't make much sense to me.

This PR does the following:

Makes the data on individual dataset and organization pages visible by default;
Outputs the count and list of datasets on organization pages in the HTML (the JS still kicks in and fetches them, which I left in order for the search to function, though that could be wired up to the existing content instead in future, perhaps);
Outputs categories, organizations and the datasets on the /datasets/ page, in the same order the JavaScript will then display them in (even duplicating the 'Show more' functionality to prevent any flash of changed content, though I think I might only have done it on categories, not organizations);
The search box is only shown if JavaScript is available, so as not to disappoint non-JS users.

This covers most of the functionality I could see; the main one remaining would be that the category sidebar does nothing still without JavaScript, but at least lists the categories. Options there include hiding it unless JavaScript is present, or using a collection or plugin to generate per-category static pages that could be used. The combined category+organization filter I imagine could remain JS-only if there were no solution there.

This PR also fixes an if statement that meant categories were only output if the dataset had a note.

timwis commented 6 years ago

@dracos top job! And you're right. The javascript should progressively enhance, or it might as well be a full-blown SPA. I do recall, however, trying to do it entirely in jekyll/liquid first, and ended up using JavaScript because I determined it wasn't possible. I believe it had to do with datasets having more than one category associated with them, and not being able to get accurate totals from that with liquid filters. Does this PR account for that? Do you have a demo site standing up already? If not, I can download your fork locally. This is a sizable PR, but at first glance it looks very well done so I can't imagine not merging it. Just want to make sure it's backward-compatible. Thanks for taking the time.

dracos commented 6 years ago

@timwis Thanks! I'm afraid I don't have a demo site at present, sorry, the site I was writing it for isn't updated yet. I have run it locally and the totals match what is then loaded with JS – it should cope fine with datasets in more than one category, the total for each category is done with the {% assign c = site.datasets | where:"category", category | size %} line (I think you wrote the PR to Jekyll a couple of years back that added that :) ).

dracos commented 6 years ago

Oh, and if the PR is too large, each commit is independent and each improves it in one aspect of the whole, so feel free to take one at a time if that's preferable.

timwis commented 6 years ago

@dracos please pardon my delay -- I was out of town last weekend and didn't have the time I expected to dive in. I copied the _datasets directory from jkan-demo to simulate a large collection of datasets, and it looks great! The dataset list and dataset detail page should work without JS, and you fixed that. However, there seems to be a slight glitch with the way the categories panel is being generated by liquid/jekyll: the counts differ from what JS counts, and the "uncategorized" category is at the bottom.

HTML-rendered	JS-rendered

As a result, when JavaScript is enabled, you get a bit of a jittery experience as the JS hydrates the HTML:

jkan-dracos-categories-2

My guess is that this is related to the issue of multiple categories that I alluded to before (I swear I tried to do this in liquid originally), but perhaps it's due to another reason. Either way, I'd say if we can't get it to reflect the same as the JS will, perhaps we should omit the category column in the HTML render, or render the list with no text content to reduce jitter. What do you think?

Otherwise I'd say this is good to go! If you don't mind, I'll add you as a maintainer to the project so we're not relying on my not taking weekend getaways to get improvements like this merged! Thanks again for your contribution -- looking forward to finishing this up.

dracos commented 6 years ago

We have this running on a live service (it's at http://data.mysociety.org/datasets/ now ) and the numbers match precisely with no flashes when the JS hydrates; things are indeed in multiple categories. The code differs slightly, so I must have messed something up when creating this PR/ the code there causes a difference somehow. I can't say when I'll have time to have a look, but this is just to note it certainly should work okay. I wouldn't want to omit the category column in HTML if at all possible, and I thought I'd achieved the necessary, sorry. I'll take a look at the jkan demo datasets and see what's going on when I can.

dracos commented 6 years ago

Difference appears to be the demo dataset mixes string and array categories, which I hadn't considered. This should be work-aroundable.

lukemckinstry commented 1 year ago

These changes seem to allow the categories menu on the datasets page to work as intended when JS is disabled! But it does look like the organizations section changes without JS.

JS enabled jkan-139-js-enabled

JS disabled jkan-139-js-disabled

timwis commented 1 year ago

Yeah, I noticed that at the weekend and was exploring why. It's because [datasets.html L31] passes the first 15 organisations (from a list that's unsorted), then sorts those by dataset count. If other_orgs contain organisations with more datasets, the JavaScript will sort them above those first 15.

The nice thing about this list, though, is the logic should be able to be replaced with a group_by filter, since a dataset only ever belongs to a single organisation.

timwis commented 1 year ago

Okay, I've just pushed up the group_by fix I mentioned: a5fc771.

This gets rid of the jitter you experienced. There is a bit of jitter that remains: when two organisations have the same number of datasets, sometimes Liquid sorts those in a different order than the JavaScript does. There doesn't seem to be a way to sort on multiple keys in Liquid, so we may have to live with this for now (unless you've any ideas).

The only issue with this approach is that it groups by the organization property of the datasets, rather than grouping by the list of organizations. The nice thing about this is it would omit organizations with 0 datasets, which makes sense for a filter list. The downside of this is that if two datasets spell the organization slightly differently (e.g. Water Department vs Water department), the liquid filter won't group them together. This isn't a problem if they're editing with the UI (as it's a drop-down), but if they're editing YAML files by hand, this could be an issue. Although, to be fair, I think this issue would have existed with the previous approach to grouping as well (it just wouldn't have listed both variations in the filter list).

EDIT: I think we could improve this further with group_by_exp

BryanQuigley commented 1 year ago

I see two issues remaining, both on the /datasets/ page

A "Show 1 more..." that does nothing under Categories.
Organization filters/links don't go anywhere

I tested by deleting the JavaScript from scripts/dist/ and copying over from the example datasets/orgs. Everything else tested worked fine for me.

BryanQuigley commented 1 year ago

Categories from the main page of JKAN don't work without javascript.

I still think it might make more sense to start with Luke's PR so we ensure things work without javascript and then can add optional javascript for things like search.

timwis commented 1 year ago

So I've been stewing on this one for a couple days, and I think there's a few pieces of this puzzle:

How we sort the list of organizations and categories by their dataset counts
- There's a bug in the sorting logic used in this PR: it slices and then sorts each slice; it should sort the whole set before slicing.
- Liquid's out-of-the-box limitations require we resort to pretty hacky methods to do this. How much easier would it be if we use a Jekyll plugin instead? (requires #199)
How we generate the category pages
- We've already had to initiate a breaking change to _data/categories.yml in #207 (wrap it in an items property). It wouldn't be a much bigger step to go all the way to a _categories collection. The only thing I worry about here is whether it will conflict with Jekyll's built-in 'categories' concept (relating to the special posts collection).
- I don't believe the ruby script is necessary to work out the list of datasets for a given category anymore (the string vs array issue seems to be resolved)
How we progressively enhance the datasets page
- Is it more important to eliminate jitter or to ensure an optimal experience for users without JavaScript? (e.g. if the former, we'd have a non-working 'Show more...' link like in this PR; if the latter, we'd probably not hide the overflow at all)
- If we add a /categories page linked to in the nav bar (next to /organizations), could we just hide the filter bar when JS is disabled? How important is it to show the counts? Perhaps this gets it across the release line?

If this is blocking the OpenDataPhilly work, then I propose we:

Migrate _data/categories.yml into a _categories collection and use Luke's categories.html layout to render pages for each category (but using liquid filters to get the list of datasets for the category).
Add the categories index to the nav bar and hide the filter bar completely when JavaScript is disabled (until 1 and 3 are figured out)

If not, then I propose we:

Use a Jekyll plugin to add a datasets property to the organizations and categories collections, which would make 1 trivial (e.g. site.organizations | sort: "datasets_count")
If we get it working, prioritise #199 so we can use it
From there, we'd just have to decide on 3. (Some ideas: use a <details> element for a non-JS expandable list; Make 'Show more...' link to /organizations//categories)

BryanQuigley commented 1 year ago

The generate_categories.rb script was from mysociety and then from before that it was based off of https://github.com/mnyrop/pagemaster this Jekyll plugin.

It sounds like a Jekyll plugin is the path you would ideally recommend going forward but might take a bit more time. If so, I'll see if we can invest some time in it. We have been blocked on other things atm.

If we add a /categories page linked to in the nav bar (next to /organizations), could we just hide the filter bar when JS is disabled? How important is it to show the counts? Perhaps this gets it across the release line?

That works for me - but still need something to generate category pages. I'm not sure the counts are super essential - but all solutions have them working at this time, right?

timwis commented 1 year ago

but still need something to generate category pages

I don't believe we do - we can just manually create a yml file for each category (instead of a single yml file with a list of categories, each category is its own file). The list of datasets for that category can be generated with liquid filters. Unless I'm missing something?

The plug-in I recommended would be for more easily computing the dataset counts and sorting the list of categories by those counts, but that's only necessary for the filter list. I can work on the plug-in this week, but again that shouldn't be necessary to have category pages.

timwis commented 1 year ago

Here's the plugin:

module DatasetCounts
  class Generator < Jekyll::Generator
    include Jekyll::Utils

    def generate(site)
      datasets = site.collections["datasets"].docs
      datasets_by_org = datasets.group_by { |dataset| slugify(dataset.data["organization"]) }

      site.collections["organizations"].docs.each do |organization|
        org_slug = slugify(organization.data["title"])
        org_datasets = datasets_by_org[org_slug]

        organization.data["datasets"] = org_datasets
        organization.data["datasets_size"] = org_datasets.size
      end

      datasets_by_category = datasets.each_with_object({}) do |dataset, memo|
        categories = Array(dataset.data["category"])
        categories.each do |category|
          slug = slugify(category)
          memo[slug] ||= []
          memo[slug] << dataset
        end
        memo
      end

      site.data["categories"].each do |category|
        category_slug = slugify(category["name"])
        category_datasets = datasets_by_category[category_slug]

        category["datasets"] = category_datasets
        category["datasets_size"] = category_datasets.size
      end
    end
  end
end

We can then generate the filter panels in liquid via:

{% assign sorted_categories = site.data.categories | sort: "name" | reverse | sort: "datasets_size" | reverse %}
{% assign sorted_orgs = site.organizations | sort: "title" | reverse | sort: "datasets_size" | reverse %}

(Note that the double sorting is necessary to get it to sort items with the same datasets_size in alphabetical order, a consequence of liquid's sort filter only working on one property at a time.)

We could then very easily access the datasets from layouts/organizations.html or layouts/categories.html via {% for dataset in datasets %}, but again, this plugin isn't necessary for this part. We already access the relevant datasets in layouts/organizations.html. For layouts/categories.html, we'd need to do something like:

{% assign datasets = site.datasets | where: "category", name %}

The only advantage of using this plugin for that is that it would deal with typos better, because it converts organisations/categories to slugs before grouping them.

BryanQuigley commented 1 year ago

I had somehow not thought about just making the categories there on pages. That makes things much easier.

Is there any negatives I'm missing about doing the Jekyll plugin (aside from requiring Github Actions or Netlify builds)

timwis commented 1 year ago

Is there any negatives I'm missing about doing the Jekyll plugin (aside from requiring Github Actions or Netlify builds)

The only negatives I can think of are:

We'll need to get rid of the GitHub Pages Gem (and pull in the pieces of it we need explicitly) because it disables other plugins (though we'd eventually need to do this to upgrade jekyll anyway)
It's not obvious in the liquid markup that that line relies on a plugin, which could make debugging harder if something ever goes wrong with it
We'll be relying on GitHub Actions and Netlify's free tiers remaining free, though have been relying on GitHub Pages being free as well
There's a chance we could be opening flood gates, and later find ourselves in a place where we have too many plugins, though it's hard for me to imagine this materialising
Plugins theoretically add to the build time, but we're talking fractions of a second with this one of course.

timwis commented 1 year ago

Gosh, liquid can be frustrating! I realised that it's no trouble at all to display dataset counts on the filter bar; the problem is just ordering that filter bar by those counts.

    <h3>Categories</h3>
    <div class="list-group" data-component="categories-filter" data-show="5">
      {% for category in site.dataset_categories %}
      {% assign count = site.datasets | where: "category", category.name | size %}
      <a href="{{ site.baseurl }}{{ category.url }}" class="list-group-item">
        <li class="d-flex justify-content-between align-items-center">
          {{ category.name }}
          <span class="badge badge-primary badge-pill">{{ count }}</span>
        </li>
      </a>
      {% endfor %}
    </div>

Screenshot 2023-02-06 at 08 20 42

If there were a way to sort with CSS, we'd be done. The problem is we can't assign that count to the category object, so we can't then sort the list of categories by that count property. The only way seems to be hacky approaches, or the plugin approach (which, again, I think is fine, but somehow my brain can't let go of this!).

timwis commented 1 year ago

Closed by #237, which includes cherry-picked commits from this PR. Thanks again @dracos for starting us on this direction, and awfully sorry it took so long to merge your commits!

timwis / jkan

Improved resilience by outputting more data in the HTML #139