timwis / jkan

A lightweight, backend-free open data portal, powered by Jekyll
https://jkan.io
MIT License
219 stars 309 forks source link

Improved resilience by outputting more data in the HTML #139

Closed dracos closed 1 year ago

dracos commented 6 years ago

Working on a JKAN-based site, I was disappointed to discover that the site required JavaScript to show anything under /datasets/ and much of /organizations/ even though the pages consist of basic HTML, which is being generated by a static site generator that could therefore be generating actual HTML for all these pages. Yet the site is not using JavaScript to do things I would expect it to e.g. filtering the datasets page without a server-side request – each filter change causes a full page reload, which might as well then be including the requested data.

Behaving this way makes the site slower (instead of just showing me the dataset or list, it shows an empty page that has to wait for the JS to download and perhaps fetch more data in order to show anything), it makes it less available - https://kryogenix.org/code/browser/why-availability/ - and it makes it less resilient - https://kryogenix.org/code/browser/everyonehasjs.html . In some cases, the HTML includes the content, it's just hidden in CSS unless JavaScript is available, which didn't make much sense to me.

This PR does the following:

This covers most of the functionality I could see; the main one remaining would be that the category sidebar does nothing still without JavaScript, but at least lists the categories. Options there include hiding it unless JavaScript is present, or using a collection or plugin to generate per-category static pages that could be used. The combined category+organization filter I imagine could remain JS-only if there were no solution there.

This PR also fixes an if statement that meant categories were only output if the dataset had a note.

timwis commented 6 years ago

@dracos top job! And you're right. The javascript should progressively enhance, or it might as well be a full-blown SPA. I do recall, however, trying to do it entirely in jekyll/liquid first, and ended up using JavaScript because I determined it wasn't possible. I believe it had to do with datasets having more than one category associated with them, and not being able to get accurate totals from that with liquid filters. Does this PR account for that? Do you have a demo site standing up already? If not, I can download your fork locally. This is a sizable PR, but at first glance it looks very well done so I can't imagine not merging it. Just want to make sure it's backward-compatible. Thanks for taking the time.

dracos commented 6 years ago

@timwis Thanks! I'm afraid I don't have a demo site at present, sorry, the site I was writing it for isn't updated yet. I have run it locally and the totals match what is then loaded with JS – it should cope fine with datasets in more than one category, the total for each category is done with the {% assign c = site.datasets | where:"category", category | size %} line (I think you wrote the PR to Jekyll a couple of years back that added that :) ).

dracos commented 6 years ago

Oh, and if the PR is too large, each commit is independent and each improves it in one aspect of the whole, so feel free to take one at a time if that's preferable.

timwis commented 6 years ago

@dracos please pardon my delay -- I was out of town last weekend and didn't have the time I expected to dive in. I copied the _datasets directory from jkan-demo to simulate a large collection of datasets, and it looks great! The dataset list and dataset detail page should work without JS, and you fixed that. However, there seems to be a slight glitch with the way the categories panel is being generated by liquid/jekyll: the counts differ from what JS counts, and the "uncategorized" category is at the bottom.

HTML-rendered JS-rendered
screen shot 2018-04-14 at 2 09 43 pm screen shot 2018-04-14 at 2 09 55 pm

As a result, when JavaScript is enabled, you get a bit of a jittery experience as the JS hydrates the HTML:

jkan-dracos-categories-2

My guess is that this is related to the issue of multiple categories that I alluded to before (I swear I tried to do this in liquid originally), but perhaps it's due to another reason. Either way, I'd say if we can't get it to reflect the same as the JS will, perhaps we should omit the category column in the HTML render, or render the list with no text content to reduce jitter. What do you think?

Otherwise I'd say this is good to go! If you don't mind, I'll add you as a maintainer to the project so we're not relying on my not taking weekend getaways to get improvements like this merged! Thanks again for your contribution -- looking forward to finishing this up.

dracos commented 6 years ago

We have this running on a live service (it's at http://data.mysociety.org/datasets/ now ) and the numbers match precisely with no flashes when the JS hydrates; things are indeed in multiple categories. The code differs slightly, so I must have messed something up when creating this PR/ the code there causes a difference somehow. I can't say when I'll have time to have a look, but this is just to note it certainly should work okay. I wouldn't want to omit the category column in HTML if at all possible, and I thought I'd achieved the necessary, sorry. I'll take a look at the jkan demo datasets and see what's going on when I can.

dracos commented 6 years ago

Difference appears to be the demo dataset mixes string and array categories, which I hadn't considered. This should be work-aroundable.

lukemckinstry commented 1 year ago

These changes seem to allow the categories menu on the datasets page to work as intended when JS is disabled! But it does look like the organizations section changes without JS.

JS enabled jkan-139-js-enabled

JS disabled jkan-139-js-disabled

timwis commented 1 year ago

Yeah, I noticed that at the weekend and was exploring why. It's because [datasets.html L31] passes the first 15 organisations (from a list that's unsorted), then sorts those by dataset count. If other_orgs contain organisations with more datasets, the JavaScript will sort them above those first 15.

The nice thing about this list, though, is the logic should be able to be replaced with a group_by filter, since a dataset only ever belongs to a single organisation.

timwis commented 1 year ago

Okay, I've just pushed up the group_by fix I mentioned: a5fc771.

This gets rid of the jitter you experienced. There is a bit of jitter that remains: when two organisations have the same number of datasets, sometimes Liquid sorts those in a different order than the JavaScript does. There doesn't seem to be a way to sort on multiple keys in Liquid, so we may have to live with this for now (unless you've any ideas).

The only issue with this approach is that it groups by the organization property of the datasets, rather than grouping by the list of organizations. The nice thing about this is it would omit organizations with 0 datasets, which makes sense for a filter list. The downside of this is that if two datasets spell the organization slightly differently (e.g. Water Department vs Water department), the liquid filter won't group them together. This isn't a problem if they're editing with the UI (as it's a drop-down), but if they're editing YAML files by hand, this could be an issue. Although, to be fair, I think this issue would have existed with the previous approach to grouping as well (it just wouldn't have listed both variations in the filter list).

EDIT: I think we could improve this further with group_by_exp

BryanQuigley commented 1 year ago

I see two issues remaining, both on the /datasets/ page

I tested by deleting the JavaScript from scripts/dist/ and copying over from the example datasets/orgs. Everything else tested worked fine for me.

BryanQuigley commented 1 year ago

Categories from the main page of JKAN don't work without javascript.

I still think it might make more sense to start with Luke's PR so we ensure things work without javascript and then can add optional javascript for things like search.

timwis commented 1 year ago

So I've been stewing on this one for a couple days, and I think there's a few pieces of this puzzle:

  1. How we sort the list of organizations and categories by their dataset counts

    • There's a bug in the sorting logic used in this PR: it slices and then sorts each slice; it should sort the whole set before slicing.
    • Liquid's out-of-the-box limitations require we resort to pretty hacky methods to do this. How much easier would it be if we use a Jekyll plugin instead? (requires #199)
  2. How we generate the category pages

    • We've already had to initiate a breaking change to _data/categories.yml in #207 (wrap it in an items property). It wouldn't be a much bigger step to go all the way to a _categories collection. The only thing I worry about here is whether it will conflict with Jekyll's built-in 'categories' concept (relating to the special posts collection).
    • I don't believe the ruby script is necessary to work out the list of datasets for a given category anymore (the string vs array issue seems to be resolved)
  3. How we progressively enhance the datasets page

    • Is it more important to eliminate jitter or to ensure an optimal experience for users without JavaScript? (e.g. if the former, we'd have a non-working 'Show more...' link like in this PR; if the latter, we'd probably not hide the overflow at all)
    • If we add a /categories page linked to in the nav bar (next to /organizations), could we just hide the filter bar when JS is disabled? How important is it to show the counts? Perhaps this gets it across the release line?

If this is blocking the OpenDataPhilly work, then I propose we:

If not, then I propose we:

BryanQuigley commented 1 year ago

The generate_categories.rb script was from mysociety and then from before that it was based off of https://github.com/mnyrop/pagemaster this Jekyll plugin.

It sounds like a Jekyll plugin is the path you would ideally recommend going forward but might take a bit more time. If so, I'll see if we can invest some time in it. We have been blocked on other things atm.

If we add a /categories page linked to in the nav bar (next to /organizations), could we just hide the filter bar when JS is disabled? How important is it to show the counts? Perhaps this gets it across the release line?

That works for me - but still need something to generate category pages. I'm not sure the counts are super essential - but all solutions have them working at this time, right?

timwis commented 1 year ago

but still need something to generate category pages

I don't believe we do - we can just manually create a yml file for each category (instead of a single yml file with a list of categories, each category is its own file). The list of datasets for that category can be generated with liquid filters. Unless I'm missing something?

The plug-in I recommended would be for more easily computing the dataset counts and sorting the list of categories by those counts, but that's only necessary for the filter list. I can work on the plug-in this week, but again that shouldn't be necessary to have category pages.

timwis commented 1 year ago

Here's the plugin:

module DatasetCounts
  class Generator < Jekyll::Generator
    include Jekyll::Utils

    def generate(site)
      datasets = site.collections["datasets"].docs
      datasets_by_org = datasets.group_by { |dataset| slugify(dataset.data["organization"]) }

      site.collections["organizations"].docs.each do |organization|
        org_slug = slugify(organization.data["title"])
        org_datasets = datasets_by_org[org_slug]

        organization.data["datasets"] = org_datasets
        organization.data["datasets_size"] = org_datasets.size
      end

      datasets_by_category = datasets.each_with_object({}) do |dataset, memo|
        categories = Array(dataset.data["category"])
        categories.each do |category|
          slug = slugify(category)
          memo[slug] ||= []
          memo[slug] << dataset
        end
        memo
      end

      site.data["categories"].each do |category|
        category_slug = slugify(category["name"])
        category_datasets = datasets_by_category[category_slug]

        category["datasets"] = category_datasets
        category["datasets_size"] = category_datasets.size
      end
    end
  end
end

We can then generate the filter panels in liquid via:

{% assign sorted_categories = site.data.categories | sort: "name" | reverse | sort: "datasets_size" | reverse %}
{% assign sorted_orgs = site.organizations | sort: "title" | reverse | sort: "datasets_size" | reverse %}

(Note that the double sorting is necessary to get it to sort items with the same datasets_size in alphabetical order, a consequence of liquid's sort filter only working on one property at a time.)

We could then very easily access the datasets from layouts/organizations.html or layouts/categories.html via {% for dataset in datasets %}, but again, this plugin isn't necessary for this part. We already access the relevant datasets in layouts/organizations.html. For layouts/categories.html, we'd need to do something like:

{% assign datasets = site.datasets | where: "category", name %}

The only advantage of using this plugin for that is that it would deal with typos better, because it converts organisations/categories to slugs before grouping them.

BryanQuigley commented 1 year ago

I had somehow not thought about just making the categories there on pages. That makes things much easier.

Is there any negatives I'm missing about doing the Jekyll plugin (aside from requiring Github Actions or Netlify builds)

timwis commented 1 year ago

Is there any negatives I'm missing about doing the Jekyll plugin (aside from requiring Github Actions or Netlify builds)

The only negatives I can think of are:

timwis commented 1 year ago

Gosh, liquid can be frustrating! I realised that it's no trouble at all to display dataset counts on the filter bar; the problem is just ordering that filter bar by those counts.

    <h3>Categories</h3>
    <div class="list-group" data-component="categories-filter" data-show="5">
      {% for category in site.dataset_categories %}
      {% assign count = site.datasets | where: "category", category.name | size %}
      <a href="{{ site.baseurl }}{{ category.url }}" class="list-group-item">
        <li class="d-flex justify-content-between align-items-center">
          {{ category.name }}
          <span class="badge badge-primary badge-pill">{{ count }}</span>
        </li>
      </a>
      {% endfor %}
    </div>

Screenshot 2023-02-06 at 08 20 42

If there were a way to sort with CSS, we'd be done. The problem is we can't assign that count to the category object, so we can't then sort the list of categories by that count property. The only way seems to be hacky approaches, or the plugin approach (which, again, I think is fine, but somehow my brain can't let go of this!).

timwis commented 1 year ago

Closed by #237, which includes cherry-picked commits from this PR. Thanks again @dracos for starting us on this direction, and awfully sorry it took so long to merge your commits!