Closed dracos closed 1 year ago
@dracos top job! And you're right. The javascript should progressively enhance, or it might as well be a full-blown SPA. I do recall, however, trying to do it entirely in jekyll/liquid first, and ended up using JavaScript because I determined it wasn't possible. I believe it had to do with datasets having more than one category associated with them, and not being able to get accurate totals from that with liquid filters. Does this PR account for that? Do you have a demo site standing up already? If not, I can download your fork locally. This is a sizable PR, but at first glance it looks very well done so I can't imagine not merging it. Just want to make sure it's backward-compatible. Thanks for taking the time.
@timwis Thanks! I'm afraid I don't have a demo site at present, sorry, the site I was writing it for isn't updated yet. I have run it locally and the totals match what is then loaded with JS – it should cope fine with datasets in more than one category, the total for each category is done with the {% assign c = site.datasets | where:"category", category | size %}
line (I think you wrote the PR to Jekyll a couple of years back that added that :) ).
Oh, and if the PR is too large, each commit is independent and each improves it in one aspect of the whole, so feel free to take one at a time if that's preferable.
@dracos please pardon my delay -- I was out of town last weekend and didn't have the time I expected to dive in. I copied the _datasets
directory from jkan-demo to simulate a large collection of datasets, and it looks great! The dataset list and dataset detail page should work without JS, and you fixed that. However, there seems to be a slight glitch with the way the categories panel is being generated by liquid/jekyll: the counts differ from what JS counts, and the "uncategorized" category is at the bottom.
HTML-rendered | JS-rendered |
---|---|
As a result, when JavaScript is enabled, you get a bit of a jittery experience as the JS hydrates the HTML:
My guess is that this is related to the issue of multiple categories that I alluded to before (I swear I tried to do this in liquid originally), but perhaps it's due to another reason. Either way, I'd say if we can't get it to reflect the same as the JS will, perhaps we should omit the category column in the HTML render, or render the list with no text content to reduce jitter. What do you think?
Otherwise I'd say this is good to go! If you don't mind, I'll add you as a maintainer to the project so we're not relying on my not taking weekend getaways to get improvements like this merged! Thanks again for your contribution -- looking forward to finishing this up.
We have this running on a live service (it's at http://data.mysociety.org/datasets/ now ) and the numbers match precisely with no flashes when the JS hydrates; things are indeed in multiple categories. The code differs slightly, so I must have messed something up when creating this PR/ the code there causes a difference somehow. I can't say when I'll have time to have a look, but this is just to note it certainly should work okay. I wouldn't want to omit the category column in HTML if at all possible, and I thought I'd achieved the necessary, sorry. I'll take a look at the jkan demo datasets and see what's going on when I can.
Difference appears to be the demo dataset mixes string and array categories, which I hadn't considered. This should be work-aroundable.
These changes seem to allow the categories menu on the datasets page to work as intended when JS is disabled! But it does look like the organizations section changes without JS.
JS enabled
JS disabled
Yeah, I noticed that at the weekend and was exploring why. It's because [datasets.html L31] passes the first 15 organisations (from a list that's unsorted), then sorts those by dataset count. If other_orgs
contain organisations with more datasets, the JavaScript will sort them above those first 15.
The nice thing about this list, though, is the logic should be able to be replaced with a group_by
filter, since a dataset only ever belongs to a single organisation.
Okay, I've just pushed up the group_by
fix I mentioned: a5fc771
.
This gets rid of the jitter you experienced. There is a bit of jitter that remains: when two organisations have the same number of datasets, sometimes Liquid sorts those in a different order than the JavaScript does. There doesn't seem to be a way to sort on multiple keys in Liquid, so we may have to live with this for now (unless you've any ideas).
The only issue with this approach is that it groups by the organization
property of the datasets, rather than grouping by the list of organizations. The nice thing about this is it would omit organizations with 0 datasets, which makes sense for a filter list. The downside of this is that if two datasets spell the organization slightly differently (e.g. Water Department
vs Water department
), the liquid filter won't group them together. This isn't a problem if they're editing with the UI (as it's a drop-down), but if they're editing YAML files by hand, this could be an issue. Although, to be fair, I think this issue would have existed with the previous approach to grouping as well (it just wouldn't have listed both variations in the filter list).
EDIT: I think we could improve this further with group_by_exp
I see two issues remaining, both on the /datasets/ page
I tested by deleting the JavaScript from scripts/dist/ and copying over from the example datasets/orgs. Everything else tested worked fine for me.
Categories from the main page of JKAN don't work without javascript.
I still think it might make more sense to start with Luke's PR so we ensure things work without javascript and then can add optional javascript for things like search.
So I've been stewing on this one for a couple days, and I think there's a few pieces of this puzzle:
How we sort the list of organizations
and categories
by their dataset counts
How we generate the category pages
_data/categories.yml
in #207 (wrap it in an items
property). It wouldn't be a much bigger step to go all the way to a _categories
collection. The only thing I worry about here is whether it will conflict with Jekyll's built-in 'categories' concept (relating to the special posts collection).How we progressively enhance the datasets page
/categories
page linked to in the nav bar (next to /organizations
), could we just hide the filter bar when JS is disabled? How important is it to show the counts? Perhaps this gets it across the release line?If this is blocking the OpenDataPhilly work, then I propose we:
_data/categories.yml
into a _categories
collection and use Luke's categories.html
layout to render pages for each category (but using liquid filters to get the list of datasets for the category).If not, then I propose we:
datasets
property to the organizations
and categories
collections, which would make 1 trivial (e.g. site.organizations | sort: "datasets_count"
)<details>
element for a non-JS expandable list; Make 'Show more...' link to /organizations
//categories
)The generate_categories.rb script was from mysociety and then from before that it was based off of https://github.com/mnyrop/pagemaster this Jekyll plugin.
It sounds like a Jekyll plugin is the path you would ideally recommend going forward but might take a bit more time. If so, I'll see if we can invest some time in it. We have been blocked on other things atm.
If we add a /categories page linked to in the nav bar (next to /organizations), could we just hide the filter bar when JS is disabled? How important is it to show the counts? Perhaps this gets it across the release line?
That works for me - but still need something to generate category pages. I'm not sure the counts are super essential - but all solutions have them working at this time, right?
but still need something to generate category pages
I don't believe we do - we can just manually create a yml file for each category (instead of a single yml file with a list of categories, each category is its own file). The list of datasets for that category can be generated with liquid filters. Unless I'm missing something?
The plug-in I recommended would be for more easily computing the dataset counts and sorting the list of categories by those counts, but that's only necessary for the filter list. I can work on the plug-in this week, but again that shouldn't be necessary to have category pages.
Here's the plugin:
module DatasetCounts
class Generator < Jekyll::Generator
include Jekyll::Utils
def generate(site)
datasets = site.collections["datasets"].docs
datasets_by_org = datasets.group_by { |dataset| slugify(dataset.data["organization"]) }
site.collections["organizations"].docs.each do |organization|
org_slug = slugify(organization.data["title"])
org_datasets = datasets_by_org[org_slug]
organization.data["datasets"] = org_datasets
organization.data["datasets_size"] = org_datasets.size
end
datasets_by_category = datasets.each_with_object({}) do |dataset, memo|
categories = Array(dataset.data["category"])
categories.each do |category|
slug = slugify(category)
memo[slug] ||= []
memo[slug] << dataset
end
memo
end
site.data["categories"].each do |category|
category_slug = slugify(category["name"])
category_datasets = datasets_by_category[category_slug]
category["datasets"] = category_datasets
category["datasets_size"] = category_datasets.size
end
end
end
end
We can then generate the filter panels in liquid via:
{% assign sorted_categories = site.data.categories | sort: "name" | reverse | sort: "datasets_size" | reverse %}
{% assign sorted_orgs = site.organizations | sort: "title" | reverse | sort: "datasets_size" | reverse %}
(Note that the double sorting is necessary to get it to sort items with the same datasets_size
in alphabetical order, a consequence of liquid's sort
filter only working on one property at a time.)
We could then very easily access the datasets from layouts/organizations.html
or layouts/categories.html
via {% for dataset in datasets %}
, but again, this plugin isn't necessary for this part. We already access the relevant datasets in layouts/organizations.html
. For layouts/categories.html
, we'd need to do something like:
{% assign datasets = site.datasets | where: "category", name %}
The only advantage of using this plugin for that is that it would deal with typos better, because it converts organisations/categories to slugs before grouping them.
I had somehow not thought about just making the categories there on pages. That makes things much easier.
Is there any negatives I'm missing about doing the Jekyll plugin (aside from requiring Github Actions or Netlify builds)
Is there any negatives I'm missing about doing the Jekyll plugin (aside from requiring Github Actions or Netlify builds)
The only negatives I can think of are:
Gosh, liquid can be frustrating! I realised that it's no trouble at all to display dataset counts on the filter bar; the problem is just ordering that filter bar by those counts.
<h3>Categories</h3>
<div class="list-group" data-component="categories-filter" data-show="5">
{% for category in site.dataset_categories %}
{% assign count = site.datasets | where: "category", category.name | size %}
<a href="{{ site.baseurl }}{{ category.url }}" class="list-group-item">
<li class="d-flex justify-content-between align-items-center">
{{ category.name }}
<span class="badge badge-primary badge-pill">{{ count }}</span>
</li>
</a>
{% endfor %}
</div>
If there were a way to sort with CSS, we'd be done. The problem is we can't assign that count
to the category
object, so we can't then sort the list of categories by that count
property. The only way seems to be hacky approaches, or the plugin approach (which, again, I think is fine, but somehow my brain can't let go of this!).
Closed by #237, which includes cherry-picked commits from this PR. Thanks again @dracos for starting us on this direction, and awfully sorry it took so long to merge your commits!
Working on a JKAN-based site, I was disappointed to discover that the site required JavaScript to show anything under
/datasets/
and much of/organizations/
even though the pages consist of basic HTML, which is being generated by a static site generator that could therefore be generating actual HTML for all these pages. Yet the site is not using JavaScript to do things I would expect it to e.g. filtering the datasets page without a server-side request – each filter change causes a full page reload, which might as well then be including the requested data.Behaving this way makes the site slower (instead of just showing me the dataset or list, it shows an empty page that has to wait for the JS to download and perhaps fetch more data in order to show anything), it makes it less available - https://kryogenix.org/code/browser/why-availability/ - and it makes it less resilient - https://kryogenix.org/code/browser/everyonehasjs.html . In some cases, the HTML includes the content, it's just hidden in CSS unless JavaScript is available, which didn't make much sense to me.
This PR does the following:
/datasets/
page, in the same order the JavaScript will then display them in (even duplicating the 'Show more' functionality to prevent any flash of changed content, though I think I might only have done it on categories, not organizations);This covers most of the functionality I could see; the main one remaining would be that the category sidebar does nothing still without JavaScript, but at least lists the categories. Options there include hiding it unless JavaScript is present, or using a collection or plugin to generate per-category static pages that could be used. The combined category+organization filter I imagine could remain JS-only if there were no solution there.
This PR also fixes an if statement that meant categories were only output if the dataset had a note.