Consider how tag cloud works now we have more tags

conatus commented 2 years ago

From an email from @colombinary

One other thing we noticed is the current appearance of the tag cloud on the homepage. With the increasing amount of tags it looks like a long list rather than a cloud. There is a bug item in the Medium list about the tag cloud (#58), but not sure if that includes this issue? It would be great to know what the behavior or algorithm is that the tag-cloud follows as new tags get entered so we can think about this further. I also CC'd Gemma as she may have a suggestion for this.

Moved it over to here to discuss it further.

conatus commented 2 years ago

cc @GemCopeland

We can see the behaviour in code here - https://github.com/planetarypraxis/smartforests/blob/main/smartforests/static/js/tag_cloud.js#L150-L300

The code notes how this all works.

    // Use webcola's constraint-based graph layout plugin for d3 to lay out the tags, ensuring that we respect the
    // following constraints:
    //
    // * Related tags are close together.
    // * Tags are all within the bounds of the tag area.
    // * Tags do not overlap.

In short, it works by placing related tags close together. Right now, I suspect the posts are quite close together in terms of their "span", meaning the tags are quite closely related. There is minimal difference between them, and the layout shows this visually as a long list.

If you add in things that are aggressively distinct (e.g. add a Logbook that has tags shared by no other Logbook) then (presuming the cloud is working) it will appear far away - it could even be on the extreme right or left hand side of the cloud.
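As a rough illustration of "related tags are close together", one common way to measure relatedness is Jaccard similarity over the pages two tags share. This is a minimal sketch, not the code in `tag_cloud.js`: the `pagesByTag` data, the page ids, and the `tagDistance` helper are all hypothetical.

```javascript
// Hypothetical data: each tag maps to the set of page ids it appears on.
const pagesByTag = {
  seed: new Set(["universe-of-seeds", "seed-bank"]),
  "indigenous participation": new Set(["universe-of-seeds"]),
  mango: new Set(["fruit-test"]),
};

// Jaccard similarity: shared pages divided by all pages either tag appears on.
function jaccardSimilarity(a, b) {
  const shared = [...a].filter((page) => b.has(page)).length;
  const total = new Set([...a, ...b]).size;
  return total === 0 ? 0 : shared / total;
}

// Turn similarity into a target distance: 0 for identical usage, 1 for no overlap.
function tagDistance(tagA, tagB) {
  return 1 - jaccardSimilarity(pagesByTag[tagA], pagesByTag[tagB]);
}

console.log(tagDistance("seed", "indigenous participation")); // 0.5
console.log(tagDistance("seed", "mango")); // 1
```

Under this assumption, a tag that shares no pages with anything else always sits at the maximum distance, which is why it would be expected to land at the edge of the cloud.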

So seems there are two ways through this:

1. Accept this is actually the reality of how the data is being mapped and just keep going, knowing that as the Atlas expands it will be more "cloud like". This could be augmented by adding some text explaining why things appear as a list.
2. Make the tags further apart, perhaps making their distance apart some derivative of the number of tags in total. This will exaggerate the differences between tags as laid out on the page and look a bit more like a cloud, although in a sense it is falsifying the reality of the situation.

To me it feels like 2 will create problems down the line as there is genuinely a wider display of tags and a richer quantity of data. Deriving distance between tags from their number alone doesn't account for outliers and we could get the opposite problem, or even push things off the canvas entirely. Perhaps there can be some other calculation we can make.
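One possible "other calculation" is to derive the spacing from both the tag count and the canvas size, then clamp it so it can neither collapse nor exceed the canvas. This is only a sketch under assumptions: `targetSpacing`, its parameters, and the 40px minimum are hypothetical and are not taken from the existing code.

```javascript
// Hypothetical helper: pick a target spacing between tags from the canvas size
// and the tag count, clamped so it never collapses or outgrows the canvas.
function targetSpacing(tagCount, width, height, minSpacing = 40) {
  if (tagCount <= 0) return minSpacing;
  // Spacing of an even grid that holds every tag on the canvas.
  const evenSpacing = Math.sqrt((width * height) / tagCount);
  // Never ask for more than half of the shorter canvas side.
  const maxSpacing = Math.min(width, height) / 2;
  return Math.max(minSpacing, Math.min(evenSpacing, maxSpacing));
}

console.log(targetSpacing(4, 800, 600)); // 300 (capped by the canvas)
console.log(targetSpacing(10000, 800, 600)); // 40 (floored at the minimum)
```

The clamp addresses the "off the canvas" worry, but it does not handle outliers by itself; those would still depend on how related the tags are.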

We can probably try an experiment with 2 to see how it would look, but this needs to be balanced in terms of priority with other issues as this all could get quite complex.

colombinary commented 2 years ago

Thanks, this is very helpful to understand tag behaviour and seems very fitting to what we want with the tag cloud (option 1). From this understanding it seems like the issue is more of a bug. I've added a test story with unrelated tags (four different fruit types) and they do not show up. I then also tried zooming in and out on the homepage, and it seems that when doing this repeatedly the related tag clouds actually start to become more distinct somehow. I've added some screenshots; also note the white dots appearing in the corner, which I'm guessing are the missing tags. Screenshot "7" seems to get closest to what we are trying to achieve, I think. In this appearance of the tag cloud it becomes apparent that the two different clouds generally relate to the two different areas of entries that Danilo and I have been adding over the last weeks, so that makes sense with what the tag cloud code is doing. I hope this helps.

colombinary commented 2 years ago

In terms of priority, I think this could fit in the Medium list to be looked at over the next few weeks. To make space for this if needed I think issue #94 (mobile appearance) could be moved to lower priority.

GemCopeland commented 2 years ago

Hey @conatus, a few comments after playing with this for a while:

I don't understand the code (although I did try!) but from looking at it visually, it really seems like the points are distributed really evenly along the y-axis, with a couple slightly horizontally offset. This doesn't seem like the normal outcome for a scatterplot, even one with a really small dataset.

I'm not sure what to suggest here, except maybe that you and whizkid @janbaykara take a look at it and see if there's any variables that could be tweaked or haven't been added correctly?

~It basically feels like the tags are being constrained to a narrow vertical window. Could this have something to do with the mobile breakpoint perhaps?~

Edit: I just spoke to my in-house developer (lol) and he says that this is happening because d3 is expecting coordinates, but what is being input is a single value (score), which means the tags are just being plotted as a line. Since the scores are linear, he suggests running them through an algorithm that converts them into coordinates (somehow). What generates the score? Perhaps this could be replaced with something that generates a vector instead of a score.

Can you explain a bit more how the tags are related? My assumption was that if a logbook was posted with seed and indigenous participation (as Universe of Seeds is) that these tags would be close to each other, but they're not. indigenous participation is next to multispecies instead, but I can't find a post where these two appear together.
At the moment, it looks like the colour of the map just comes from an inner and outer gradient. We had previously discussed it being more like a density plot where the colour changes according to the frequency of the tags.

conatus commented 2 years ago

Hey @GemCopeland - not to leave you hanging.

Will have a chat and work out what is best to do here.

@colombinary

Thanks for the priority, very useful.

conatus commented 2 years ago

@colombinary

Moved to the priority location as per your request.

janbaykara commented 2 years ago

I'm messing around with the code and finding some give in the settings, so I think we'll be able to get something going. Something to do with the jaccardLinkLengths setting.

Strikes me that the background colouration is currently quite confusing, not very communicative. I'm wondering if we can:

- Bring more contrast to the background by expressing some sense of 'weight' of node as a dark/light intensity zone.
- Perhaps actually display the links between the nodes?

Will continue experimenting.

conatus commented 2 years ago

In terms of distribution @janbaykara that, I think, looks much more like what @colombinary is talking about.

Great to see the screenshot too.

colombinary commented 2 years ago

This looks great! Fully agree Jan, with the background contrast and the linking nodes.

janbaykara commented 2 years ago

(Ugly prototype warning): Yes this is starting to communicate something.

colombinary commented 2 years ago

This looks fascinating! But now seeing the nodes I am afraid it draws more attention to ambiguous connections. For example, the line between 'a radically new test tag' and 'cicada' makes a connection that is not reflected in the tags themselves (as they are not appearing in similar entries). Also, the tags 'multispecies', 'cicada', and 'brood X', all appear in the same logbook but are not connected in this visual representation. When the nodes are invisible these potential issues are less likely to draw attention.

I've just raised this with other project members over email, I'll continue to report here but just wanted to flag this asap. :)

janbaykara commented 2 years ago

Just FYI @colombinary, I am running this on my local machine with an adjusted copy of the data, not the live website data.

What I'm now exploring is the possibility of using a heatmap approach.

Early proof of concept. This does less to emphasise the specific pathways of connection but might emphasise 'weight' of a tag a bit better. I think we need some kind of multiplication effect to emphasise clustering between them though, so that the valleys and peaks come out. And of course, it's very low resolution / pixelly!

colombinary commented 2 years ago

Thanks for clarifying @janbaykara.

Just discussed this with @JGabrys too and remember from our earlier discussion that we definitely want to avoid lines between tags (this is because of a certain digital methods aesthetics we really wish to avoid).

We'd like for the homepage to be atmospheric and suggestive.

janbaykara commented 2 years ago

Cool, that's a useful pointer, thanks. From an engineering perspective, it was useful to quickly visualise the links to describe the kind of clustering that is going on, and to experiment with approaches to heatmapping.

janbaykara commented 2 years ago

Going to leave it for the weekend with this effect, which I'm kind of happy with:

Probably the colours could be changed, and there could be more internal detail within the southern cluster, and other tweaks like that, but at least now we have some minor sense of terrain.

Note the texturisation (pixelisation) which I think is a good improvement and calls back to the Figma specs:

How this is done

Multiple, overlaid grid heatmaps, going from several hundred pixels to only a few pixels wide:
Links overlaid and blurred to produce a 'terrain':
With the blurred image re-pixellated to bring out the 'digital' medium vibe:
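The first of these steps can be sketched roughly as follows. This is an illustration under assumptions rather than the code used for the prototype: the `tags` data, the cell sizes, and the `gridHeatmap` and `heatAt` helpers are hypothetical, and the blur and re-pixellation steps are left out.

```javascript
// Hypothetical input: laid-out tags with pixel positions and a weight.
const tags = [
  { x: 10, y: 10, weight: 3 },
  { x: 12, y: 14, weight: 1 },
  { x: 90, y: 80, weight: 2 },
];

// Build one grid heatmap: sum the weight of every tag that falls in each cell.
function gridHeatmap(points, width, height, cellSize) {
  const cols = Math.ceil(width / cellSize);
  const rows = Math.ceil(height / cellSize);
  const grid = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (const { x, y, weight } of points) {
    const col = Math.min(cols - 1, Math.floor(x / cellSize));
    const row = Math.min(rows - 1, Math.floor(y / cellSize));
    grid[row][col] += weight;
  }
  return grid;
}

// Overlay several grids by sampling each one at a pixel and summing the values.
function heatAt(x, y, grids) {
  return grids.reduce((sum, { grid, cellSize }) => {
    const row = Math.min(grid.length - 1, Math.floor(y / cellSize));
    const col = Math.min(grid[0].length - 1, Math.floor(x / cellSize));
    return sum + grid[row][col];
  }, 0);
}

// Coarse to fine cell sizes on a 100x100 canvas (real sizes would be larger).
const grids = [50, 10, 5].map((cellSize) => ({
  cellSize,
  grid: gridHeatmap(tags, 100, 100, cellSize),
}));

console.log(heatAt(11, 11, grids)); // 12: both nearby tags count at every scale
console.log(heatAt(50, 10, grids)); // 0: no tag shares a cell at any scale
```

Summing coarse and fine grids means pixels near a cluster pick up heat at every scale, while isolated areas stay cold.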

colombinary commented 2 years ago

This looks great Jan, thanks for your work on the homepage. @JGabrys also agrees it looks good and works well with the pixellated background.

Concerning the colour, we think a less white background would help make the tags more readable.

One thing we are wondering about is what is the actual interpretation of the 'heat' in this code? If the lighter shade is the 'hottest' are those the most used/populated tags? It would be very helpful for us to understand the relative logic of this arrangement. (this also in relation to the discussion on randomisation of how tags appear in issue #67)

janbaykara commented 2 years ago

I agree with you about the white background! I've reduced the 'hottest' colour down to a paler green.

The 'heat' (score) for a tag is calculated by counting how many times other tags appear in pages that include this tag, and also by counting neighbouring tags further afield. The further afield the neighbouring tags are (degrees of separation by page), the lower the score modifier. The different clouds for the different tags are combined on the homepage, so that each tag's score is an aggregate of all the clouds' scores for that tag: a tag is 'seen' by its relationship to the other tags. This is a bit complicated to say but, suffice to say, it's a measure of 'centrality' of a tag amongst the page content.
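A minimal sketch of this kind of score, assuming a breadth-first walk over tag co-occurrence with a fixed decay per degree of separation. The `pages` data, the `decay` and `maxDepth` values, and both helper functions are hypothetical, and the aggregation across the different clouds is not shown.

```javascript
// Hypothetical content: each page lists the tags editors gave it.
const pages = [
  ["multispecies", "cicada", "brood X"],
  ["cicada", "acoustics"],
  ["acoustics", "sensors"],
];

// Build tag -> Map(neighbouring tag -> number of pages they share).
function coOccurrence(pages) {
  const graph = new Map();
  for (const tags of pages) {
    for (const a of tags) {
      if (!graph.has(a)) graph.set(a, new Map());
      for (const b of tags) {
        if (a !== b) graph.get(a).set(b, (graph.get(a).get(b) || 0) + 1);
      }
    }
  }
  return graph;
}

// Score a tag by walking outwards breadth-first. Each extra degree of
// separation multiplies the contribution by `decay` (an assumed value).
function tagScore(tag, graph, maxDepth = 2, decay = 0.5) {
  let score = 0;
  let frontier = [tag];
  const seen = new Set(frontier);
  for (let depth = 0; depth < maxDepth; depth++) {
    const next = new Set();
    for (const current of frontier) {
      for (const [neighbour, count] of graph.get(current) || []) {
        if (seen.has(neighbour)) continue; // already counted at a closer depth
        score += count * decay ** depth;
        next.add(neighbour);
      }
    }
    for (const found of next) seen.add(found);
    frontier = [...next];
  }
  return score;
}

const graph = coOccurrence(pages);
console.log(tagScore("cicada", graph)); // 3.5
console.log(tagScore("sensors", graph)); // 1.5
```

In this toy data 'cicada' scores highest because it sits between two groups of pages, which matches the idea of the score as a measure of centrality.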

(I think I have this right @chrisdevereux but feel free to put this another way if I sound like I'm talking gobbledegook.)

Re #67

Re #67, to clarify - the score of a tag isn't dependent on how many times it's viewed or clicked. It's based on how editors use tags in the content.
To address the fact that some tags don't show up, we've widened the net to include many more tags, essentially. We think this will do the trick for now. When we know more about the data, we can change it up again.

Here's the latest version

With different shades to pick from:

GemCopeland commented 2 years ago

Thank you so much for your work on this @janbaykara! It's looking so so much better. Of the two options above, I prefer the second one.

janbaykara commented 2 years ago

Cheers Gemma! I've submitted this for inclusion on the live website at #129 as it sounds like this is uncontroversially better. After it's deployed, there'll be a chance to see this live in the wild.

colombinary commented 2 years ago

Awesome! Looking forward to seeing this live :)

colombinary commented 2 years ago

I'm not sure if this update is live yet? I'm not seeing the pixelated effect. We can also discuss this more in the meeting on the 9th to streamline.

In case this is updated, I do see an issue with tags that are removed from posts but still seem to show up on the homepage currently. At some point I added an entry with a bunch of fruits as tags. I removed that entry a while ago but the tags still appear on the homepage. I think the same thing is happening with some of the test-tags, although some of them may be added from the backend. Not sure if this is already known, just adding a note here to be sure :).

janbaykara commented 2 years ago

@colombinary I think you might need to hard refresh the page! It is live and I'm seeing it.

I am seeing this issue of old tags showing up. I will create a new issue about this. Thanks for reporting!

colombinary commented 2 years ago

We have an administrative question regarding the tags: is there any place in the code where we can see the tags collected all together? If yes, could you point us towards where we can find it?

We would like to occasionally export the tags of the Atlas (copy paste them as a list or put them in a .csv) to use them with other kinds of research on the project. So far we found the filter panel gives an overview, but this panel is separated per wayfinding device so it doesn't show all the tags.

JGabrys commented 2 years ago

Thanks for this, Michelle, and just to add we were also wondering if it is possible to edit the SF Atlas tags (for instance, if we have acoustic and Acoustics as we do now, can we edit and merge these in any way so it propagates across the site?).

janbaykara commented 2 years ago

@colombinary yes this is doable.

You can find all the tags via https://atlas.smartforests.net/admin/snippets/smartforests/tag/
You can also navigate to this from the sidebar in the admin section of the website: click snippets, then tags.
There's no single button click approach to CSV downloads, but in the future this functionality could be added.

@JGabrys it is possible to edit the name of a tag and to delete tags (via the link above), but there is currently no merge functionality. In the future we could add functionality for merging tags.

janbaykara commented 2 years ago

@colombinary I have set up a Google Sheet which pulls in the tags (and page counts) so you can copy / paste / export to CSV easily. I hope this works as an interim solution!

https://docs.google.com/spreadsheets/d/1PCsOh00RO1N_9H5GeLjoY4nzjJnHS8rmVAZIw0pEKUY/edit?usp=sharing

For team documentation, this is running through a free tier syncing solution, via my jan@commonknowledge login: https://app.hightouch.io/smart-forests-atlas/syncs/29714

colombinary commented 2 years ago

This is great, thanks for setting this up. I am seeing that on the 'snippets' page, many of the test-tags are still on this list, while they are already cleaned up from the tag cloud and Atlas content pages. Does this mean that any deleted tags from the Atlas are still stored in this list?

janbaykara commented 2 years ago

Tags, when removed from a page, continue to exist in the database as independent entities. But they generally will only show up when relevant; say, in tag filter panels if there are pages with this tag, or in the tag cloud if there are pages with this tag. So in practice this isn't a problem. They will continue to be recommended when you type into the tag input box though. You are free to delete old tags you no longer want in the system, and you can use the Google Sheet's "page count" column to identify the ones that you can safely delete.
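For example, once the sheet's rows are exported, picking out the tags that no page uses could look something like this. It is a sketch only: the `tag` and `pageCount` field names and the sample rows are assumptions, not the sheet's actual column headers.

```javascript
// Hypothetical rows as exported from the sheet: tag name and pages using it.
const rows = [
  { tag: "acoustic", pageCount: 4 },
  { tag: "mango", pageCount: 0 },
  { tag: "a radically new test tag", pageCount: 0 },
];

// Tags that no page uses any more are candidates for deletion.
function unusedTags(rows) {
  return rows.filter((row) => row.pageCount === 0).map((row) => row.tag);
}

console.log(unusedTags(rows)); // [ 'mango', 'a radically new test tag' ]
```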

colombinary commented 2 years ago

Thanks for clarifying, this is very helpful!
