Tag counts aren't updating - and reworking Related Tags query

jywarren commented 5 years ago

I think we're prioritizing related tags on the count field, but we don't actually keep that field updated:

https://github.com/publiclab/plots2/blob/fe2ffd6d71b9c5fc7c4ff09bb9d770ce30f4fe78/app/models/tag.rb#L362-L375

We have a run_count method here, but it's never called:

https://github.com/publiclab/plots2/blob/fe2ffd6d71b9c5fc7c4ff09bb9d770ce30f4fe78/app/models/tag.rb#L36-L39

Let's do 2 things

[x] fix run_count by running it from time to time, but also:
[x] tally/histogram nids in the first query above, which will allow us to sort them by how many times they co-occurred with the original tag, and fetch only the top 5.

Also note: previously we were trying to collect the top 5 by overall tag usage, but that failed because run_count was never called, so the counts were never updated. As a result, related_tags returned 5 effectively arbitrary co-occurring tags rather than the 5 that co-occur most often.

Now, we'll be measuring by how OFTEN they co-occur.

We should be able to run:

data = ["a", "b", "a", "b", "a", "c", "t"]
data.group_by{ |v| v }.map{ |k, v| [k, v.size] }
# => [["a", 3], ["b", 2], ["c", 1], ["t", 1]]
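To fetch only the top 5, the tally can then be sorted by count. A minimal sketch, assuming the names of the co-occurring tags have already been collected into a plain array; `top_related` is a hypothetical helper for illustration, not a method in plots2:

```ruby
# Hypothetical helper (not part of plots2): returns the most frequent
# names in `names`, most frequent first, limited to `limit` entries.
def top_related(names, limit = 5)
  names
    .group_by { |name| name }                   # { "a" => ["a", "a", "a"], ... }
    .map { |name, hits| [name, hits.size] }     # [["a", 3], ["b", 2], ...]
    .sort_by { |name, count| [-count, name] }   # highest count first, ties by name
    .first(limit)
    .map(&:first)                               # keep only the names
end

data = ["a", "b", "a", "b", "a", "c", "t"]
top_related(data)
# => ["a", "b", "c", "t"]
```

Sorting on `[-count, name]` breaks ties alphabetically, which keeps the result deterministic; Ruby's `sort_by` is not guaranteed to be stable on its own.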

jywarren commented 5 years ago

So, #5715 solves the co-occurrence, but when should we re-count tallies? Whenever a tag is made?

jywarren commented 5 years ago

We'd add tag.run_count then...

skilfullycurled commented 5 years ago

Responding to the "heads up" from @jywarren. Jeff, thank you!! Two clarifying questions because I'm not quite sure what your heads up entailed.

When you wrote:

data = ["a", "b", "a", "b", "a", "c", "t"]
data.group_by{ |v| v }.map{ |k, v| [k, v.size] }
# => [["a", 3], ["b", 2], ["c", 1], ["t", 1]]

These seem like just counts. I would expect a co-occurrence to look something like this (sorry, I don't know the Ruby structure):

[[1, 6, 2], [3], [3, 1, 2]] -> {(1,2):2, (1,3):1, (1,6):1}

skilfullycurled commented 5 years ago

I guess that's just one question...

jywarren commented 5 years ago

Yes that's right, the group_by command (obscurely) creates a count of incidence of each tag.

On Mon, May 13, 2019, 6:46 PM skilfullycurled notifications@github.com wrote:

I guess that's just one question...

skilfullycurled commented 5 years ago

Okay, I'm not clear how that handles the co-occurrence but feel free to leave the explanation for another time. I'm sure you have plenty of other things to do. Not urgent.

jywarren commented 5 years ago

It's because it's only fetching related tags for a single tag, not for all tags. So instead of returning all the groupings, it returns just a list of tags that have co-occurred with the given tag. The counts are therefore the number of times each tag has co-occurred with it. Thanks!
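A rough sketch of that idea, using the node/tag data from the example earlier in the thread. It assumes each inner array holds the tag ids on one node; `co_occurrence_counts` is a hypothetical helper, not the plots2 query:

```ruby
# Each inner array is the list of tag ids on one node.
nodes = [[1, 6, 2], [3], [3, 1, 2]]

# Hypothetical helper (not the plots2 implementation): counts how often
# every other tag appears on a node that also carries `tag`.
def co_occurrence_counts(nodes, tag)
  nodes
    .select { |tags| tags.include?(tag) }   # keep only nodes carrying the given tag
    .flat_map { |tags| tags - [tag] }       # collect the other tags on those nodes
    .group_by { |other| other }             # same counting trick as above
    .transform_values(&:size)               # tag id => number of co-occurrences
end

counts = co_occurrence_counts(nodes, 1)
```

For tag 1 this counts tag 2 twice and tags 3 and 6 once each, which matches the pair counts (1,2):2, (1,3):1, (1,6):1 from the example.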

skilfullycurled commented 5 years ago

I see. Awesome! Looking forward to seeing it play out and thanks for including me!

jywarren commented 5 years ago

I'm eager too! As mentioned in the chatroom, we've pushed back the big publication by one more day after some hours of debugging. But it should be live on https://stable.publiclab.org/tags now!!!

jywarren commented 5 years ago

This is not live yet -- probably late this week!

publiclab / plots2

Tag counts aren't updating - and reworking Related Tags query #5714