Closed jywarren closed 5 years ago
So, #5715 solves the co-occurrence, but when should we re-count tallies? Whenever a tag is made?
We'd add tag.run_count
then...
Responding to the "heads up" from @jywarren. Jeff, thank you!! Two clarifying questions because I'm not quite sure what your heads up entailed.
When you wrote:
data = ["a", "b", "a", "b", "a", "c", "t"]
data.group_by{ |v| v }.map{ |k, v| [k, v.size] }
# => [["a", 3], ["b", 2], ["c", 1], ["t", 1]]
These seem like just counts. I would expect a co-occurrence to look something like this (sorry, I don't know the Ruby structure):
[[1, 6, 2], [3], [3, 1, 2]] -> {(1,2):2, (1,3):1, (1,6):1}
I guess that's just one question...
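For reference, a minimal Ruby sketch of the all-pairs co-occurrence count described above. This is an illustration only, not code from plots2; `pair_cooccurrence` is a hypothetical helper name. Note that it counts every unordered pair, so for the example input it also reports (2,3) and (2,6), each once, in addition to the pairs listed above.

```ruby
# Hypothetical helper (not from plots2): count how often each unordered
# pair of values appears together within the same group.
def pair_cooccurrence(groups)
  counts = Hash.new(0)
  groups.each do |group|
    # uniq + sort so that [1, 2] and [2, 1] map to the same key
    group.uniq.sort.combination(2) { |pair| counts[pair] += 1 }
  end
  counts
end

groups = [[1, 6, 2], [3], [3, 1, 2]]
p pair_cooccurrence(groups)
```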
Yes that's right, the group_by command (obscurely) creates a count of incidence of each tag.
On Mon, May 13, 2019, 6:46 PM skilfullycurled notifications@github.com wrote:
I guess that's just one question...
Okay, I'm not clear how that handles the co-occurrence but feel free to leave the explanation for another time. I'm sure you have plenty of other things to do. Not urgent.
It's because it's only fetching data for tags related to a single tag, not to all tags. So instead of returning all the groupings, it's returning just a list of tags that have co-occurred with the given tag. So, they are just counts of how many times they've co-occurred. Thanks!
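A rough sketch of that single-tag case, using plain arrays of tag names as a stand-in for the real records. The `related_tag_counts` name and the sample tags are made up for illustration and are not the plots2 implementation.

```ruby
# Hypothetical helper: each element of tagged_nodes is the list of tag
# names on one node. Returns how many nodes each other tag shares with `tag`.
def related_tag_counts(tagged_nodes, tag)
  counts = Hash.new(0)
  tagged_nodes.each do |tags|
    # only nodes that carry the given tag contribute
    next unless tags.include?(tag)
    tags.uniq.each { |other| counts[other] += 1 unless other == tag }
  end
  counts
end

nodes = [
  %w[water lead],
  %w[water lead testing],
  %w[air],
  %w[water testing air]
]
p related_tag_counts(nodes, "water")
```

Because only one tag is being asked about, the result is a flat list of counts rather than a table of pairs, which is why it looks like the `group_by` tally.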
I see. Awesome! Looking forward to seeing it play out and thanks for including me!
I'm eager too! As mentioned in the chatroom we've pushed back the big publication by one more day after some hours of debugging. But it should be live on https://stable.publiclab.org/tags now!!!
This is not live yet -- probably late this week!
I think we're prioritizing related tags on the `count` field, but don't actually https://github.com/publiclab/plots2/blob/fe2ffd6d71b9c5fc7c4ff09bb9d770ce30f4fe78/app/models/tag.rb#L362-L375
We have a `run_count` method here, but it's never called: https://github.com/publiclab/plots2/blob/fe2ffd6d71b9c5fc7c4ff09bb9d770ce30f4fe78/app/models/tag.rb#L36-L39
Let's do 2 things:
`run_count` by running it from time to time, but also:
`nids` in the first query above, which will allow us to sort them by how many times they co-occurred with the original tag, and fetch only the top 5.
Also note: previously we were trying to collect the top 5 by overall high tag usage, but we were failing due to `run_count` not being updated. The result was that `related_tags` was just 5 randomly selected co-occurring tags, but not the 5 most commonly co-occurring.
Now, we'll be measuring by how OFTEN they co-occur.
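To illustrate the ranking idea in plain Ruby (not the actual query): the sketch below assumes the tag-to-node links are available as `[tag_name, nid]` pairs, which is a simplification of the real tables, and `top_related_tags` is a hypothetical name.

```ruby
# Hypothetical sketch: rows are [tag_name, nid] pairs standing in for the
# tag-to-node join. Returns the `limit` tags that most often share a node
# with `tag`, as [name, co-occurrence count] pairs.
def top_related_tags(rows, tag, limit = 5)
  # nids of every node that carries the original tag
  nids = rows.select { |name, _| name == tag }.map { |_, nid| nid }.uniq
  rows
    .select { |name, nid| name != tag && nids.include?(nid) }
    .uniq                                      # ignore duplicate links
    .group_by { |name, _| name }
    .map { |name, hits| [name, hits.size] }
    .sort_by { |name, count| [-count, name] }  # most frequent first, ties by name
    .first(limit)
end

rows = [
  ["water", 1], ["lead", 1],
  ["water", 2], ["lead", 2], ["testing", 2],
  ["air", 3],
  ["water", 4], ["air", 4]
]
p top_related_tags(rows, "water", 2)
```

Here "air" is used on two nodes overall but shares only one with "water", so it ranks below "lead", which is the difference between sorting by overall usage and sorting by co-occurrence.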
We should be able to run: