theCrag / website

theCrag.com: Add your voice and help guide the development of the world's largest collaborative rock climbing & bouldering platform
https://www.thecrag.com/
110 stars 8 forks source link

Show subscriber count in forum page #3312

Open brendanheywood opened 5 years ago

brendanheywood commented 5 years ago

What happened?

For any forum page, either node based or not like the release forum, show a subscriber count.

This also pages the way for some sort of self policing / warning, eg if you are about to post to a forum with a very large number of subscribers but you are a new user then it could either prevent it until you get more karma, or perhaps just guide to you to post in a more appropriate focused forum eg further down like a crag instead of a country.

What you expected:

DaneEvans commented 5 years ago

That’d be good for unintended spam, but what will it do for intentional spammers?

Knowing that 20 000 people will see something might have some unintended consequences.

On 20 Nov 2018, at 11:07 am, Brendan Heywood notifications@github.com wrote:

What happened?

For any forum page, either node based or not like the release forum, show a subscriber count.

This also pages the way for some sort of self policing / warning, eg if you are about to post to a forum with a very large number of subscribers but you are a new user then it could either prevent it until you get more karma, or perhaps just guide to you to post in a more appropriate focused forum eg further down like a crag instead of a country.

What you expected:

subscriber count in release forums subscriber count in node forums should we include the full number including private accounts, but it's just a number not a list so this should be fine — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

brendanheywood commented 5 years ago

Yes we need to tread carefully here, but hopefully it will help with reducing real posts but which are 'relative' spam, ie we get a lot of people posting "wanted partner for crag X" at the world level. We need to better guide people to where they should be posting.

We also intend to make posting to larger forums slightly harder to mitigate spam, eg add a minimum karma, this has been in our plans for a while here but I've just broken it out into this issue:

https://github.com/theCrag/website/issues/1365

Of course for real serious spam from fake accounts we just need a flag button, but we still tend to find these pretty quickly anyway:

https://github.com/theCrag/website/issues/2465

Also I think I'll only show this to people who are logged in

scd commented 5 years ago

This is done, but maybe leave open for performance assessment.

screenshot from 2019-02-05 18-19-52

scd commented 5 years ago

Note that the numbers are artificially large. This is either a mistake in the sql or it is an artifact of the way we use to do things. Maybe this we should implement the 'account culling' of non active accounts before we release these stats.

brendanheywood commented 5 years ago

Yeah those don't feel right. I suspect the 17k is the number of favorites, not the number of unique users. If I've favorited 10 crags inside aus am I being counted 10 times? Assuming thats the case and you fix that, then the main number on the left will not be the sum of the 3 on the right, as for instance I could watch the world, aus, and 10 crags inside it.

scd commented 5 years ago

It is a distinct count.

On Wed, Feb 6, 2019, 21:44 Brendan Heywood notifications@github.com wrote:

Yeah those don't feel right. I suspect the 17k is the number of favorites, not the number of unique users. If I've favorited 10 crags inside aus am I being counted 10 times? Assuming thats the case and you fix that, then the main number on the left will not be the sum of the 3 on the right, as for instance I could watch the world, aus, and 10 crags inside it.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/theCrag/website/issues/3312#issuecomment-460978516, or mute the thread https://github.com/notifications/unsubscribe-auth/AAWxQjHu4JyoShSXfD65voEEg7mPzL2Qks5vKrIrgaJpZM4YqIJu .

scd commented 5 years ago

As a high level sql verification I have written this

SELECT
  COUNT(DISTINCT A.ID) AS C
FROM
  Account AS A
JOIN
  MessageGroupSubscription AS S ON
        S.RecordStatus = 1
    AND S.Account = A.ID

I get 50380. If I remove distinct count then I get 177026.

So the high numbers are more subtle. Either I have a misunderstanding of how I implemented the data model or we were far more effective at auto subscribing people in past years when they did something in an area. There was some time with Adam where new users where auto subscribed to an area in order to demonstrate the feature to them, but this would not explain the numbers I am seeing.

One thing that looks suspicious is that direct subscriptions to country looks high. If we auto subscribed people to their country when they signed up then this would explain the numbers.

scd commented 5 years ago

There are no current auto subscribes when signing up.

scd commented 5 years ago

I have worked out why there are large numbers.

Up until a few years ago you were subscribed to a forum if:

More recently we changed this to be

The large number issue is to do with the number of favorited areas. There are 38k accounts with favorited areas.

mysql> select count(DISTINCT Account) from AccountNodeShortcut;
+-------------------------+
| count(DISTINCT Account) |
+-------------------------+
|                   38875 |
+-------------------------+
1 row in set (0.15 sec)

I think this is a historical issue where we use to automatically favorite areas when they created an account.

What to do about it? We can remove Subscriptions to accounts which have been inactive for more than 2 years?

Note that there is also a logic issue which we may want to work though. When the subscription count says something like: "3,027 subscribers ( 1,006 direct, 441 ancestor, 1,580 descendant )" the direct, ancestor and descendant counts are distinct accounts, however the combination of the three may not be. Is this significant enough to work through. I am not sure of performance issues with doing a union on the three queries for a distinct account.

scd commented 5 years ago

I bit the bullet and tested fixing the logic error which seems performant enough. However the discrepancy it fixes is really small.

For example in Yosemite:

3,027 subscribers ( 1,006 direct, 441 ancestor, 1,580 descendant )

Is corrected to

3,010 subscribers ( 1,006 direct, 441 ancestor, 1,580 descendant ) 
scd commented 5 years ago

@brendanheywood we have got to decide what to do with those historical auto favorites.

  1. Do we leave as is and release with artificially high subscriber numbers?

  2. Do we hold off with releasing this enhancement?

  3. Do we run a data cleanse script to unsubscibe people according to an agreed pattern?

My preferred option is a data cleanse script but we should be aware that undoing will be very difficult and we may get a few false positives. My guess is the number of subscribers will go down dramatically (probably end up with less than 50% of the numbers that are there now).

scd commented 5 years ago

One set of auto subscriptions were to the following nodes:

# 99929625      https://www.thecrag.com/99929625        Calabogie       5378
# 11811487      https://www.thecrag.com/11811487        Springwood Conservation Park    5370
# 119929479     https://www.thecrag.com/119929479       Hatun Machay    5346

After removing favorites and subscriptions to those nodes if the account has not ticked in the area the overall subscriptions changes from

39,047 subscribers ( 31 direct, 0 ancestor, 39,037 descendant ) 

to

34,275 subscribers ( 31 direct, 0 ancestor, 34,265 descendant ) 

So this sorts out about 5k subscriptions.

But I think the problem may be deeper still and go back to pre 2010. So areas ordered by favorites is

11747155    https://www.thecrag.com/11747155    France  1681
11756239    https://www.thecrag.com/11756239    California  1004
11747371    https://www.thecrag.com/11747371    Italy   968
11756287    https://www.thecrag.com/11756287    Colorado    752
11756839    https://www.thecrag.com/11756839    New York    747
11756935    https://www.thecrag.com/11756935    Pennsylvania    735
11811439    https://www.thecrag.com/11811439    Kangaroo Point  700
11756431    https://www.thecrag.com/11756431    Utah    664
11738011    https://www.thecrag.com/11738011    Mexico  648
11746435    https://www.thecrag.com/11746435    Thailand    592
11756599    https://www.thecrag.com/11756599    North Carolina  592
11740915    https://www.thecrag.com/11740915    Arapiles    574
11756479    https://www.thecrag.com/11756479    Virginia    556
11741347    https://www.thecrag.com/11741347    Blue Mountains  523
11756551    https://www.thecrag.com/11756551    Tennessee   509
11756311    https://www.thecrag.com/11756311    Washington  505

So I am thinking that we also auto subscribed to countries and states in US.