soscripted / sox

Stack Overflow Extras: a userscript for the Stack Exchange websites that adds a bunch of optional, toggleable features
http://stackapps.com/q/6091/
MIT License

Better way to detect employees using less bandwidth/fewer API requests #341

Closed: GaurangTandon closed this issue 5 years ago

GaurangTandon commented 6 years ago

Installed Version: 2.2.3 DEV
Environment: Tampermonkey

Current Behaviour

[screenshot: comments by Donna are not marked with the SOX employee indicator]

Steps to reproduce

Visit this page - https://meta.stackexchange.com/questions/312038/wed-like-your-feedback-on-our-new-code-of-conduct - and look at the comments by the user Donna.

Note: when I run the API on Donna's profile, the is_employee property is set to true, so I am not at all sure what the issue actually is :/ I reviewed the code under markEmployees, but it looks okay and there honestly shouldn't be an issue. And yet there is...


Features Enabled

["Appearance-colorAnswerer","Appearance-isQuestionHot","Appearance-markEmployees","Appearance-metaChatBlogStackExchangeButton","Appearance-metaNewQuestionAlert","Appearance-tabularReviewerStats","Appearance-addTimelineAndRevisionLinks","Comments-autoShowCommentImages","Comments-confirmNavigateAway","Editing-addSBSBtn","Editing-inlineEditorEverywhere","Flags-flagPercentages","Flags-flagPercentageBar","Sidebar-hideHireMe","Sidebar-hideChatSidebar","Sidebar-hideLoveThisSite","Sidebar-hideHotNetworkQuestions","Chat-replyToOwnChatMessages","Extras-showMetaReviewCount","Extras-copyCode","Extras-dailyReviewBar","Extras-showQuestionStateInSuggestedEditReviewQueue"]
shu8 commented 6 years ago

@GaurangTandon Reproduced; I'll look into it, thanks!

shu8 commented 6 years ago

@GaurangTandon temporarily fixed in dev 2.2.11.

This happened because the API only returns 30 items per page by default, but that page had 59 unique users. I've changed the page size to 100 for now; ideally the function should check whether the number of IDs exceeds 100 and, if so, split them into multiple requests, but I'll do that another time!
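For reference, a minimal sketch of that batching, assuming a plain fetch-based /users call (the function names are illustrative, not SOX's actual code):

```javascript
// Hypothetical helpers, not SOX's actual code: split the collected user IDs
// into batches of 100 (the API's per-request maximum) and issue one
// /users request per batch.
function chunk(array, size) {
    const batches = [];
    for (let i = 0; i < array.length; i += size) {
        batches.push(array.slice(i, i + size));
    }
    return batches;
}

function fetchUsersInBatches(userIds, site, onUsers) {
    chunk(userIds, 100).forEach(batch => {
        const url = 'https://api.stackexchange.com/2.2/users/' + batch.join(';') +
            '?site=' + site + '&pagesize=100';
        fetch(url)
            .then(response => response.json())
            .then(data => onUsers(data.items)); // items carry user_id, is_employee, ...
    });
}
```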

GaurangTandon commented 6 years ago

A few remarks:

  1. Please use filters! You only use the is_employee and user_id properties from the API response. You need a single apple but you're fetching the whole fruit shop! I'll submit a pull request within a few hours to use filters, not just here but in all the other API calls. (See the sketch after this list.)

  2. Could you cache the IDs with GM_setValue? Currently the API fetches data for these IDs every time I load any SE webpage. While I am on vacation, I open SE webpages at least a hundred times a day (not exaggerating). You could instead cache the user_ids of employees and update the list once every day; the cache would stay small, as there are relatively few employees.

  3. You can easily shorten the list of IDs you pass by checking whether a user has a diamond next to their name. All employees have a diamond next to their name on each site they participate in. (For example, Donna has diamonds on MSE and SO, but not on GD.SE, where she doesn't participate.)
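To make points 1 and 2 concrete, here's a rough sketch; the filter string is a placeholder (a real one has to be generated once via the API's /filters/create route), and all key/function names are made up:

```javascript
// Rough sketch of suggestions 1 and 2; names are hypothetical, and
// EMPLOYEE_FILTER is a placeholder for a real filter string generated via
// /filters/create (e.g. with include=user.user_id;user.is_employee).
const EMPLOYEE_FILTER = '!placeholder';
const CACHE_KEY = 'sox-employee-ids';
const ONE_DAY = 24 * 60 * 60 * 1000;

function getCachedEmployeeIds() {
    const cached = JSON.parse(GM_getValue(CACHE_KEY, '{"time":0,"ids":[]}'));
    // return null when the daily cache has expired, so callers re-fetch
    return Date.now() - cached.time < ONE_DAY ? cached.ids : null;
}

function refreshEmployeeIds(userIds, site, done) {
    const url = 'https://api.stackexchange.com/2.2/users/' + userIds.join(';') +
        '?site=' + site + '&pagesize=100&filter=' + EMPLOYEE_FILTER;
    fetch(url)
        .then(response => response.json())
        .then(data => {
            const ids = data.items.filter(u => u.is_employee).map(u => u.user_id);
            GM_setValue(CACHE_KEY, JSON.stringify({ time: Date.now(), ids: ids }));
            done(ids);
        });
}
```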

I'll submit a PR asap.

shu8 commented 6 years ago

@GaurangTandon You're right about the filters; I should have used them from the beginning :/

We could possibly cache them; the only problem I can see is that employees each have a different user ID on different sites, but I guess you'd only really be seeing them on a few main ones?

About the diamonds, some employees don't necessarily have diamonds, so I don't think we should do that -- see e.g. https://meta.stackexchange.com/questions/216414/are-there-moderators-without-a-diamond#comment700653_216416, or https://islam.meta.stackexchange.com/a/1003.

GaurangTandon commented 6 years ago

About the diamonds, some employees don't necessarily have diamonds

...well, they don't have their is_employee set to true either. Here's Laura, for example. But I suspect Laura has since retired; her last activity on Meta.SE was in 2017. Let me ask the Taverners whether there are currently working devs without a diamond...

Till then, let's keep the PR on pause.

j-f1 commented 6 years ago

@GaurangTandon Don’t forget that there are employees who aren’t devs. SO also employs salespeople, designers, community managers, data scientists, and probably other people too.

GaurangTandon commented 6 years ago

@j-f1 That's right, being a dev myself I was biased towards devs :P

@shu8

some employees don't necessarily have diamonds,

If we query /users/moderators, we can get the list of employees on that site. We could run this query once a week and cache the results.

We could possibly cache them, the only problem I can see is that they each have a different user ID on different sites, but I guess you'd only really be seeing them on a few main ones?

Yeah, right; we should use localStorage for that, since it's scoped per site and would give us SE-site-specific results. (Rough sketch below.)
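Something like this, maybe; /users/moderators is a real API route, but the key name and the weekly expiry are just my assumptions:

```javascript
// Sketch only: query the /users/moderators route once a week and cache
// the employee IDs in localStorage, which is scoped to the current SE site.
// The key name and helper are made up.
function getEmployeeIdsWeekly(site, done) {
    const ONE_WEEK = 7 * 24 * 60 * 60 * 1000;
    const cached = JSON.parse(localStorage.getItem('sox-employee-ids') ||
        '{"time":0,"ids":[]}');
    if (Date.now() - cached.time < ONE_WEEK) return done(cached.ids);

    fetch('https://api.stackexchange.com/2.2/users/moderators?site=' + site +
        '&pagesize=100')
        .then(response => response.json())
        .then(data => {
            // keep only the moderator accounts the API flags as employees
            const ids = data.items.filter(u => u.is_employee).map(u => u.user_id);
            localStorage.setItem('sox-employee-ids',
                JSON.stringify({ time: Date.now(), ids: ids }));
            done(ids);
        });
}
```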

I'll do a PR tomorrow; it's super late here!

GaurangTandon commented 6 years ago

@shu8 The script is ready. You can read the full details on the relevant StackApps question page. The thing is, only 75% of the work is complete: the script can fetch the user_ids for us, but Brock mentioned that we need to set up a cron job to automate that task weekly and store its results in a public JSON file. Once that is done, a public JSON file of the form [userID1, userID2, userID3, ...] would be ready.

I instead think it'd be better to run this task from the client weekly, given that SOX already has its own access token, and store the associated account_ids in the Tampermonkey storage. Then I can write code that utilizes this cached file, requesting more user_ids for the account_ids on individual SE sites as necessary.

I'll work this out in a day or two.

shu8 commented 6 years ago

@GaurangTandon Nice! That Google Apps Script is pretty neat!

GaurangTandon commented 6 years ago

@shu8 I've hit an obstacle. We cannot run the weekly checks for userIDs from the client side, because a) the API method takes more than twenty minutes to execute, which is way too long for every person who installs SOX; and b) scraping the Google Apps Script page requires the client to set Access-Control-Allow-Origin: *, using extensions like this one. We can't expect users to have such extensions installed, nor can we force them to install one.

The only solution left to us seems to be:

  1. set up a public GitHub file and have the client GET it weekly via the GitHub API (see the sketch after this list), and
  2. set up the script to run weekly updates on that public GitHub file via a cron job - from the server side.
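Step 1 would be tiny on the client. A sketch, assuming a hypothetical public file at soscripted/sox-data/employees.json (the repository path is made up):

```javascript
// Sketch of step 1. The repository path is hypothetical; the file is assumed
// to contain the [userID1, userID2, ...] array described above.
// GM_xmlhttpRequest is Tampermonkey's cross-origin request API.
function fetchEmployeeList(done) {
    GM_xmlhttpRequest({
        method: 'GET',
        url: 'https://raw.githubusercontent.com/soscripted/sox-data/master/employees.json',
        onload: function (response) {
            done(JSON.parse(response.responseText));
        }
    });
}
```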

Now, when I say "server side", I mean a PC/Mac that belongs to one of SOX's core contributors. So, are you willing to run this job on your PC @shu8?

Again, this may seem like overkill (especially since the current method still works), but I think that if the "overkill" buys us an extreme reduction in bandwidth consumption, we should certainly do it. While I certainly love the employee marker, I dislike shelling out so much bandwidth for it every time I load an SE page :(

Thoughts?

shu8 commented 6 years ago

@GaurangTandon sorry, I don't think I'd be able to run it on my computer; it's often shared, and in a few months I'll be at uni!

I appreciate that you want to save bandwidth, so how about we build up a local store of employee IDs over time? We can keep the code as you've currently changed it, but for every employee we find, add them to a local GM value/localStorage variable, and don't add an ID to the ids array for the API call if it already exists in the saved array. It will still use bandwidth, but much less over time, especially if you frequent the same few meta sites. What do you think? (Rough sketch below.)
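Something like this, roughly (all names made up):

```javascript
// Rough sketch, names hypothetical: remember employee IDs as they are found,
// and drop already-known employees from future API request lists.
const KNOWN_KEY = 'sox-known-employees';

function splitKnownAndUnknown(userIds) {
    const known = JSON.parse(GM_getValue(KNOWN_KEY, '[]'));
    return {
        known: userIds.filter(id => known.includes(id)),    // mark these immediately
        toFetch: userIds.filter(id => !known.includes(id))  // only these hit the API
    };
}

function rememberEmployees(employeeIds) {
    const known = JSON.parse(GM_getValue(KNOWN_KEY, '[]'));
    GM_setValue(KNOWN_KEY, JSON.stringify([...new Set(known.concat(employeeIds))]));
}
```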

GaurangTandon commented 6 years ago

it's often shared and in a few months I'll be at uni!

@shu8 Ah, same here. Except that I'm leaving for uni this month :O

Anyway, I don't exactly understand what you're proposing here. What do you mean by "a local store of employee IDs over time"?

shu8 commented 6 years ago

@GaurangTandon I was thinking of storing employee user IDs as they are found (using the current method of sending all user IDs to the API), so we wouldn't need to fetch known employee IDs again in the future for a given client.

However, now that I think about it, it probably won't save many requests, because there are more non-employees than employees 😕

GaurangTandon commented 6 years ago

I guess we're stuck with the existing method then? I'll admit it's still very expensive.

Well, then, what else can be done :(

shu8 commented 6 years ago

@GaurangTandon hmm, I just thought about that YQL thing you mentioned -- could we use that to access the Google Apps Script?

GaurangTandon commented 6 years ago

I guess that would work... I've never used YQL though. Wanna give it a shot? See if you can scrape the Google Apps Script page without requiring the user to disable CORS.

shu8 commented 6 years ago

sure, I'll try it soon!


j-f1 commented 6 years ago

There are also things like crossorigin.me, which will allow you to fetch resources from any URL (the only caveat being that they don't get session cookies).

shu8 commented 6 years ago

@j-f1 Cool, I never knew that existed! Thanks, I'll look into it!

shu8 commented 6 years ago

I couldn't get it to work with Yahoo; I think they've stopped supporting HTML extraction, because I get an error saying 'html table is not supported'.

crossorigin.me didn't work either, and I think that may be because it has a 2MB limit? But the error I get is a 522 no-referrer-when-downgrade :/ (instead of a 413 like the website says it should be).

I found https://cors-anywhere.herokuapp.com/ and it works with that: https://jsfiddle.net/shub01/ohd86uby/2/
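For reference, the gist of the workaround (the Apps Script URL below is just a placeholder):

```javascript
// The proxy adds the CORS headers for us: prefix the target URL with
// cors-anywhere and fetch as usual.
const PROXY = 'https://cors-anywhere.herokuapp.com/';
const target = 'https://script.google.com/macros/s/SCRIPT_ID/exec'; // hypothetical

fetch(PROXY + target)
    .then(response => response.text())
    .then(html => {
        // parse the returned page for the employee ID list here
        console.log(html);
    });
```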

I'm going to go back to the features that use Yahoo's YQL and see if they still work!

shu8 commented 5 years ago

Closing this in favour of the API caching implemented as part of #391; it reduces the number of API requests made after the initial ones, since responses are locally cached for a short period of time.