mediacloud / backend

Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online media.
http://www.mediacloud.org
GNU Affero General Public License v3.0
281 stars 87 forks source link

remove requested_items_count tracking #330

Open hroberts opened 6 years ago

hroberts commented 6 years ago

the values in requested_items_count are the same as the values in request_count in auth_user_request_daily_counts.

pypt commented 6 years ago

IIRC the original idea was that API endpoints should "make note" of the requested item count as is done here:

https://github.com/berkmancenter/mediacloud/blob/master/lib/MediaWords/Controller/Search.pm#L315

Then the ::ActionRole::Logged will log said count to the database.

Not much code besides the ::Controller::Search calls set_requested_items_count(), so we should either make the API endpoints call it or scrap the "requested items" limitation altogether.

pypt commented 6 years ago

The initial idea with the requests / requested items count was that we would be logging those separately and having separate caps on both.

For example, if user makes a single stories/list?limit=10 call to return 10 stories, she uses up 1 request and 10 requested items from her quota.

Request counting is done automatically because it's easy to do, one doesn't have to add any extra code when implementing a new API call to count + throttle the requests themselves. Requested items is a slightly different story because the developer then has to report back how many items just got returned back to the user and should be subtracted from her quota:

package Stories;

sub list($)
{
    my $limit = shift;

    # Fetch the stories here
    my $stories = ...;

    # Report back the story count that was returned back to the user
    MediaWords::ActionRole::Logged::set_requested_items_count( $c, scalar(@{ $stories }) );

    return $stories;
}

So far we've bothered counting the requested items at a single place only, which is /search page from the old web UI. So, given that no throttling has been done on the requested items count whatsoever, and no one has abused our API (it seems), I'd say we can scrap the "requested items" counting + throttling altogether.

hroberts commented 6 years ago

​I think this is the right solution, in combination with making sure that our limits for returned items on individual calls are pretty small. that will still provide us some protection but in a simpler (and more industry standard) way.

I'll add a separate github issue to review return limits, which would be a good exercise on principle in any case.​

On Tue, Jun 12, 2018 at 2:08 PM Linas Valiukas notifications@github.com wrote:

The initial idea with the requests / requested items count was that we would be logging those separately and having separate caps on both.

For example, if user makes a single stories/list?limit=10 call to return 10 stories, she uses up 1 request and 10 requested items from her quota.

Request counting is done automatically because it's easy to do, one doesn't have to add any extra code when implementing a new API call to count + throttle the requests themselves. Requested items is a slightly different story because the developer then has to report back how many items just got returned back to the user and should be subtracted from her quota:

package Stories; sub list($) { my $limit = shift;

# Fetch the stories here
my $stories = ...;

# Report back the story count that was returned back to the user
MediaWords::ActionRole::Logged::set_requested_items_count( $c, scalar(@{ $stories }) );

return $stories;

}

So far we've bothered counting the requested items at a single place only, which is /search page from the old web UI. So, given that no throttling has been done on the requested items count whatsoever, and no one has abused our API (it seems), I'd say we can scrap the "requested items" counting + throttling altogether.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/berkmancenter/mediacloud/issues/330#issuecomment-396700413, or mute the thread https://github.com/notifications/unsubscribe-auth/ABvvT-DCOoZr9Fz7gKhuaT67ggrX766gks5t8BGmgaJpZM4Sn-ch .

-- Hal Roberts Fellow Berkman Klein Center for Internet & Society Harvard University

rahulbot commented 6 years ago

Decided to remove it - code cleanup task, not high priority