internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0
5.22k stars 1.37k forks source link

Infogami API Endpoints Not Enforcing Sensible Result Limits #314

Closed ParthKolekar closed 4 years ago

ParthKolekar commented 8 years ago

There is no limit on the limit argument that is sent to the history.json?limit=X Hence it is possible me to run something like this on the server, effectively crashing it.

https://openlibrary.org/subjects/history.json?limit=10000&has_fulltext=false&sort=editions

As a proof of concept, I tried it and the web page did indeed go down for some time.

Please set pagination limit enforced on the limit parameter.

DefUs3r commented 6 years ago

@ParthKolekar @mekarpeles Hello sirs, I am a undergrad student of IIIT Hyderabad. I would like to work on this issue. I came across this issue when I was reading project ideas of Internet Archive for GSoC 2018. It would be a great help if the work is explained to me in more detail with reference to this issue and how it would augment the archive because as of now this issue is bit hazy to me. Thanks and regards. Soumyasis Gun

sbshah97 commented 6 years ago

Hi @riddhyasona you could look to solve this Issue if you're interested! Do take a look at our README and CONTRBUTING guide for more information on how to use RECAPTCHA v2 on the Dev Environment!

ParthKolekar commented 6 years ago

Hey @riddhyasona, I'd start with emulating this under a non-production environment and seeing if the website still explodes or not.

And when it explodes, you can then post-mortem and see how it explodes.

MineRobber9000 commented 6 years ago

I mean, it seems like this is a no-brainer: just limit how much history can be requested (an upper bound on limit=).

It's probably exploding due to tons of memory usage to prepare the giant 10,000 item history list you're asking for.

mekarpeles commented 6 years ago

@MineRobber9000 pull requests and questions very welcome :)

MineRobber9000 commented 6 years ago

where is the history endpoint in code? i'll take a look when I get home?

mekarpeles commented 6 years ago

@MineRobber9000 https://dev.openlibrary.org/developers/routes Technically, the ability to add /history.json to any thing in Open Library comes from infogami i.e. https://github.com/internetarchive/infogami/blob/a1430609d73316eaebeba0f723e17a82a1a71b8d/infogami/plugins/api/code.py#L212 which already accepts a limit.

I think we'll need to explicitly add/override the history endpoint for subjects, e.g. in https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/worksearch/subjects.py

from infogami.plugins.api.code import history

class subjects_history(delegate.page):
    path = '/subjects/history'
    encoding = 'json'

    @jsonapi
    def GET(self, key):
        i = web.input()
        if i.get('limit') > 100:
            return "can't let you do that, star fox!"
        return history().get()    
MineRobber9000 commented 6 years ago

Alright, I'll get to that. Is 100 a reasonable limit, or should it be higher/lower?

mekarpeles commented 6 years ago

100 seems reasonable to me, code reviewing now -- thank you! :)

jdlrobson commented 5 years ago

What happened here? DoS's shouldn't be open 2 years later! :)

rohitjain00 commented 5 years ago

PR #1238 is closed so I think this should be fixed.

brad2014 commented 5 years ago

PR #1238 is closed so I think this should be fixed.

No, PR #1238 was abandoned, with a note by @hornc or @mekarpeles that this should be resolved within infogami.

I'm removing the "Good First Issue", and adding "Module: Infogami", and assigning to @mekarpeles (for now).

cdrini commented 5 years ago

Dropping to P1 since this doesn't need to be solved today; feel free to re-adjust @hornc