ryukinix / mal

MAL: A MyAnimeList Command Line Interface [BROKEN: BLAME MyAnimeList]
https://mal.readthedocs.io
Other
109 stars 9 forks source link

Handle errors better if MyAnimeList API is not accessible #74

Closed kutsan closed 5 years ago

kutsan commented 6 years ago

ryukinix commented 6 years ago

Ok. We can create a similar way to treat errors like I did for regex/connection exceptions, using just a decorator to be used in some method or function as defined here: https://github.com/ryukinix/mal/blob/99b72893de02f0510be4436a5ef365bc404989a3/mal/utils.py#L62-L96

And used here: https://github.com/ryukinix/mal/blob/master/mal/api.py

ryukinix commented 6 years ago

@kutsan on that image:

image

I don't see the name of the exception. What is?

kutsan commented 6 years ago

@ryukinix Silly me, for some reason I didn't include the underside of the logs. Don't know what was I thinking. I've already closed that terminal session; so can't provide any useful information on that topic. I'm so sorry.

ryukinix commented 6 years ago

@kutsan hmm, I would try implement this, but would be more difficult without the name of the exception on full log stacktrace. Well... I'll try replicate this later. Would really nice having the full log of this exception.

kutsan commented 6 years ago

I am really so sorry, I feel dumb now. Won't be happen again. :(

ryukinix commented 6 years ago

I was thinking... if I just try change the API url randomly, I can get the same exception? 🤔 because if the API is not working, probably the result is fetch another page...

kutsan commented 6 years ago

I am looking forward to an possible maintenance. :P

ryukinix commented 6 years ago

Thanks! Be watching! xD

ryukinix commented 6 years ago

@kutsan any news on that? No maintenance? I was looking to a way to reproduce this thing... Maybe if I try just change the mal.api.MyAnimeList.base_url I can replicate a similar issue.

kutsan commented 6 years ago

@ryukinix Yeah, haven't encountered any maintenance since then.

I can replicate a similar issue.

Fix that problem and let's close this issue; if happens again in future, I'll let you know.

ryukinix commented 6 years ago

ok, thanks

bradenbest commented 6 years ago

@ryukinix What if you were to create a dummy HTTP server with python's http.server module and re-implement the API but in a way where it gives boilerplate responses? Then you could run a server locally and not have to rely upon MAL (the site) for testing.

$ ./test-mal-server.py
Running API testing server at 127.0.0.1 on port 8080.
--
$ mal list --server http://127.0.0.1:8080
1: Test
    Watching at 0/0 episodes with score -

The dummy API should implement the API for MAL (the website) so that mal (the program) can connect to it during testing and test how it responds to various specific requests.

Using this, you could add switches to test how mal handles different errors. Such as a 500 internal server error, or a 403 forbidden

$ ./test-mal-server.py --500
Running API testing server at 127.0.0.1 on port 8080.
Will respond with an HTTP 500 to all requests
--
$ mal list  --server http://127.0.0.1:8080
<program gets HTTP 500 response from request to localhost:8080 and crashes>

Then you just add this to the unit tests and give mal a switch (--server) that lets you specify a different base domain, like localhost:8080 instead of myanimelist.net, as illustrated above.

AFAIK, http.server is part of python's standard library, so no new dependencies.

https://docs.python.org/3/library/http.server.html

The basic rundown, if you aren't familiar with HTTP, is that your mal client will send a request that looks like this:

...

Well, after probably two hours trying to use my /etc/hosts (127.0.0.1 myanimelist.net) to redirect mal to use my ncat server so I could read the request headers, I couldn't get past the SSL part. Either your security is really good, or it has to do with librequest and the string "https" appearing in the URL base. But I can't get mal to connect to my server, as it fails each time with...

mal:
SSLError: MaxRetryError
          reason: SSLError ¯\_(ツ)_/¯

server (sudo ncat --listen --ssl localhost 443 -v, among many others including using certificates generated by openssl):
...
Ncat: Listening on 127.0.0.1:443
Ncat: Connection from 127.0.0.1.
Ncat: Connection from 127.0.0.1:53202.
Ncat: Failed SSL connection from 127.0.0.1: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca

cURL is able to connect without triggering an error on the ncat server, and this is what it says:

$ curl https://localhost
curl: (60) server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

Using cURL's insecure mode, you can see what an HTTP request looks like:

cURL's end:
$ curl https://localhost --insecure

ncat's end:
$ sudo ncat --listen --ssl localhost 443 -v
...
Ncat: Listening on 127.0.0.1:443
Ncat: Connection from 127.0.0.1.
Ncat: Connection from 127.0.0.1:53224.
[vv this is the request that cURL sends vv]
GET / HTTP/1.1
Host: localhost
User-Agent: curl/7.47.0
Accept: */*

But it's a GET request. And I wanted to show you what request your program is sending so you have a better idea of how to handle incoming requests on the dummy server.

So, one suggestion I have, for the dummy server, is to use raw HTTP. Also be sure to use a port above 1023, because ports 0 - 1023 are owned by root. So it's easier to just use 8000 or 8080.

I suggest isolating in base_url the https://myanimelist.net part and calling that domain or server. Then by passing the aforementioned test flag (--server, e.g. mal list --server http://localhost:8080), have the server variable get switched to that, such that base_url becomes http://localhost:8080/api. That way, when the test server gets the request, it'll look something like:

POST /api HTTP/1.1
data...

But you probably want to know what a POST request actually looks like. Well, I did, too. So I wrote a quick HTML form and served it via http.server (python3 -m http.server), then swapped it out with netcat (nc -l 8000) before submitting.

Here's the form:

<form action="/senpai" method="post">
<input name="username">
<input name="password">
<input type="submit">
</form>

And here's the POST request (+data) it generated with the inputs "sen", "pai":

POST /senpai HTTP/1.1
Host: localhost:8000
Connection: keep-alive
Content-Length: 25
Cache-Control: max-age=0
Origin: http://localhost:8000
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/64.0.3282.167 Chrome/64.0.3282.167 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Referer: http://localhost:8000/form.html
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9

username=sen&password=pai

of course, there is no /senpai page on the server, but netcat doesn't care, because it's not a fileserver.

Anyways, I'm sure python abstracts a lot of that stuff out. You can probably just do something like, I dunno, http.request.url to get /senpai, for example. And then taking that, you can start to implement the behavior for the different "pages" of the dummy API, like /api/anime/search.xml?q=..., for example. In which case, you could treat "Cowboy Bebop" as the only anime in existence, and anything else should 404 so you can test how it reacts to 404's.

This has been a long-winded reply. Sorry about that. But I hope you get the gist. The dummy server/API could be a valuable asset in testing, especially for reproducing errors like this one. Since everything would be under your control. It would also ensure that a run-through of travis-ci doesn't fail if, say, MAL just happens to be down at the time.

Also note that a url like /page?a=somevalue&... is a GET request. Which is what it looks like MAL's api takes in. But I've also seen evidence that it takes POST requests (like in the curl ... -d ... examples given in the API's documentation), so I'm not sure how MAL's API is organized. Seems like a bit of a clusterfuck to me.

Just as a quick reminder, I'm not suggesting to re-implement the entirety of MAL. You are only implementing the interface, while the actual technology that the interface accesses is just a bunch of boilerplate that will always give the same hard-coded response depending on the input.

It's like

int getRandomNumber(){
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

versus

int getRandomNumber(){
    int ret;
    int rand_fd = open("/dev/urandom", O_RDONLY);

    read(rand_fd, &ret, 4); // chosen by reading 4 bytes from /dev/urandom
    close(rand_fd);

    return ret;
}
bradenbest commented 6 years ago

@lubien The point of my reply was to suggest a fix for the non-reproducible roadblock. So if you have something to add, I think we'd all love to hear your input. It'd certainly be more useful to the discussion than a thumbs-down emoji.

By doing that, you've broadcasted for all of us to see that you object to something I said. It's kinda rotten to just keep that to yourself. Get mad! Tell me why I suck.

lubien commented 6 years ago

@bradenbest I don't see the point on boilerplating a server just to mock some HTTP responses as even the exception itself is unknown.

Not only feels like overkill but also seems like missing the original point.

MAL API itself is a mystery. Sometimes it gives "Created" messages in the HTML body, sometimes it don't say anything. It's a "deal with it developers" kinda thing. Trying to mock the unexpected feels just wrong.

bradenbest commented 6 years ago

@lubien Ah. MAL's API is really that bad? Well, thanks for the input. I don't really have anything to add, so have a thumbs up.

lubien commented 6 years ago

@bradenbest Unfortunately. It's too unpredictable and incomplete. Not to talk about undocumented.

ryukinix commented 6 years ago

MAL API is the worst API that I knew in the my entire life.

bradenbest commented 6 years ago

So, as an alternative idea, what if you made a dummy function that returns what you expect from MAL (the site) and returns something that causes the program to crash:

def fetch_site(url):
    # do some stuff
    # return response text from MAL

def fetch_site(url): # version used for testing
    # do some stuff depending on what url says
    # return arbitrary boilerplate

At least then you'd cut out the server and wouldn't have to attempt to recreate the API. Just recreate the responses you expect to get from the server, and add in responses that the program isn't supposed to be receiving (plain text, improperly formatted XML, JSON, HTTP error code, HTML page, RSS feed, all of the above with varying combinations of CR/LF/CRLF line terminators), to test fault tolerance. Surely with enough experimentation, you'll manage to reproduce the bug (and it might even come from something completely unexpected like incompatible line terminators), and can fix it and add the API simulation to the unit tests for the future.

Outside of that, the next time MAL goes down, be it for maintenance, downtime, breakage or domain expiry, document into a file the HTTP header and response you get from MAL with curl -I <url> and curl <url> respectively, along with a description of the circumstances (e.g. "This is from when MAL went down for maintenance on "). Then, you can begin to re-create those responses for testing. I think everybody should try to do this. I'll do it. Next time my copy of mal crashes and myanimelist.net is also having problems, I'll try to remember to make a log and submit it to @ryukinix , with this information, more or less:

copy of python backtrace, including the command that invoked it
screenshot of MAL, if necessary
description of site's state, if necessary
The output of the following commands:
    malusr=<your MAL username>
    malpwd=<your MAL password>
    ^ don't include those in the log, obviously
    curl -I "https://myanimelist.net/malappinfo.php?u=$malusr&status=all&type=anime"
    curl "https://myanimelist.net/malappinfo.php?u=$malusr&status=all&type=anime"
    curl -Iu $malusr:$malpwd "https://myanimelist.net/api/anime/search.xml?q=naruto"
    curl -u $malusr:$malpwd "https://myanimelist.net/api/anime/search.xml?q=naruto"
Which will give the HTTP headers and bodies of both major parts of the API, so it's known EXACTLY what the server responded with to cause the error, and what it might cause for information retrieval
Date in ISO 8601 e.g. 2018-04-02

That's probably the last I'll say on this topic. I wouldn't want to waste anyone's time.

@ryukinix , do you want reports like these, and if so, where should we preferentially post them? On this issue's thread? DM?

ryukinix commented 6 years ago

Due to a recent update of vulnerabilities, MAL API parsing it's broken for some reason.

image

We need debug it.

Sorry @bradenbest for the so late response, I was busy in a recent job after being hired and a lot of infernal stuff at university. Feel free to to fill your ideas where you think it's more appropriate. If you need elaborate your ideas a little longer, I will recommend to you create independent issues instead commenting here.

This thread can freely close just handling better the exceptions.

BTW, would be nice as well fixing this errors.

I'll be still busy in the next days, just reporting these errors here.

ryukinix commented 6 years ago

https://myanimelist.net/api

MAL API seems pretty dead for me. Any news about it that I'm missing?

image

ping @lubien

ryukinix commented 6 years ago

More info about the API dead in #100