mozilla / http-observatory

Mozilla HTTP Observatory
https://observatory.mozilla.org/
Mozilla Public License 2.0
1.85k stars 168 forks source link

Add final url tested, status code and status reason to scan result dict #190

Open ghost opened 7 years ago

ghost commented 7 years ago

Adding final URL tested, the status code of the response and the reason text returned by the server to the scan results dictionary would help to better understand the reasoning of some results. Also the results should not be without information about where they belong to. Furthermore, it would help in further processing the results.

The web servers do some redirecting, the scanner does some of its own logic, and at the end it's unclear what got scanned.

A change could look like replacing the returned dictionary in httpobs.scanner.local.py:66 with:

    # Return the results
    return({
        'scan': {
            'grade': grades[1],
            'likelihood_indicator': grades[2],
            'response_headers': dict(reqs['responses']['auto'].headers),
            'score': grades[0],
            'tests_failed': NUM_TESTS - tests_passed,
            'tests_passed': tests_passed,
            'tests_quantity': NUM_TESTS,
            'status_code': reqs['responses']['auto'].status_code,
            'status_reason': reqs['responses']['auto'].reason,
            'status_url': reqs['responses']['auto'].url
        },
        'tests': {result.pop('name'): result for result in results}
    })
floatingatoll commented 7 years ago

The detailed scan output used to contain a list of all links followed as part of the redirects and the order they were followed in, including the final destination URL reached as part of the scan. Making that data available in the local scanner's output, if it isn't already present, would match the website API's json output and should meet your needs here.

ghost commented 7 years ago

Yes. Still it would also be good to have the status code available. It would be useful to further process the results. E.g. to show custom warnings if status code was not 200 OK. I'd need this because some tests obviously do not run if status code was not 200 OK

floatingatoll commented 7 years ago

Status code reporting will likely depend on #104 being resolved, since the current code only tests for '200 OK' or 'not 200 OK', and doesn't really dedicate any time or energy to recording whether it was 200 or not.

You may in the meantime find some value in this specific check result, however, which I copied from #191 since it may be of use here as well:

        "subresource-integrity": {
            "data": {},
            "expectation": "sri-implemented-and-external-scripts-loaded-securely",
            "pass": false,
            "result": "request-did-not-return-status-code-200",
            "score_description": "/ did not return a status code of 200",
            "score_modifier": -5
        },

If that specific result occurs, with the description caveat from #193, then that should indicate that at the very least an error occurred somewhere. It's perhaps not a good solution were one to be designed to meet your needs, but it might just be enough to provide what you need (detect when a request failed) until #104 is resolved and a fix is available here.

ghost commented 7 years ago

Ok, but may I ask what's the issue (or technically wrong) with returning this data in the scan section too?:

        'status_code': reqs['responses']['auto'].status_code,
        'status_reason': reqs['responses']['auto'].reason,
        'status_url': reqs['responses']['auto'].url
floatingatoll commented 7 years ago

The status code is currently discarded utterly, without ever being recorded. It's simply not available to return to you in the response at all.

There is not any specific technical obstacle to returning the final URL in that form, but I haven't decided whether I think that's the right solution or not, and I haven't spoken with the owner and authority over this project about my views on the matter, so I'm not prepared to either accept or reject the request as posed.

floatingatoll commented 7 years ago

(Or at least, the underlying status code is often discarded. I tried to extract it for another one of my issues and there are some places where, if you have a bad status code, the entire recording mechanism just doesn't .. record. I think. At all.)