marco-c / autowebcompat

Automatically detect web compatibility issues
Mozilla Public License 2.0

Investigate cases where the crawler wasn't able to take screenshots #10

Open marco-c opened 6 years ago

marco-c commented 6 years ago

There are a few cases where the crawler was not able to take screenshots. We should figure out why and try to fix any issue that we notice.

The files under data/ are named in the format WEBCOMPAT-ID_ELEMENT-ID_BROWSER.png. WEBCOMPAT-ID is the ID from webcompat.com, ELEMENT-ID is the ID of the element the crawler clicked before taking the screenshot, and BROWSER is the name of the browser.

We should investigate these cases:

  1. XXXX_firefox.png is present but XXXX_chrome.png is not present.
  2. XXXX_ELEMENT_firefox.png is present but XXXX_ELEMENT_chrome.png is not present.
marco-c commented 6 years ago

To do this, the first step would be to create a script to list all the inconsistencies.
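As a starting point, a minimal sketch of such a script (the `data/` directory name and the `firefox`/`chrome` suffixes come from the format described above; the function name is hypothetical, not part of the repo):

```python
import os
from collections import defaultdict


def find_inconsistencies(data_dir='data'):
    """Return {prefix: {browsers}} for screenshots that exist for only
    one of the two browsers.

    Filenames follow WEBCOMPAT-ID_ELEMENT-ID_BROWSER.png, so stripping
    the trailing _BROWSER.png gives a prefix that should appear once
    per browser.
    """
    browsers_per_prefix = defaultdict(set)
    for name in os.listdir(data_dir):
        if not name.endswith('.png'):
            continue
        stem = name[:-len('.png')]
        prefix, _, browser = stem.rpartition('_')
        if browser in ('firefox', 'chrome'):
            browsers_per_prefix[prefix].add(browser)
    return {prefix: found
            for prefix, found in browsers_per_prefix.items()
            if len(found) == 1}


if __name__ == '__main__':
    for prefix, found in sorted(find_inconsistencies().items()):
        print(prefix, 'only present for', next(iter(found)))
```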

skulltech commented 6 years ago

I am working on this. Just wanted to let people know so that we don't end up doing duplicate work.

skulltech commented 6 years ago

Which format would be best for exporting the inconsistencies?

marco-c commented 6 years ago

Either CSV or line-delimited JSON (https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON, that is, one JSON object per line).

A normal JSON file is a bit problematic because you can't easily see diffs between two versions of the file (e.g. if you just add one entry, the diff will show you the entire file).
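A minimal sketch of the line-delimited export suggested above (the function name and record shape are illustrative, not part of the repo):

```python
import json


def write_ndjson(records, path):
    """Write one JSON object per line (line-delimited JSON).

    Appending a record only adds one line, so version-control diffs
    show just the new entry rather than a reformatted file.
    """
    with open(path, 'w') as f:
        for record in records:
            # sort_keys keeps the output stable across runs,
            # which also keeps diffs minimal.
            f.write(json.dumps(record, sort_keys=True) + '\n')
```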

Shashi456 commented 6 years ago

@marco-c what do you think we could do next regarding this?

marco-c commented 6 years ago

Manually look at the inconsistencies and see what prevented us from taking a screenshot. E.g. force the crawler to load only the website with an inconsistency and see if the crawler throws an exception in one of the browsers.

Shashi456 commented 6 years ago

Can you point me in a direction as to how I could work with the crawler in this case and force it to load a site?

marco-c commented 6 years ago

The crawler is in collect.py; you need to change it to load the URL you want instead of loading a URL from one of the webcompat bugs.
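One way to experiment is a small standalone helper alongside collect.py; this is a hypothetical sketch assuming a Selenium WebDriver setup (collect.py's actual structure may differ), not the project's own code:

```python
def try_load(url):
    """Load a single URL in both browsers and report any WebDriver
    exception, instead of crawling the full webcompat bug list.

    Hypothetical debugging helper; requires selenium plus the Firefox
    and Chrome driver binaries.
    """
    # Imported lazily so the sketch is readable without selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException

    for name, make_driver in (('firefox', webdriver.Firefox),
                              ('chrome', webdriver.Chrome)):
        driver = make_driver()
        try:
            driver.get(url)
            driver.save_screenshot('debug_%s.png' % name)
            print(name, 'loaded', url)
        except WebDriverException as exc:
            print(name, 'failed on', url, ':', exc)
        finally:
            driver.quit()
```

Running it on a single inconsistent site should surface which browser raises an exception and why.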

Shashi456 commented 6 years ago

@marco-c where do I get the URLs of the websites for which we have inconsistent screenshots? We haven't stored these website URLs anywhere.

marco-c commented 6 years ago

We have stored the webcompat ID, so you can retrieve the URLs either in Python by using utils.get_bugs() and finding the bug you want, or by loading the bug on the webcompat.com website (e.g. https://webcompat.com/issues/1491).
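For the Python route, the remaining step is pulling the site URL out of the issue body. The helper below only does that parsing; it assumes the body follows the usual webcompat.com report template with a `**URL**:` line (an assumption about the data, not something utils.get_bugs() guarantees):

```python
def url_from_issue_body(body):
    """Extract the reported site URL from a webcompat issue body.

    Assumes the body contains a line like '**URL**: http://example.com',
    as in the standard webcompat.com report template; returns None for
    bodies that deviate from the template.
    """
    for line in body.splitlines():
        if line.lower().startswith('**url**:'):
            return line.split(':', 1)[1].strip()
    return None
```

With that, you can map each WEBCOMPAT-ID from an inconsistent filename to a URL and feed it to the crawler.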