Open marco-c opened 6 years ago
To do this, the first step would be to create a script to list all the inconsistencies.
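A minimal sketch of such a script, under stated assumptions: an "inconsistency" is taken to mean a (WEBCOMPAT-ID, ELEMENT-ID) pair that has a screenshot for some browsers but not for others, file names follow the `WEBCOMPAT-ID_ELEMENT-ID_BROWSER.png` format described in this issue, WEBCOMPAT-ID is numeric, and the browser names `firefox` and `chrome` are placeholders. The function name `find_inconsistencies` and the output field names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern for WEBCOMPAT-ID_ELEMENT-ID_BROWSER.png.
# The element ID is allowed to contain underscores; the browser name is not.
FILENAME_RE = re.compile(r"^(?P<bug>\d+)_(?P<element>.+)_(?P<browser>[^_]+)\.png$")


def find_inconsistencies(filenames, browsers=("firefox", "chrome")):
    """Return one dict per (bug, element) pair that lacks a screenshot for some browser."""
    seen = defaultdict(set)
    for name in filenames:
        match = FILENAME_RE.match(name)
        if match is None:
            continue  # not a screenshot in the expected format
        key = (int(match["bug"]), match["element"])
        seen[key].add(match["browser"])

    inconsistencies = []
    for (bug, element), present in sorted(seen.items()):
        missing = sorted(set(browsers) - present)
        if missing:
            inconsistencies.append(
                {"webcompat_id": bug, "element_id": element, "missing": missing}
            )
    return inconsistencies
```

It could be called as `find_inconsistencies(os.listdir("data"))` after importing `os`. Note that a pair with no screenshot in any browser leaves no file behind, so it cannot be detected this way.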
I am working on this. Just wanted to let people know so that we don't end up doing duplicate work.
Which format would be best for exporting the inconsistencies?
CSV, with every row being an inconsistency
JSON, as a list of inconsistencies, where each entry is a dict detailing it
Either CSV or line-delimited JSON (https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON, that is, one JSON object per line).
A normal JSON file is a bit problematic because you can't easily see diffs between two versions of the file (e.g. if you just add one entry, the diff for a normal JSON file will show you the entire file).
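To illustrate the line-delimited option, here is a small sketch that serializes each inconsistency as one JSON object per line. The function name `to_json_lines` and the field names in the usage example are hypothetical; only the standard `json` module is used.

```python
import json


def to_json_lines(inconsistencies):
    """Serialize each inconsistency dict as a single JSON object on its own line."""
    # sort_keys keeps the key order stable, so unchanged entries produce identical lines.
    return "".join(json.dumps(item, sort_keys=True) + "\n" for item in inconsistencies)
```

For example, `to_json_lines([{"webcompat_id": 1491, "missing": ["chrome"]}])` yields a single line, and appending another entry only appends another line, so a diff between two versions of the file shows just the added entry.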
@marco-c what do you think we could do next regarding this?
Manually look at the inconsistencies and see what prevented us from taking a screenshot. E.g. force the crawler to only load the website with an inconsistency and see if the crawler throws an exception in one of the browsers.
Can you point me in the right direction on how I could work with the crawler in this case and force it to load a site?
The crawler is in collect.py; you need to change it to load the URL you want instead of loading a URL from one of the webcompat bugs.
@marco-c where do I get the URLs of the websites for which we have inconsistent screenshots? We haven't stored these website URLs anywhere.
We have stored the webcompat ID, so you can retrieve the URLs either with Python by using utils.get_bugs() and finding the bug you want, or by loading the bug on the webcompat.com website (e.g. https://webcompat.com/issues/1491).
There are a few cases where the crawler was not able to take screenshots. We should figure out why and try to fix any issue that we notice.
The files under data/ are in the format WEBCOMPAT-ID_ELEMENT-ID_BROWSER.png. WEBCOMPAT-ID are the IDs from webcompat.com. ELEMENT-ID are the element IDs where the crawler clicked before taking the screenshot. BROWSER is the name of the browser.
We should investigate these cases: