openzim / mwoffliner

MediaWiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

Create automated test of mwoffliner #146

Closed: kelson42 closed this issue 5 years ago

kelson42 commented 7 years ago

We need to figure out an approach for this. These automated tests should run on each PR before merging and ensure a certain level of quality.

ISNIT0 commented 6 years ago

We should probably record various metrics in these tests:

Anything else we should add?

ISNIT0 commented 6 years ago

@bradyhunsaker Could you put some of your ideas here please?

I'm thinking a basic diff tool for eyeballing could be made in bash:

# build a baseline from master
git checkout master
mwoffliner --mwUrl=testWiki ....
mv tmp/testWiki tmp/test-run-master

# build the same wiki from the branch under test
git checkout branch
mwoffliner --mwUrl=testWiki ....
mv tmp/testWiki tmp/test-run-branch

# compare the two runs
diff tmp/test-run-master tmp/test-run-branch

bradyhunsaker commented 6 years ago

Yes, that's basically what I was imagining. I'm not sure how well diff handles directories, so that's one detail to check on.

The other aspect I would like is a random selection of articles to diff (maybe with weights). That gives us quantifiable assurance: if an issue affects a fraction p of all articles, a random sample of N articles will surface it in the diff with probability 1 - (1 - p)^N. Of course, starting with a fixed set of articles wouldn't be awful either.

bradyhunsaker commented 6 years ago

It turns out the MediaWiki API supports getting a random page. I created a small script that gets a random sample of N pages. And diff has a --recursive option for comparing directories.
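
For illustration, a minimal sketch of what such a sampler could look like, using the Action API's list=random module and assuming curl and jq are available; en.wikipedia.org stands in for whatever test wiki is used:

#!/bin/bash
# Fetch N random main-namespace article titles via the MediaWiki Action API.
WIKI_API="https://en.wikipedia.org/w/api.php"   # stand-in; point this at the test wiki
N=${1:-20}

curl -s "$WIKI_API?action=query&list=random&rnnamespace=0&rnlimit=$N&format=json" \
  | jq -r '.query.random[].title'

The titles could then be written to a file and fed to both mwoffliner runs, so that master and the branch scrape the same sample.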

I'll put it together with a script along the lines that @ISNIT0 described above. Hopefully this weekend.

ISNIT0 commented 6 years ago

@bradyhunsaker It would be nice if it could run on Travis and maybe comment back on the PR. I was thinking we could have screenshots/diffs posted on every PR so we can check that we haven't broken anything.
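
As a sketch of the commenting part: the Travis job could post a diff summary back to the PR through the GitHub API. Here GITHUB_TOKEN is assumed to be configured as a repository secret, and diff-summary.txt is a hypothetical file produced by the compare step:

# hypothetical: post the diff summary back to the PR from a Travis build
# (TRAVIS_PULL_REQUEST is "false" on non-PR builds, so guard accordingly)
curl -s -X POST \
  -H "Authorization: token $GITHUB_TOKEN" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg body "$(cat diff-summary.txt)" '{body: $body}')" \
  "https://api.github.com/repos/openzim/mwoffliner/issues/$TRAVIS_PULL_REQUEST/comments"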

ISNIT0 commented 5 years ago

@bradyhunsaker How far have you gotten with this?

@subbuss It was mentioned to me by @kelson42 that the Parsoid team may have some verification tools that were used to compare Parsoid's HTML output to the PHP parser's output; do you know anything about this?

subbuss commented 5 years ago

Yes, we use a set of tools for this. There are (a) https://github.com/wikimedia/mediawiki-services-parsoid-testreduce, (b) https://github.com/wikimedia/integration-visualdiff, and (c) https://github.com/wikimedia/uprightdiff.

Testreduce is a generic server/client tool for running mass tests (we have test runs against 150K pages in the largest use case and 25K in the smallest). On top of that, we use visual diffing, which screenshots two HTML output sources (either individual files or URLs that generate HTML). The visual diffing internally uses PhantomJS (which has unfortunately been EOLed at this point) for screenshotting and for other HTML manipulation before the screenshot is taken. Finally, we use uprightdiff to compare the screenshots and generate a numerical score that can be used for sorting test results (testreduce uses it to list failures from worst to best).
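
For orientation, the per-page flow might look roughly like the sketch below; screenshot.js stands in as a hypothetical PhantomJS capture script, and the uprightdiff argument order is an assumption (check its --help):

# render the same page from two HTML sources (screenshot.js is hypothetical)
phantomjs screenshot.js "https://parsoid.example/Page_Title" parsoid.png
phantomjs screenshot.js "https://php.example/Page_Title" php.png
# score the visual difference between the two screenshots
uprightdiff parsoid.png php.png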

See https://www.mediawiki.org/wiki/Parsing/Visual_Diff_Testing

The docs are not very extensive at this time.

bradyhunsaker commented 5 years ago

I have not made more progress.

I have a shell script that uses the REST API to get a set of random pages. The approach @subbuss mentions of running against a large fixed set of articles is also reasonable when the time is available. A small random set is good for catching issues that occur in a certain percentage of articles: for example, a set of 20 random pages has a 95% chance of exercising an issue that appears in at least 14% of all pages. I'm happy to share or submit the script at any point; I was waiting until I had it connected to a test that actually uses it.
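
That figure follows from the 1 - (1 - p)^N formula above; a quick check with p = 0.14 and N = 20:

awk 'BEGIN { p = 0.14; n = 20; printf "%.3f\n", 1 - (1 - p)^n }'
# prints 0.951, i.e. roughly the 95% quoted above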

I plan to take a look at the tools @subbuss mentions. But with the holiday season here, I'm not sure how much time I will have in the next few weeks.

ISNIT0 commented 5 years ago

I have created and merged a very simple test runner (it compares output from the current branch against master using a recursive diff).

The results are never exactly identical. I think there's a race when fetching near-duplicate files, and whichever download finishes first (or last) wins.
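
If the nondeterministic files follow a recognizable naming pattern, one possible mitigation is GNU diff's exclude option; the pattern below is purely illustrative, not taken from mwoffliner's actual output:

# skip files expected to differ between runs
diff --recursive --exclude='*.log' tmp/test-run-master tmp/test-run-branch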

I'll leave this ticket open because we still need to implement:

ISNIT0 commented 5 years ago

Would be nice to use this: https://blog.percy.io/visual-testing-in-nightwatch-js-with-percy-1b68c122cf94