nextstrain / auspice

Web app for visualizing pathogen evolution
https://docs.nextstrain.org/projects/auspice/
GNU Affero General Public License v3.0

Add E2E testing #917

Open jameshadfield opened 4 years ago

jameshadfield commented 4 years ago

Thoughts, implementations and advice welcome!

dnprock commented 4 years ago

Testing frameworks in node.js are fragmented. I think we should separate unit tests and integration (feature) tests, and use a different framework for each test suite.

Currently, I think:

jameshadfield commented 4 years ago

I don't have the expertise here to comment, but am happy to have them separated.

In general, I think feature (e2e?) tests are needed. Over the years we've added amazing capabilities to auspice, and we have hundreds of datasets which make different use of these varied features. As such, manually testing a decent proportion of the datasets at each PR has become untenable and this leads to bugs which often take a while to discover 😦 Creating a dev environment which can give us confidence in this area would be wonderful for nextstrain.

dnprock commented 4 years ago

I'm happy to help out here. Can you write down the main test cases that you want to have coverage for? Especially the use cases you think are most important.

For example:

Rendering

  1. Minimal dataset rendering.
  2. Complex dataset rendering.

Tree Layout

  1. Change Color By option.
  2. Change Tree Layout option.

Map Layout

  1. Change Geographic resolution option.

hydrosquall commented 4 years ago

For integration testing of general frontend logic, I've had good experiences with cypress ( https://www.cypress.io/ ), especially in contrast to previous experiences with setting up Selenium/PhantomJS based testing. It integrates well with CI, so it seems like a logical first step might be to add a Cypress test step to the existing Travis pipeline.

For visual regression testing, I've heard that others have had good experiences with percy.io, but I haven't personally used it. It doesn't sound like this is necessary, but mentioning it just in case it's what you had in mind.

In order to test that the app works with these different datasets, would you be comfortable having the fixture data files hardcoded into this repository? If so, it would be helpful to know which datasets form a representative set, with a good mixture of properties for a test suite to exercise.

dnprock commented 4 years ago

cypress.io looks good. It seems like the framework to get started with now.

tihuan commented 4 years ago

Cypress's bottleneck is test recording and parallelization, which are bundled together. So if we run out of test quota, CI/CD tests will stop working.

Our team recently (in the last 3 months) evaluated Cypress and Puppeteer, and went with Puppeteer to avoid that hard bottleneck and to pave the way for an easier upgrade path to Playwright for cross-browser testing across Chrome, Firefox, and Safari down the road!

For unit testing React, Jest + Enzyme will do. Integration tests can be Jest + Puppeteer.

Under the hood, Cypress uses Mocha instead of Jasmine, whereas Jest already uses Jasmine, so going with Jest could streamline vendor library dependencies too.

Resources:

  1. Puppeteer
  2. jest-puppeteer
  3. Playwright - the original Puppeteer team from Google moved on to Microsoft to create this
  4. Jest
  5. Enzyme

dnprock commented 4 years ago

@tihuan thanks for the info, very useful. I always find the node.js ecosystem a mess :). In these situations, I'd delegate the decision to whoever takes the lead. I'm happy to help out with writing some tests.

tihuan commented 4 years ago

Thanks for the quick reply, @dnprock πŸ˜„

Yeah, not sure who's gonna lead the testing framework setup, but I can help prototype integration tests with jest + puppeteer if needed!

hydrosquall commented 4 years ago

To clarify, the test recording limit is about sending the test results to the cypress.io dashboard hosted service, which is optional - the test runner itself can be used without the hosted dashboard and is still quite helpful. Additionally they have an unlimited plan for qualifying open source projects ( https://www.cypress.io/oss-plan ). I don’t think this necessarily means it should be picked, but I don’t think the quota concern would apply here.

The point about cross browser testing is a good one to keep in mind, though I’m not sure whether it’s needed (someone in the org can probably check via google analytics what the userbase looks like). In fairness to cypress, they support Firefox and Edge as of last month, but testing Safari does seem unique to playwright for now.

tihuan commented 4 years ago

Hey @hydrosquall thanks for adding the extra info!

Cypress ties the dashboard and parallel tests together, since that's pretty much their money maker from what I read here: https://github.com/cypress-io/cypress/issues/2520. So without the dashboard, or once the quota runs out, parallel tests are blocked. In our trial, the tests stopped working altogether once the quota ran out

There's a homegrown solution to self-host the dashboard, but it seems like it's still in alpha:

https://github.com/agoldis/sorry-cypress

The free OSS plan is promising! Not sure if its limit of 5 users would be enough for the project though

And thanks for bringing up that Cypress supports Firefox and Edge πŸ‘ Puppeteer also has support for Firefox and Edge, given that Edge is now Chromium-based too (https://blogs.windows.com/msedgedev/2019/11/04/edge-chromium-release-candidate-get-ready)

Both Cypress and Puppeteer are easy to set up, so maybe it boils down to which approach is more maintainable in the long run for this project, in terms of contributor interest, experience, free-plan user limits, etc.?

MargotLepizzera commented 4 years ago

Hey, quickly chiming in to mention that Datadog has an E2E testing product called browser tests, and an OSS plan as well. If you think it'd be a good fit for your project, I'd be happy to work on getting it approved internally :)

tihuan commented 4 years ago

Ah yes! Thanks for mentioning Datadog's browser tests and flagging the OSS plan, @MargotLepizzera πŸ₯‡

My project uses Datadog to ping our app periodically (every 5 minutes) to make sure it's functional, and the integration with Slack is super neat ✨

For writing integration tests, I'm thinking we need a solution that lives in version control, so the test code can be checked into this repo and run locally and on every PR too

A coworker just shared this article: https://medium.com/prodperfect/how-to-build-e2e-test-cases-b0cb3e3c97bf

tl;dr is that integration tests should mostly cover critical user paths (a small set of all the tests), with unit tests and component tests covering the larger area of the code base. This way we don't end up writing everything as integration tests and having a test suite that takes hours to run!

tihuan commented 4 years ago

I just created a puppeteer branch with a POC integration test that clicks the Play button on the Zika dataset:

https://github.com/nextstrain/auspice/compare/master...tihuan:jest-test-setup?expand=1

To run the integration test:

  1. Please make sure you have the zika dataset
  2. Run auspice view --datasetDir data in one terminal tab
  3. In another terminal tab, run npm install && npm run integration-test

You should get:

[Screenshot: Screen Shot 2020-03-07 at 2 37 56 AM]

If you wanna see the integration test run in a browser, you can run: HEADLESS=false SLOWMO=10 npm run integration-test
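For anyone curious, a minimal test in this jest + puppeteer style could look roughly like the sketch below. The port, URL path, and button selector are assumptions for illustration, not necessarily what the branch actually uses:

```js
// integration_tests/zika.test.js -- hypothetical sketch; assumes jest-puppeteer's global `page`
// and that `auspice view --datasetDir data` is serving the Zika dataset on localhost:4000.
describe("zika dataset", () => {
  beforeAll(async () => {
    await page.goto("http://localhost:4000/zika", { waitUntil: "networkidle2" });
  });

  it("clicks the Play button", async () => {
    // find a button whose visible text contains "Play" (the selector is an assumption)
    const [playButton] = await page.$x("//button[contains(., 'Play')]");
    expect(playButton).toBeTruthy();
    await playButton.click();
    // give the animation a moment to start before asserting anything further
    await new Promise((resolve) => setTimeout(resolve, 1000));
  });
});
```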

Please let me know if you encounter any issue πŸ˜„ Thank you!

jameshadfield commented 4 years ago

Thanks all for your input. I'll defer to @colinmegill to guide which testing framework we go with, as I think this is a very important decision for the project, but I wanted to give a summary of the kind of tests we desperately need. Currently, our testing consists of me loading a number of datasets and trying to run all the functionality I believe the PR may have affected. Things fall through the cracks this way, especially as we have more developers and more features.

Example of where testing would have been super helpful this week

We had a PR last week which fixed an open issue where the labels in the legend would overflow & look bad. The PR limited the characters in the legend name, so they didn't overflow. Great! I tested it on a few examples, it worked, I merged it. But this broke legend display for numeric legend traits, which I didn't test on 😒 . These are the kind of things I'd love testing to automate out of my life!

List of test actions it would be wonderful to automate

For (e.g.) 10 different datasets, with the acknowledgement that some params will be specific to each one. This list isn't exhaustive (but is already too much to begin with).

colinmegill commented 4 years ago

I will +1 puppeteer! Looks great @tihuan :)

tihuan commented 4 years ago

Sorry for getting back to this late, got a workload burst this week πŸš€

Thanks for checking out the PR, @colinmegill πŸ€©πŸ™

Love the list you provided too, @jameshadfield! It sounds like it will involve SVG testing, which I don't have much experience with, so it will require some research to do it right.

@colinmegill how do we do that in cellxgene?

Are other contributors okay with Puppeteer? Does the PR look good to everyone?

Thanks all!

dnprock commented 4 years ago

@tihuan I see some formatting changes to the .eslintrc file. They may conflict with other devs already working on the project.

Otherwise, the test PR looks good. I'm happy to help out.

tihuan commented 4 years ago

Hey @dnprock ! Thanks for the quick review! That's a really good point!

I just reverted the linter changes to the .eslintrc file and created a PR

PTAL again πŸ™ Thanks a lot!

jameshadfield commented 4 years ago

Update: Auspice now has jest / puppeteer support thanks to @tihuan's #943 πŸŽ‰

@tihuan & @colinmegill do you recommend closing this issue and creating new issues for each of https://github.com/nextstrain/auspice/issues/917#issuecomment-596348566 or is there more general e2e discussion to happen here first?

tihuan commented 4 years ago

Yay thanks everyone for the discussion and reviews! πŸ˜πŸ™πŸŽ‰

I think it makes sense to break down https://github.com/nextstrain/auspice/issues/917#issuecomment-596348566 into individual issues, so people can tackle them in parallel

In general, I would recommend people use https://github.com/smooth-code/jest-puppeteer/blob/master/packages/expect-puppeteer/README.md#api where possible, instead of going straight to Puppeteer's own API when writing tests, unless expect-puppeteer doesn't have the functionality you need. This is because jest-puppeteer has a more user-friendly API than Puppeteer (src)!
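To make the contrast concrete, here is a small hedged example; the selector and button text are hypothetical, and `page` is jest-puppeteer's global:

```js
// inside a jest-puppeteer test file, where `page` and the expect-puppeteer matchers are available
it("starts the animation via the Play button (expect-puppeteer style)", async () => {
  // waits for the element, retries, and gives a readable failure message on timeout
  await expect(page).toClick("button", { text: "Play" });
  await expect(page).toMatchElement("button", { text: "Pause" });
});

it("does the same with Puppeteer's own API (more plumbing, less helpful errors)", async () => {
  const playButton = await page.waitForXPath("//button[contains(., 'Play')]");
  await playButton.click();
});
```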

Once we have individual issues for each test case, we could also tag them with priority, so people know which ones to choose first?

Thanks all!

colinmegill commented 4 years ago

Agree with all the above, thanks so much @tihuan! And thanks for referencing those jest-puppeteer docs!

jameshadfield commented 4 years ago

πŸ‘ @colinmegill / @tihuan if you have time, could you turn one of those example tests into an issue as you're better placed to know how much detail is required (and please reference those recommendations in the docs so that the test implementations are consistent).

tihuan commented 4 years ago

Thanks both! Yeah I'll pick a test you described above and turn it into an issue and see how people like it! ✍️

tihuan commented 4 years ago

Hi all! Quick update on writing an issue for a test πŸ˜„

I'm working on "Change colorBy to X, then to Y" now

Given that it's asserting SVG changes, I wanna try using https://github.com/americanexpress/jest-image-snapshot to diff images. Another option is https://percy.io/, but I'm not sure if they have a free OSS plan, so I'm testing with jest-image-snapshot first
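For context, jest-image-snapshot plugs into jest's expect; basic usage looks roughly like the following. The thresholds and the navigation step are placeholders, not a proposed final configuration:

```js
const { toMatchImageSnapshot } = require("jest-image-snapshot");

// register the matcher once (usually in a jest setup file)
expect.extend({ toMatchImageSnapshot });

it("looks the same after changing colorBy", async () => {
  // ...navigate to the dataset and change the colorBy dropdown here...
  const screenshot = await page.screenshot();
  // the first run writes a baseline into __image_snapshots__/, later runs diff against it
  expect(screenshot).toMatchImageSnapshot({
    failureThreshold: 0.01,          // tolerate up to 1% of differing pixels
    failureThresholdType: "percent",
  });
});
```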

The expected outcome of the issue is to provide a list of assertions + image diffs we want the test to make, along with the suggested jest-puppeteer API to use for clicking around, so whoever picks up the ticket will be able to do so

Ideally, I'd like to do a spike on jest-image-snapshot first, write a test for it, get it merged, so it can be an example for other tests to follow. What do people think of that approach?

Extra discussion:

  1. Should we add tests and the linter as part of the TravisCI PR/branch check?

In my work project, our TravisCI setup runs a few different checks in parallel:

[Screenshot: Screen Shot 2020-03-26 at 10 25 26 AM]

  2. Since we have eslint set up already, should we add a pre-commit hook to run eslint, so it autofixes or fails a commit when the linter is not happy? I see a few commits in the history that run eslint retroactively, so I thought we could automate that too! The tool my work project uses is https://github.com/typicode/husky (a minimal config sketch follows this list).
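For point 2, here is a minimal sketch of what the package.json config could look like, assuming husky v4-style config plus lint-staged; the file glob is an assumption:

```json
{
  "husky": {
    "hooks": {
      "pre-commit": "lint-staged"
    }
  },
  "lint-staged": {
    "*.js": "eslint --fix"
  }
}
```

With lint-staged ≥ 10, any autofixes are re-staged automatically, so the commit either goes through clean or fails with the remaining errors.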

Love to hear your thoughts on all three questions!

Thanks all!

jameshadfield commented 4 years ago

Hi @tihuan -- thanks for commenting, it's exciting that this is progressing!

> Given that it's asserting SVG changes

This, to me, is the big question. The datasets which auspice sources (via npm run get-data) change over time, as new genomes become available, so we'd have to provide a set of stable datasets in order to generate the same SVG over time. But this isn't hard. @tihuan I'd recommend using a ncov dataset with the date in the filename for the current test.

What happens if, for instance, we change the font size of the axis text label? Presumably this breaks any test with the tree in view. Is there a way to programmatically regenerate the images needed for all the tests?

Worth me asking at this point if there other approaches than comparing image snapshots which we should consider? E.g. ensuring that a certain SVG element is in the DOM & has the correct colour after a colour-by change (I have no idea what's best practice at the moment).

> write a test for it, get it merged, so it can be an example for other tests to follow.

Agreed, with the caveat that we should pick the right approach here as (hopefully) many tests will be forthcoming which use it. Really like your thoughts on detailing a suggested API for others to follow ⭐️

> Should we add tests and the linter as part of the TravisCI PR/branch check?

Seems sensible. Recent work means we're now at 0 errors & warnings (perhaps the first time this has happened!). Enforcing this via husky seems great -- I haven't used autofixes, so not sure about failing the commit vs autofixing.

tihuan commented 4 years ago

Morning @jameshadfield! Oh, thank YOU and your team for running the show and giving us such a great tool to use. Especially right now!

I'm just pitching in where I can and trying to be helpful :-)

> Hi @tihuan -- thanks for commenting, it's exciting that this is progressing!
>
> Given that it's asserting SVG changes
>
> This, to me, is the big question. The datasets which auspice sources (via npm run get-data) change over time, as new genomes become available, so we'd have to provide a set of stable datasets in order to generate the same SVG over time. But this isn't hard. @tihuan I'd recommend using a ncov dataset with the date in the filename for the current test.
>
> What happens if, for instance, we change the font size of the axis text label? Presumably this breaks any test with the tree in view. Is there a way to programmatically regenerate the images needed for all the tests?

Yes! The image-diffing library uses the same flag as regular jest snapshot updates, -u (doc), so it's easy to regenerate the files when needed

I'm thinking we can create a new command in package.json to run the tests and update snapshots, so people don't need to remember to pass -u when they expect snapshots and screenshots to change for their commit
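Something like this in package.json, for example; the script and config-file names are placeholders, only `integration-test` itself is assumed to exist already:

```json
{
  "scripts": {
    "integration-test": "jest --config=jest.config.integration.js",
    "integration-test:update": "npm run integration-test -- -u"
  }
}
```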

> Worth me asking at this point if there other approaches than comparing image snapshots which we should consider? E.g. ensuring that a certain SVG element is in the DOM & has the correct colour after a colour-by change (I have no idea what's best practice at the moment).

Great question! We also have unit testing, where we can snapshot the DOM structure via jest snapshot testing and enzyme (from Airbnb). That does what you described: asserting that the SVG elements are rendered with the right attributes, inner text, etc., as expected.
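As a sketch of that kind of component-level snapshot test: the component name, import path, and props below are hypothetical, just to show the shape of it:

```js
import React from "react";
import { shallow } from "enzyme";
import toJson from "enzyme-to-json";
// hypothetical import path -- whichever component renders the legend
import Legend from "../../src/components/tree/legend";

it("renders numeric legend traits without truncating values", () => {
  // props here are illustrative, not the component's real API
  const wrapper = shallow(<Legend colorBy="num_date" values={[2019.1, 2019.5, 2020.0]} />);
  // snapshot of the rendered SVG/DOM structure; fails if attributes or text change unexpectedly
  expect(toJson(wrapper)).toMatchSnapshot();
});
```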

Personally I think we should do both: unit testing at the component level, by feeding the component different combinations of props and snapshotting the expected DOM output, and UI testing with screenshots to make sure whole user flows work in the app as expected.

Additionally, since UI/E2E tests are much slower than unit testing, they should only cover critical user flows / happy paths that block prod deployment when any of them breaks. So it's meant to be spot checking things that should never break, while unit testing can cover way more edge cases to ensure any function/component works as intended given any input

Here's a quick definition of the unit, integration, and UI tests I'm referring to, since different people sometimes define these terms differently πŸ˜† E.g., referring to UI tests as integration tests and vice versa. So I just wanted to make sure we're aligned on what those terms mean!

> write a test for it, get it merged, so it can be an example for other tests to follow.
>
> Agreed, with the caveat that we should pick the right approach here as (hopefully) many tests will be forthcoming which use it. Really like your thoughts on detailing a suggested API for others to follow ⭐️

That sounds great! Yeah, defining patterns that will be used at scale will involve risks, but we can always adjust them as we go when team members discover new patterns that work better.

I think the discussion should continue, so we get more ideas on figuring out best practices for testing data visualization. Hopefully we can get more people to chime in!

In the meantime I'll work on a prototype and get a PR out for screenshot testing, so we have a concrete example that we can use to discuss pros and cons

> Should we add tests and the linter as part of the TravisCI PR/branch check?
>
> Seems sensible. Recent work means we're now at 0 errors & warnings (perhaps the first time this has happened!). Enforcing this via husky seems great -- I haven't used autofixes, so not sure about failing the commit vs autofixing.

HURRAY!! That's a super exciting win πŸŽ‰πŸ€©πŸ‘!

Yeah, ESLint comes with rules that can be autofixed, so as long as we pass the --fix flag, ESLint will attempt to fix the fixable errors for us and fail the linting only when it encounters errors that it can't autofix

In addition to ESLint, for the code base to have a consistent coding style and format, we can use prettier, which has been wildly popular since its introduction 3 years ago. Definitely highly recommend it!
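If prettier were adopted, one common setup is to keep ESLint for code-quality rules and let prettier own formatting; for example (the options and glob shown are just illustrative, not a proposed house style):

```json
{
  "scripts": {
    "format": "prettier --write \"src/**/*.js\""
  },
  "prettier": {
    "printWidth": 100,
    "singleQuote": false
  }
}
```

eslint-config-prettier can then be added to the ESLint `extends` array so the two tools don't fight over formatting rules.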

jameshadfield commented 4 years ago

> In the meantime I'll work on a prototype and get a PR out for screenshot testing, so we have a concrete example that we can use to discuss pros and cons

πŸ’― -- thanks!

tihuan commented 4 years ago

Hi all! Sorry for the lack of updates in the past week, been swamped with work πŸŒŠπŸŠβ€β™‚οΈ

Currently I've got screenshot diffing working, but there are a few things I wanna add before submitting a PR:

  1. Add jest.retryTimes(4), so our test suite will automatically retry a failing test up to 4 times, in case of unexplainably flaky tests. This will require adding jest-circus as the test runner (doc).

It's super easy to set up, I just have been working nights to get a major project on track right now πŸ˜†

  2. Add a helper function, await waitUntilScreenSteady(), to address an issue I discovered while doing screenshot testing: when d3 updates the tree, there's a tiny window where some edges disappear/reappear, so the screenshot sometimes fails the diff because of that.

So I was thinking the helper function could take a screenshot periodically, say every 100ms, and compare consecutive screenshots until they are identical (else retry). That way we know the page is steady and ready for the real screenshot-diffing test; a rough sketch follows below.
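A rough sketch of that helper, assuming jest-puppeteer's `page`; the names, intervals, and byte-for-byte comparison are all assumptions, not the final implementation:

```js
// test/integration/helpers.js -- hypothetical sketch.
// Note: jest.retryTimes() requires jest-circus as the runner (testRunner: "jest-circus/runner").

/**
 * Repeatedly screenshot the page until two consecutive captures are byte-identical,
 * i.e. d3 transitions have finished and the SVG is no longer changing.
 */
async function waitUntilScreenSteady(page, { interval = 100, timeout = 5000 } = {}) {
  const deadline = Date.now() + timeout;
  let previous = await page.screenshot();
  while (Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, interval));
    const current = await page.screenshot();
    if (current.equals(previous)) return; // two identical frames => page has settled
    previous = current;
  }
  throw new Error(`Screen did not settle within ${timeout}ms`);
}

module.exports = { waitUntilScreenSteady };
```

A test would then `await waitUntilScreenSteady(page)` right before taking the screenshot it actually diffs.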

I'm hoping to get my work project on track this week, then get back on this testing stuff on Sunday

Hope that's okay! Thank you 😊 πŸ™

tihuan commented 4 years ago

Just submitted a PR for adding screenshot testing!

PTAL all πŸ˜„ πŸ™ !

https://github.com/nextstrain/auspice/pull/1068

Thank you!

bcoe commented 4 years ago

CC: @nellyk @anescobar1991 (two of the maintainers of jest-image-snapshot). As @tihuan works towards adding screenshot testing in #1068, do you have any advice for initial configuration options based on your experience?

anescobar1991 commented 4 years ago

First of all thanks to all of you for working on a project like this one that is helping us in our fight against covid19 (and other malicious pathogens)!

What I'd recommend is executing the tests in a Docker container in order to ensure repeatability across operating systems. Otherwise you will likely have issues with snapshots looking one way on a developer's machine (say, running macOS) but slightly different on CI, causing failures. Many times things like fonts render differently across operating systems, causing these issues. This seems heavy-handed (and is), but over the years we have found that it really is the only way to have minimally brittle integration tests.
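For illustration, a Dockerfile along those lines might look like the sketch below; the base image, package names, and paths are assumptions, not a tested setup:

```dockerfile
# Dockerfile.integration -- hypothetical sketch for running the screenshot tests reproducibly
FROM node:12-slim

# Install a pinned Chromium plus a fixed set of fonts so text renders identically everywhere
RUN apt-get update \
    && apt-get install -y --no-install-recommends chromium fonts-liberation \
    && rm -rf /var/lib/apt/lists/*

# Tell puppeteer to use the system Chromium instead of downloading its own
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .

CMD ["npm", "run", "integration-test"]
```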

Also, make sure that all data is mocked, to avoid dependencies on things you don't have control over.

Other than that the things mentioned in https://github.com/nextstrain/auspice/issues/917#issuecomment-611266307 make sense!

Let me know if you face any issues!

rleir commented 4 years ago

Hi all, there is another linter (or rather, a static source analyzer?) called DeepScan, with a free open-source plan. It currently shows 24 issues in Auspice. My PRs have been fixes for DeepScan warnings, and DeepScan makes it easy by recommending a fix. Those 24 issues might be harmless, but there could be one in 24 which actually breaks something.