jameshadfield opened this issue 4 years ago
Testing frameworks in node.js are fragmented. I think we should separate unit tests and integration (feature) tests, and use a different framework for each test suite.
Currently, I think:
I don't have the expertise here to comment, but am happy to have them separated.
In general, I think feature (e2e?) tests are needed. Over the years we've added amazing capabilities to auspice, and we have hundreds of datasets which make different use of these varied features. As such, manually testing a decent proportion of the datasets at each PR has become untenable, and this leads to bugs which often take a while to discover. Creating a dev environment which can give us confidence in this area would be wonderful for nextstrain.
I'm happy to help out here. Can you write down the main test cases that you want to have coverage for? Especially the use cases you think are most important.
For example:
Rendering
Tree Layout
Map Layout
For integration testing of general frontend logic, I've had good experiences with cypress ( https://www.cypress.io/ ), especially in contrast to previous experiences with setting up Selenium/PhantomJS based testing. It integrates well with CI, so it seems like a logical first step might be to add a Cypress test step to the existing Travis pipeline.
For visual regression testing, I've heard that others have had good experiences with percy.io, but I haven't personally used it. It doesn't sound like this is necessary, but mentioning it just in case it's what you had in mind.
In order to test that the app works with these different datasets - would you be comfortable with having the fixture data files hardcoded into this repository? If so, it would be helpful to know which datasets form a representative set, with a good mixture of properties for a test suite to exercise.
cypress.io looks good. It seems like the framework to get started with now.
Cypress's bottleneck is test recording and parallelization, which are bundled together. So if we run out of test quota, CI/CD tests will stop working.
Our team recently (in the last 3 months) evaluated Cypress and Puppeteer, and went with Puppeteer to avoid that hard bottleneck; it also paves the way for an easier upgrade to Playwright for cross-browser testing in Chrome, Firefox, and Safari down the road!
For unit testing React, Jest + Enzyme will do; integration tests can be Jest + Puppeteer.
Under the hood, Cypress uses Mocha instead of Jasmine, whereas Jest uses Jasmine already, so that could streamline vendor library dependencies too.
Resources:
- Puppeteer https://github.com/puppeteer/puppeteer
- jest-puppeteer https://github.com/smooth-code/jest-puppeteer
- Playwright https://github.com/microsoft/playwright - the original Puppeteer team from Google moved to Microsoft to create this
- Jest https://github.com/facebook/jest
- Enzyme https://github.com/enzymejs/enzyme
@tihuan thanks for the info, very useful. I always find the node.js ecosystem a mess :). In these situations, I'd delegate the decision to whoever takes the lead. I'm happy to help out with writing some tests.
Thanks for the quick reply, @dnprock!
Yeah, not sure who's gonna lead the testing framework setup, but I can help prototype integration tests with jest + puppeteer if needed!
To clarify, the test recording limit is about sending the test results to the cypress.io dashboard hosted service, which is optional - the test runner itself can be used without the hosted dashboard and is still quite helpful. Additionally, they have an unlimited plan for qualifying open source projects ( https://www.cypress.io/oss-plan ). I don't think this necessarily means it should be picked, but I don't think the quota concern would apply here.
The point about cross-browser testing is a good one to keep in mind, though I'm not sure whether it's needed (someone in the org can probably check via Google Analytics what the userbase looks like). In fairness to Cypress, they support Firefox and Edge as of last month, but testing Safari does seem unique to Playwright for now.
Hey @hydrosquall thanks for adding the extra info!
Cypress ties the dashboard and parallel tests together, since it's pretty much their money maker from what I read here: https://github.com/cypress-io/cypress/issues/2520, so having no dashboard, or no quota left, will block parallel tests. In our trial, the tests stopped working altogether once the quota ran out.
There's a homegrown solution to self-host the dashboard, but it seems to still be in alpha:
https://github.com/agoldis/sorry-cypress
The free OSS plan is promising! Not sure if its limit of 5 users would be enough for the project though
And thanks for bringing up that Cypress supports Firefox and Edge! Puppeteer also supports Firefox and Edge, given that Edge is now Chromium-based too (https://blogs.windows.com/msedgedev/2019/11/04/edge-chromium-release-candidate-get-ready).
Both Cypress and Puppeteer are easy to setup, so maybe it boils down to which approach is more maintainable in the long run for this project, in terms of contributor interest, experience, free plan user limit, etc.?
Hey, quickly chiming in to mention that Datadog has an E2E testing product called browser tests, and an OSS plan as well. If you think it'd be a good fit for your project, I'd be happy to work on having that approved internally :)
Ah yes! Thanks for mentioning Datadog's browser tests and flagging the OSS plan, @MargotLepizzera!
My project uses Datadog to ping our app periodically (every 5 minutes) to make sure it's functional, and the integration with Slack is super neat.
For writing integration tests, I'm thinking we might need a version control solution, so the test code can be checked into this repo and run locally and on every PR too
A coworker just shared this article: https://medium.com/prodperfect/how-to-build-e2e-test-cases-b0cb3e3c97bf
tl;dr: integration tests should mostly cover critical user paths (a small subset of all tests), while unit tests and component tests cover the larger area of the code base. This way we don't end up writing everything as integration tests and having a test suite that takes hours to run!
I just created a puppeteer branch with a POC integration test to click on the Play button with Zika dataset:
https://github.com/nextstrain/auspice/compare/master...tihuan:jest-test-setup?expand=1
To run the integration test, start the server in one terminal tab:
auspice view --datasetDir data
then, in another tab, run:
npm install && npm run integration-test
To watch the browser while the tests run (non-headless, slowed down):
HEADLESS=false SLOWMO=10 npm run integration-test
Please let me know if you encounter any issues! Thank you!
Thanks all for your input. I'll defer to @colinmegill to guide which testing framework we go with, as I think this is a very important decision for the project, but I wanted to give a summary of the kind of tests we desperately need. Currently, our testing consists of me loading a number of datasets and trying to run all the functionality I believe the PR may have affected. Things fall through the cracks this way, especially as we have more developers and more features.
We had a PR last week which fixed an open issue where the labels in the legend would overflow & look bad. The PR limited the characters in the legend name, so they didn't overflow. Great! I tested it on a few examples, it worked, I merged it. But this broke legend display for numeric legend traits, which I didn't test on. These are the kinds of things I'd love testing to automate out of my life!
For (e.g.) 10 different datasets (with the acknowledgement that some params will be specific to each one). This list isn't exhaustive, but is equally too exhaustive to begin with:
Load dataset & check it renders. Note that there are changes over time -- e.g. as we update the nCoV dataset the tree will gain more tips, the map will gain more locations. We could get around this by having a set of datasets which don't change and which the tests are based on.
Change colorBy to X, then to Y
Change geo Resolution to X
Move between all 4 tree layouts
Toggle between divergence & temporal tree views
Hover over different entries in the legend and observe that the tips change size
Click on different branches & observe that the tree zooms & the map changes.
Use the dataset selector to change to a different dataset
Enable a second tree via the sidebar dropdown (and repeat above tests)
Click on a bar in the entropy panel and see tree coloring update
Load a dataset with queries defined via URL (e.g. ?c=country) and check they modulate state appropriately
Animate dataset & check that it looks as it should (I presume an entry like this must have a number of defined checkboxes for puppeteer?)
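Some of the items above (like the URL-query case) can be asserted without screenshots at all. As a plain-Node illustration of the kind of check such a test could make, here is a hypothetical sketch; `parseDisplayQuery` and its query-to-state mapping are invented for illustration and are not Auspice's actual URL handling:

```javascript
// Hypothetical sketch: map auspice-style URL queries (e.g. ?c=country)
// onto the bits of view state they should modulate. Not Auspice's real parser.
function parseDisplayQuery(search) {
  const params = new URLSearchParams(search);
  const state = {};
  if (params.has("c")) state.colorBy = params.get("c");       // colour-by trait
  if (params.has("r")) state.geoResolution = params.get("r"); // map resolution
  return state;
}

// A test would load "?c=country" and assert the rendered state matches:
const state = parseDisplayQuery("?c=country");
console.log(state.colorBy); // "country"
```

An integration test would then render the app with that query string and assert the colour-by dropdown (and legend) reflect the parsed state.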
I will +1 puppeteer! Looks great @tihuan :)
Sorry for getting back to this late, got a workload burst this week!
Thanks for checking out the PR, @colinmegill!
Love the list you provided too, @jameshadfield! Sounds like it will involve SVG testing, which I don't have much experience in; it will require some research on how to do it right.
@colinmegill how do we do that in cellxgene?
Are other contributors okay with Puppeteer? Does the PR look good to everyone?
Thanks all!
@tihuan I see some changes in the formatting of the .eslintrc file. It may conflict with other devs already on the project.
Otherwise, the test PR looks good. I'm happy to help out.
Hey @dnprock ! Thanks for the quick review! That's a really good point!
I just reverted the linter changes to the .eslintrc file and created a PR.
PTAL again! Thanks a lot!
Update: Auspice now has jest / puppeteer support thanks to @tihuan's #943!
@tihuan & @colinmegill do you recommend closing this issue and creating new issues for each of https://github.com/nextstrain/auspice/issues/917#issuecomment-596348566 or is there more general e2e discussion to happen here first?
Yay, thanks everyone for the discussion and reviews!
I think it makes sense to break down https://github.com/nextstrain/auspice/issues/917#issuecomment-596348566 into individual issues, so people can tackle them in parallel
In general, I would recommend people use https://github.com/smooth-code/jest-puppeteer/blob/master/packages/expect-puppeteer/README.md#api where possible, instead of going straight to Puppeteer's own API when writing tests, unless expect-puppeteer doesn't have the functionality you need. This is because expect-puppeteer has a more user-friendly API than Puppeteer (src)!
Once we have individual issues for each test case, we could also tag them with priority, so people know which ones to choose first?
Thanks all!
Agree with all the above, thanks so much @tihuan! And thanks for referencing those jest-puppeteer docs!
@colinmegill / @tihuan if you have time, could you turn one of those example tests into an issue, as you're better placed to know how much detail is required (and please reference those recommendations in the docs so that the test implementations are consistent)?
Thanks both! Yeah, I'll pick a test you described above, turn it into an issue, and see how people like it!
Hi all! Quick update on writing an issue for a test.
I'm working on "Change colorBy to X, then to Y" now.
Given that it's asserting SVG changes, I wanna try using https://github.com/americanexpress/jest-image-snapshot to diff images. Another option is https://percy.io/, but I'm not sure if they have a free OSS plan, thus testing with jest-image-snapshot first.
The expected outcome of the issue is to provide a list of assertions + image diffs we want the test to do, along with the suggested jest-puppeteer API to use to click around, so whoever picks up the ticket will be able to do so.
Ideally, I'd like to do a spike on jest-image-snapshot first: write a test for it and get it merged, so it can be an example for other tests to follow. What do people think of that approach?
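For intuition on what image diffing involves, here is a dependency-free sketch of the core idea behind threshold-based comparison: measure how much of the image changed and fail above a threshold. The real jest-image-snapshot compares decoded pixels via pixelmatch and is far more robust; `diffRatio` below is illustrative only:

```javascript
// Illustrative only: fraction of bytes that differ between two equal-length
// raw image buffers. jest-image-snapshot's failureThreshold works on the
// same principle, though it compares decoded pixels, not raw bytes.
function diffRatio(bufA, bufB) {
  if (bufA.length !== bufB.length) return 1; // a size change is a total mismatch
  let differing = 0;
  for (let i = 0; i < bufA.length; i++) {
    if (bufA[i] !== bufB[i]) differing++;
  }
  return differing / bufA.length;
}

const base = Buffer.from([0, 0, 0, 0]);
const same = Buffer.from([0, 0, 0, 0]);
const changed = Buffer.from([0, 0, 255, 0]);

console.log(diffRatio(base, same));    // 0
console.log(diffRatio(base, changed)); // 0.25
```

A tolerance (say, fail only when the ratio exceeds 1%) is what keeps such tests from being flaky over tiny anti-aliasing differences.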
Extra discussion:
In my work project, our TravisCI setup runs a few different checks in parallel. Should we add tests and the linter as part of the TravisCI PR/branch checks?
Since we have eslint set up already, should we add a precommit hook to run eslint, so it autofixes/fails a commit when the linter is not happy? I see a few commits in the history that run eslint retroactively, so thought we could automate it too! The tool my work project uses is https://github.com/typicode/husky
Love to hear your thoughts on these questions!
Thanks all!
Hi @tihuan -- thanks for commenting, it's exciting that this is progressing!
Given that it's asserting SVG changes
This, to me, is the big question. The datasets which auspice sources (via npm run get-data) change over time, as new genomes become available, so we'd have to provide a set of stable datasets in order to generate the same SVG over time. But this isn't hard. @tihuan I'd recommend using an ncov dataset with the date in the filename for the current test.
What happens if, for instance, we change the font size of the axis text label? Presumably this breaks any test with the tree in view. Is there a way to programmatically regenerate the images needed for all the tests?
Worth me asking at this point if there are other approaches than comparing image snapshots which we should consider? E.g. ensuring that a certain SVG element is in the DOM & has the correct colour after a colour-by change (I have no idea what's best practice at the moment).
write a test for it, get it merged, so it can be an example for other tests to follow.
Agreed, with the caveat that we should pick the right approach here as (hopefully) many tests will be forthcoming which use it. Really like your thoughts on detailing a suggested API for others to follow βοΈ
Should we add tests and linter as part of TravisCI PR/branch checks?
Seems sensible. Recent work means we're now at 0 errors & warnings (perhaps the first time this has happened!). Enforcing this via husky seems great -- I haven't used autofixes, so not sure about failing the commit vs autofixing.
Morning @jameshadfield! Oh, thank YOU and your team for running the show and giving us such a great tool to use. Especially right now!
I'm just pitching in where I can and trying to be helpful :-)
Hi @tihuan -- thanks for commenting, it's exciting that this is progressing!
Given that it's asserting SVG changes
This, to me, is the big question. The datasets which auspice sources (via npm run get-data) change over time, as new genomes become available, so we'd have to provide a set of stable datasets in order to generate the same SVG over time. But this isn't hard. @tihuan I'd recommend using an ncov dataset with the date in the filename for the current test. What happens if, for instance, we change the font size of the axis text label? Presumably this breaks any test with the tree in view. Is there a way to programmatically regenerate the images needed for all the tests?
Yes! The image diffing library uses the same -u flag as updating regular jest snapshots (doc), so it's easy to update the files when needed.
I'm thinking we can create a new command in package.json to run tests and update snapshots, so people don't need to manually pass -u when they expect snapshots and screenshots to change for their commit.
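As a sketch, such scripts could simply forward Jest's -u flag; the script names and config file below are placeholders, not Auspice's actual setup:

```json
{
  "scripts": {
    "integration-test": "jest --config jest.config.integration.js",
    "integration-test:update": "npm run integration-test -- -u"
  }
}
```

The `--` forwards everything after it to the underlying jest invocation, so both scripts stay in sync.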
Worth me asking at this point if there other approaches than comparing image snapshots which we should consider? E.g. ensuring that a certain SVG element is in the DOM & has the correct colour after a colour-by change (I have no idea what's best practice at the moment).
Great question! We also have unit testing we can use to snapshot DOM structure via jest snapshot testing and enzyme from Airbnb. It will do what you described, which is asserting the SVG elements are rendered with the right attributes, inner text, etc., as expected.
Personally I think we should do both unit testing at the component level by feeding the component different combinations of props, and snapshot expected DOM output. And also UI testing with screenshots to make sure the whole user flow works in the app as expected.
Additionally, since UI/E2E tests are much slower than unit tests, they should only cover critical user flows / happy paths that block prod deployment when any of them breaks. They're meant to spot-check things that should never break, while unit testing can cover way more edge cases to ensure any function/component works as intended given any input.
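For anyone unfamiliar with snapshot testing, the mechanism is simple enough to sketch without Jest: serialize the rendered output, store the first result, and fail on any later mismatch. Jest pretty-prints React trees and persists snapshots to files under __snapshots__/, but this toy version shows the principle:

```javascript
// Toy snapshot matcher: records the first serialization it sees for a name,
// then compares subsequent runs against it. Jest does the same with files
// under __snapshots__/ and rewrites them when run with -u.
function makeSnapshotMatcher() {
  const stored = new Map();
  return function matchSnapshot(name, value) {
    const serialized = JSON.stringify(value, null, 2);
    if (!stored.has(name)) {           // first run: write the snapshot
      stored.set(name, serialized);
      return { written: true, pass: true };
    }
    return { written: false, pass: stored.get(name) === serialized };
  };
}

const match = makeSnapshotMatcher();
const legend = { trait: "country", entries: ["USA", "China"] };
console.log(match("legend", legend).pass);                            // true (first run)
console.log(match("legend", { trait: "country", entries: [] }).pass); // false
```

With Enzyme, the `value` would be a serialized shallow render of the component under a given set of props.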
Quick definition of the unit, integration, and UI tests that I'm referring to, since I feel different people define tests differently at times, e.g. referring to UI tests as integration tests and vice versa. So just wanted to make sure we're aligned on what those terms mean!
write a test for it, get it merged, so it can be an example for other tests to follow.
Agreed, with the caveat that we should pick the right approach here as (hopefully) many tests will be forthcoming which use it. Really like your thoughts on detailing a suggested API for others to follow βοΈ
That sounds great! Yeah, defining patterns that will be used at scale will involve risks, but we can always adjust them as we go when team members discover new patterns that work better.
I think the discussion should continue, so we get more ideas on figuring out best practices for testing data visualization. Hopefully we can get more people to chime in!
In the meantime I'll work on a prototype and get a PR out for screenshot testing, so we have a concrete example that we can use to discuss pros and cons
Should we add tests and linter as part of TravisCI PR/branch checks?
Seems sensible. Recent work means we're now at 0 errors & warnings (perhaps the first time this has happened!). Enforcing this via husky seems great -- I haven't used autofixes, so not sure about failing the commit vs autofixing.
HURRAY!! That's a super exciting win!
Yeah, ESLint comes with rules that can be autofixed, so as long as we pass the --fix flag, ESLint will attempt to fix the fixable errors for us and fail the linting only when it encounters errors that it can't autofix.
In addition to ESLint, for the code base to have a consistent coding style and format, we can use prettier, which has been wildly popular since its introduction 3 years ago. Definitely highly recommend it!
In the meantime I'll work on a prototype and get a PR out for screenshot testing, so we have a concrete example that we can use to discuss pros and cons
Thanks!
Hi all! Sorry for the lack of updates in the past week, been swamped with work.
Currently I got screenshot diffing to work, but there are a few things I wanna add before submitting a PR:
- jest.retryTimes(4), so our test suite will automatically retry a failing test up to 4 times, in case of unexplainably flaky tests. This will require adding jest-circus as the test runner (doc). It's super easy to set up, I just have been working nights to get a major project on track right now.
- An await waitUntilScreenSteady() helper to address an issue I discovered while doing screenshot testing: when d3 updates the tree, there's a tiny window where some edges disappear/reappear, so the screenshot sometimes fails the diffing because of that. I was thinking the helper function could take a screenshot periodically, say every 100ms, and compare the screenshots until they are identical (else retry), so we know the page is steady and ready for the real screenshot diffing test.
I'm hoping to get my work project on track this week, then get back to this testing stuff on Sunday.
Hope that's okay! Thank you!
Just submitted a PR for adding screenshot testing!
PTAL all!
https://github.com/nextstrain/auspice/pull/1068
Thank you!
CC: @nellyk @anescobar1991 (two of the maintainers of jest-image-snapshot). As @tihuan works towards adding screenshot testing in #1068, do you have any advice for initial configuration options based on your experience?
First of all, thanks to all of you for working on a project like this one that is helping us in our fight against COVID-19 (and other malicious pathogens)!
What I'd recommend is executing the tests in a docker container in order to ensure repeatability across operating systems. Otherwise you will likely have issues with snapshots looking one way on a developer's machine (let's say running macOS) but then on CI the snapshots look slightly different causing failures. Many times things like fonts render differently across operating systems causing these issues. This seems heavy handed (and is) but over the years we really have found that it is the only way to have minimally brittle integration tests.
That and make sure that all data is mocked to avoid dependencies on things you don't have control over.
Other than that the things mentioned in https://github.com/nextstrain/auspice/issues/917#issuecomment-611266307 make sense!
Let me know if you face any issues!
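As a sketch of that suggestion, a Dockerfile along these lines would pin the rendering environment; the base image tag, dependency list, and script name are illustrative assumptions, not a tested configuration:

```dockerfile
# Illustrative only: run the integration suite in a fixed environment so
# fonts and rendering are identical on every developer machine and in CI.
FROM node:12-slim

# A subset of the system libraries Puppeteer's bundled Chromium needs.
RUN apt-get update && apt-get install -y \
    libasound2 libatk-bridge2.0-0 libgtk-3-0 libnss3 libxss1 \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
CMD ["npm", "run", "integration-test"]
```

CI would then build this image once and run the suite inside it, so snapshots generated locally and on CI match byte-for-byte.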
Hi all! There is another linter (or rather, a static source analyzer?) called DeepScan with a free open source plan. Currently it shows 24 issues in Auspice. My PRs have been fixes for DeepScan warnings, and DeepScan makes it easy by recommending a fix. Those 24 issues might be harmless, but there could be one in 24 which actually breaks something.
Thoughts, implementations and advice welcome!