There was some discussion around replacing Cypress with Playwright. I messed around with it for a bit today and was able to work through the first part of authenticating via Google (which included working in incognito mode on Chromium, and it handled the redirects). Here is the code I was using as a POC:
from playwright.sync_api import sync_playwright
import dotenv
import os

dotenv.load_dotenv()

url = 'https://nebari.quansight.dev/user/kcpevey@quansight.com/lab'
google_email = os.environ['GOOGLE_EMAIL']
google_password = os.environ['GOOGLE_PASSWORD']

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, slow_mo=20)
    page = browser.new_page()
    page.goto(url)
    print(page.title())

    locator = page.get_by_role("button", name="Sign in with Keycloak")
    locator.hover()
    locator.click()

    page.locator("#social-google").click()  # get via element id

    # fill in email and click next
    page.get_by_label("Email or phone").fill(google_email)
    page.get_by_role("button", name="Next").click()

    # fill in password and click next
    page.get_by_label("Enter your password").fill(google_password)
    page.get_by_role("button", name="Next").click()

    browser.close()
I'm happy to investigate this further or start setting something up.
@kcpevey I think one interesting thing we could test out is logging in to a Nebari deployment and running a Jupyter notebook within Playwright.
I could do that, it's just a few additional steps.
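For reference, a rough sketch of what those additional steps could look like, picking up after the login flow in the POC above (the launcher selector and kernel name are assumptions and may need adjusting to the actual JupyterLab DOM):

from playwright.sync_api import Page

def run_notebook_smoke_test(page: Page) -> None:
    # Assumes `page` has already completed the Keycloak/Google login from the POC above.
    page.wait_for_url("**/lab*", timeout=120_000)  # the user server can take a while to spawn
    # Open a new notebook from the launcher and run a trivial cell.
    page.get_by_title("Python 3 (ipykernel)").first.click()  # launcher card; kernel name may differ
    page.keyboard.type("print(1 + 1)")
    page.keyboard.press("Shift+Enter")
    page.wait_for_selector("text=2")  # the cell's output appears once the kernel has run it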
Thanks a lot @viniciusdc for the thorough RFD!! I definitely agree that improving our integration tests (along with CI) will go a long way toward improving and speeding up the release process, so I'm all for it!
To me, @kcpevey's example of Playwright shows a lot of promise and would make adding and maintaining tests a lot easier. Although Cypress can perform many of these same types of tests, I feel resistance to adding new tests or improving existing ones, mostly because of my lack of JS experience. Just like with Cypress, we can run these tests during CI (the Kubernetes test) and during the integration tests!
I like the idea of making sure that, after each deployed stage, things are working as expected. We currently have a few checks in place that run after each stage, so I wonder if expanding on those would be sufficient. Perhaps instead of --stop-at we just run the full deployment but enable different levels of checks.
This might look something like nebari deploy -c ... --checks none | basic | full (a rough sketch follows the list below):
- none (currently possible with --disable-checks)
- basic (the default and what we have now)
- full (the additions you are proposing, enabled for CI / integration tests)
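A minimal sketch of how such a flag could be wired, assuming a Typer-style CLI (the function, stage names, and check logic below are illustrative placeholders, not Nebari's actual code):

from enum import Enum

import typer

app = typer.Typer()

class CheckLevel(str, Enum):
    none = "none"    # equivalent to today's --disable-checks
    basic = "basic"  # the checks we currently run
    full = "full"    # the extended per-stage assertions proposed in this RFD

def run_checks(stage: str, level: CheckLevel) -> None:
    # Placeholder: the real implementation would assert the stage's resources are healthy.
    print(f"running {level.value} checks after {stage}")

@app.command()
def deploy(
    config: str = typer.Option(..., "-c", "--config", help="Path to nebari-config.yaml"),
    checks: CheckLevel = typer.Option(CheckLevel.basic, "--checks"),
) -> None:
    for stage in ["01-terraform-state", "02-infrastructure"]:  # illustrative stage names only
        print(f"deploying {stage} from {config}")
        if checks is not CheckLevel.none:
            run_checks(stage, checks)

if __name__ == "__main__":
    app()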
I am closing this, as we already have considerations around CI integration enhancements, so this RFD no longer serves a direct purpose. I will summarize these points in a follow-up to the original tracking issues.
Summary
Currently, our integration tests are responsible for deploying a target version of Nebari (generally based on main/develop) to test stability and confirm that the code is deployable on all cloud providers. These tests can be divided into three categories: "Deploy", "User Interaction", and "Teardown".
The user-interaction phase uses Cypress to mimic the steps a user would take to exercise the basic functionality of Nebari.
The general gist of the workflow can be seen in the diagram above. Some providers, like GCP, have yet another intermediate job right after the deployment, where a small change is made to the nebari-config.yaml to assert that the inner actions (those that come bundled with Nebari) are working as expected.
While the above helps assert that everything "looks" OK, we still need to double-check every release by doing yet another independent deployment to carefully test all features/services and ensure everything works as expected. This is extra work that takes time to complete (remember that a new deployment on each cloud provider takes around 15~20 minutes, plus any additional checks).
On top of that, there are many functionalities that are part of the daily use of Nebari which we need to remember to test, and manually verifying all of them on every provider quickly becomes impractical.
Design proposal
Below is what we could do to enhance our current testing suite, divided into three major updates:
Stabilizing/backend test
Refactor the "deploy" phase of the workflow so that, instead of executing the full deployment serially (i.e. just running nebari deploy), we deploy each stage of Nebari in parts. This would give us the freedom to do more testing around each new artifact/resource added in each stage, and it can now be done fairly easily thanks to the recent addition of a Nebari dev command in the CLI. One way to achieve this would be adding an extra dev flag to the nebari deploy command to stop at certain checkpoints (in this case, the beginning of a new stage): nebari deploy -c ... --stop-at 1. This would deploy Nebari up to the first stage (generating the corresponding Terraform state files for state tracking). The CI would then execute a specialized test suite (could be pytest, Python scripts, ...) to assert that the resources created by that stage behave as expected. The next step would be nebari deploy -c ... --stop-at 2, which would refresh the previous resources and create the new ones, then stop and run tests accordingly, and so on (a rough sketch of this flow is shown below).
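A rough sketch of that per-stage flow in pytest form (the --stop-at flag is the proposed one and does not exist yet; the assertions are placeholders):

import subprocess

def deploy_until(stage: int, config: str = "nebari-config.yaml") -> None:
    # Deploy Nebari up to (and including) the given stage using the proposed flag.
    subprocess.run(["nebari", "deploy", "-c", config, "--stop-at", str(stage)], check=True)

def test_stage_1_terraform_state():
    deploy_until(1)
    # e.g. assert that the remote Terraform state for stage 1 exists and is readable

def test_stage_2_infrastructure():
    deploy_until(2)  # refreshes the stage 1 resources and creates the stage 2 ones
    # e.g. assert that the provisioned Kubernetes cluster is reachable before moving on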
End-to-End testing (User experience)
Now that the infrastructure exists and is working as planned, we can mimic user interaction by running a bigger testing suite with Cypress (we could also migrate to another tool for easier maintenance). Those tests would be responsible for checking that the Jupyter-related services work, along with Dask and any extra services like Argo, kbatch, VS Code, Dashboards, conda-store, and so on.
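As an illustration, a minimal smoke check for those services could look like the sketch below (the domain and ingress paths are assumptions and would need to match the actual deployment):

import pytest
import requests

NEBARI_URL = "https://nebari.example.com"  # hypothetical deployment domain

# Illustrative paths only; the real ingress routes may differ per deployment.
SERVICE_PATHS = ["/hub/login", "/conda-store", "/argo", "/monitoring"]

@pytest.mark.parametrize("path", SERVICE_PATHS)
def test_service_responds(path):
    response = requests.get(NEBARI_URL + path, timeout=30, allow_redirects=True)
    # Unauthenticated requests may be redirected to Keycloak, so only assert that
    # the service answers with something other than a server error.
    assert response.status_code < 500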
Teardown
Once all of this completes, we can move on to destroying all the components. Right now there are no extra changes to this step, but one addition that would be beneficial is extra validation around nebari destroy (a sketch is shown below).
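One possible shape for such a check, sketched under the assumption that the deployment URL should stop responding once teardown finishes (the domain is hypothetical):

import subprocess

import pytest
import requests

NEBARI_URL = "https://nebari.example.com"  # hypothetical deployment domain

def test_destroy_removes_cluster():
    # check=True fails the test if the destroy command itself errors out.
    subprocess.run(["nebari", "destroy", "-c", "nebari-config.yaml"], check=True)
    # After teardown, the endpoint should no longer be reachable.
    with pytest.raises(requests.exceptions.ConnectionError):
        requests.get(NEBARI_URL, timeout=30)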
User benefit
The users, in this case, are the maintainers and developers of Nebari, who would be able to trust the integration tests more and retrieve more information from each run. This would cut down a lot of the time currently spent manually testing all features and increase confidence that all services and resources were tested and validated before a release.
Alternatives or approaches considered (if any)
Best practices
User impact
Unresolved questions