There was some discussion around replacing Cypress with Playwright. I messed around with it for a bit today and was able to work through the first part of authenticating via Google (which included working in incognito mode on Chromium, and it handled the redirects). Here is the code I was using as a POC:
from playwright.sync_api import sync_playwright
import dotenv
import os

dotenv.load_dotenv()

url = 'https://nebari.quansight.dev/user/kcpevey@quansight.com/lab'
google_email = os.environ['GOOGLE_EMAIL']
google_password = os.environ['GOOGLE_PASSWORD']

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, slow_mo=20)
    page = browser.new_page()
    page.goto(url)
    print(page.title())

    locator = page.get_by_role("button", name="Sign in with Keycloak")
    locator.hover()
    locator.click()

    page.locator("#social-google").click()  # get via element id

    # fill in email and click next
    page.get_by_label("Email or phone").fill(google_email)
    page.get_by_role("button", name="Next").click()

    # fill in password and click next
    page.get_by_label("Enter your password").fill(google_password)
    page.get_by_role("button", name="Next").click()

    browser.close()
I'm happy to investigate this further or start setting something up.
@kcpevey I think one interesting thing we could test out is logging in to a Nebari deployment and running a Jupyter notebook within Playwright.
I could do that, it's just a few additional steps.
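For reference, a rough sketch of what those additional steps could look like, picking up after the login flow in the POC above (the launcher selector and kernel name are assumptions and may need adjusting to the actual JupyterLab DOM):

from playwright.sync_api import Page

def run_notebook_smoke_test(page: Page) -> None:
    # Assumes `page` has already completed the Keycloak/Google login from the POC above.
    page.wait_for_url("**/lab*", timeout=120_000)  # the user server can take a while to spawn
    # Open a new notebook from the launcher and run a trivial cell.
    page.get_by_title("Python 3 (ipykernel)").first.click()  # launcher card; kernel name may differ
    page.keyboard.type("print(1 + 1)")
    page.keyboard.press("Shift+Enter")
    page.wait_for_selector("text=2")  # the cell's output appears once the kernel has run it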
Thanks a lot @viniciusdc for the thorough RFD!! I definitely agree that improving our integration tests (along with CI) will go a long way toward improving and speeding up the release process, so I'm all for it!
To me, @kcpevey's example of Playwright shows a lot of promise and would make adding and maintaining tests a lot easier. Although Cypress can perform many of these same types of tests, I feel resistance to adding new tests or improving existing ones, mostly because of my lack of JS experience. Just like with Cypress, we can run these tests during CI (the Kubernetes test) and during the integration tests!
I like the idea of making sure that, after each deployed stage, things are working as expected. We currently have a few checks in place that run after each stage, so I wonder if expanding on those would be sufficient. Perhaps instead of --stop-at we just run the full deployment but enable different levels of checks.
This might look something like nebari deploy -c ... --checks none | basic | full (a rough sketch follows the list below):
- none (currently possible with --disable-checks)
- basic (the default and what we have now)
- full (the additions you are proposing, enabled for CI / integration tests)
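A minimal sketch of how such a flag could be wired, assuming a Typer-style CLI (the function, stage names, and check logic below are illustrative placeholders, not Nebari's actual code):

from enum import Enum

import typer

app = typer.Typer()

class CheckLevel(str, Enum):
    none = "none"    # equivalent to today's --disable-checks
    basic = "basic"  # the checks we currently run
    full = "full"    # the extended per-stage assertions proposed in this RFD

def run_checks(stage: str, level: CheckLevel) -> None:
    # Placeholder: the real implementation would assert the stage's resources are healthy.
    print(f"running {level.value} checks after {stage}")

@app.command()
def deploy(
    config: str = typer.Option(..., "-c", "--config", help="Path to nebari-config.yaml"),
    checks: CheckLevel = typer.Option(CheckLevel.basic, "--checks"),
) -> None:
    for stage in ["01-terraform-state", "02-infrastructure"]:  # illustrative stage names only
        print(f"deploying {stage} from {config}")
        if checks is not CheckLevel.none:
            run_checks(stage, checks)

if __name__ == "__main__":
    app()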
I am closing this, as we already have considerations around CI integration enhancements, so this RFD no longer serves a direct purpose. I will summarize these points in a follow-up to the original tracking issues.
Summary
Currently, our integration tests are responsible for deploying a target version of Nebari (generally based on main/develop) to test stability and confirm that the code is deployable on all cloud providers. These tests can be divided into three categories: "Deploy", "User Interaction", and "Teardown".
The user-interaction phase uses Cypress to mimic the steps a user would take to exercise the basic functionality of Nebari.
The general gist of the workflow can be seen in the diagram above. Some providers, like GCP, have yet another intermediate job right after the deployment, where a small change is made to the nebari-config.yaml to assert that the inner actions (those that come bundled with Nebari) are working as expected.
While the above helps assert that everything "looks" OK, we still need to double-check every release by doing yet another independent deployment to carefully test all features/services and ensure everything works as expected. This is extra work that takes time to complete (remember that a new deployment on each cloud provider takes around 15~20 minutes, plus any additional checks).
On top of that, there are many functionalities that are part of the daily use of Nebari which we need to remember to test, and manually verifying all of them on every provider quickly becomes impractical.
Design proposal
Below is what we could do to enhance our current testing suite, divided into three major updates:
Stabilizing/backend test
Refactor the "deploy" phase of the workflow so that, instead of executing the full deployment serially (i.e. just running nebari deploy), we deploy each stage of Nebari in parts. This would give us the freedom to do more testing around each new artifact/resource added in each stage, and it can now be done fairly easily thanks to the recent addition of a Nebari dev command in the CLI. One way to achieve this would be adding an extra dev flag to the nebari deploy command to stop at certain checkpoints (in this case, the beginning of a new stage): nebari deploy -c ... --stop-at 1. This would deploy Nebari up to the first stage (generating the corresponding Terraform state files for state tracking). The CI would then execute a specialized test suite (could be pytest, Python scripts, ...) to assert that the resources created by that stage behave as expected. The next step would be nebari deploy -c ... --stop-at 2, which would refresh the previous resources and create the new ones, then stop and run tests accordingly, and so on (a rough sketch of this flow is shown below).
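A rough sketch of that per-stage flow in pytest form (the --stop-at flag is the proposed one and does not exist yet; the assertions are placeholders):

import subprocess

def deploy_until(stage: int, config: str = "nebari-config.yaml") -> None:
    # Deploy Nebari up to (and including) the given stage using the proposed flag.
    subprocess.run(["nebari", "deploy", "-c", config, "--stop-at", str(stage)], check=True)

def test_stage_1_terraform_state():
    deploy_until(1)
    # e.g. assert that the remote Terraform state for stage 1 exists and is readable

def test_stage_2_infrastructure():
    deploy_until(2)  # refreshes the stage 1 resources and creates the stage 2 ones
    # e.g. assert that the provisioned Kubernetes cluster is reachable before moving on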
End-to-End testing (User experience)
Now that the infrastructure exists and is working as planned, we can mimic user interaction by running a bigger testing suite with Cypress (we could also migrate to another tool for easier maintenance). Those tests would be responsible for checking that the Jupyter-related services work, along with Dask and any extra services like Argo, kbatch, VS Code, Dashboards, conda-store, and so on.
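As an illustration, a minimal smoke check for those services could look like the sketch below (the domain and ingress paths are assumptions and would need to match the actual deployment):

import pytest
import requests

NEBARI_URL = "https://nebari.example.com"  # hypothetical deployment domain

# Illustrative paths only; the real ingress routes may differ per deployment.
SERVICE_PATHS = ["/hub/login", "/conda-store", "/argo", "/monitoring"]

@pytest.mark.parametrize("path", SERVICE_PATHS)
def test_service_responds(path):
    response = requests.get(NEBARI_URL + path, timeout=30, allow_redirects=True)
    # Unauthenticated requests may be redirected to Keycloak, so only assert that
    # the service answers with something other than a server error.
    assert response.status_code < 500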
Teardown
Once all of this completes, we can move on to destroying all the components. Right now there are no extra changes to this step, but one addition that would be beneficial is extra validation around nebari destroy (a sketch is shown below).
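One possible shape for such a check, sketched under the assumption that the deployment URL should stop responding once teardown finishes (the domain is hypothetical):

import subprocess

import pytest
import requests

NEBARI_URL = "https://nebari.example.com"  # hypothetical deployment domain

def test_destroy_removes_cluster():
    # check=True fails the test if the destroy command itself errors out.
    subprocess.run(["nebari", "destroy", "-c", "nebari-config.yaml"], check=True)
    # After teardown, the endpoint should no longer be reachable.
    with pytest.raises(requests.exceptions.ConnectionError):
        requests.get(NEBARI_URL, timeout=30)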
User benefit
The users, in this case, are the maintainers and developers of Nebari, who would be able to trust the integration tests more and retrieve more information from each run. This would cut down a lot of the time currently spent manually testing all features and increase confidence that all services and resources were tested and validated before a release.
Alternatives or approaches considered (if any)
Best practices
User impact
Unresolved questions