hedyorg / hedy

Hedy is a gradual programming language to teach children programming. Gradual languages use different language levels, where each level adds new concepts and syntactic complexity. At the end of the Hedy level sequence, kids master a subset of syntactically valid Python.
https://www.hedy.org
European Union Public License 1.2
1.29k stars 284 forks source link

Rewrite e2e tests into 'unittest' format #1058

Closed rix0rrr closed 2 years ago

rix0rrr commented 2 years ago

(This is a refactoring/cleanup task.)


The end to end tests currently send off a set of HTTP requests in a particular order, and each test depends on the previous one succeeding.

This has a couple of issues:

Tests aren't very readable

It's not super obvious what a line like this does:

        ['retrieve programs after deleting saved program', 'get', '/programs_list', {}, {}, 200, retrieveProgramsBefore],

For example, what do the {} in there mean? What does retrieveProgramsBefore do? Why is it there?

Tests aren't self-contained

For example, the following 2 tests:

        ['valid signup', 'post', '/auth/signup', {}, {'username': username, 'password': 'foobar', 'email': username + '@e2e-testing.com'}, 200, successfulSignup],
        # ...
        # 10 unrelated lines follow
        # ...
        ['valid login', 'post', '/auth/login', {}, {'username': username, 'password': 'foobar'}, 200, setSentCookies],

Depend on each other. The first one creates a user, and the second one tests that a login with the user that was created before. The fact that this relationship exists is not clear, however. The following would be a more readable of the same test:

def test_login_works():
  # GIVEN
  user_exists(username, 'foobar')

  # THEN
  login(username, 'foobar').assert_successful()

(Or something similar, just making up an API here)

Because tests aren't self-contained, we also can't choose to run a single individual test--we must always run the entire suite.

Progress and error reporting isn't standardized

If tests were in a standardized format, we could use standardized test runners like py.test to run individual tests, and have nicer error and progress reporting.

The purpose of supporting code will become more clear

The e2e_tests.py file currently has def getProfile1() through def getProfile5(), with what looks like complicated logic inside. It's not clear what their purpose is, because the logic is far away from the actual test case that it supports.

rix0rrr commented 2 years ago

Here's a good thing about the current test setup that we should probably try to maintain (or at least evaluate if it's necessary or not): they don't duplicate work, and will therefore be fast and there are some assumptions that can be made about state.

A user that gets created at the start of the test run and can thereafter be assumed to always exist, doesn't need to be created twice or recreated. Compare:

def test_login_works():
  # GIVEN
  user_exists(username, 'foobar')

  # THEN
  login(username, 'foobar').assert_successful()

def test_saving_programs_works():
  # GIVEN
  user_exists(username, 'foobar')
  login(username, 'foobar').assert_successful()

  # WHEN
  save_program('myprogram', 5, 'print hallo')

  # THEN
  this.expectEquals(list_programs(), ['myprogram')

May have some issues that need to be resolved:

So there are things that the solution to this would need to take into account and have a good answer for.

Unfortunately, I don't know the best solution myself yet, but I'm sure someone can come up with something clever here šŸ˜Š

fpereiro commented 2 years ago

Hi @rix0rrr ! Thank you for bringing up this topic. These are all valid points. Since I'm the one that got those e2e tests written in place, I might have some explaining to do šŸ˜…

The existing test structure is quite unorthodox, but it has a certain structure. The basic outline is that each test maps to a single HTTP request, and is written as a list. The shape is [label, method, path, headers, body, expectedCode, afterFunction], the afterFunction being optional.

Despite this (undocumented) regularity, all your criticisms are valid. I've used opaque names for many of the afterFunctions (such as getProfile5) and that's just plain lazy. Looking back on these particular functions, their operations are probably too long/specific to be put in a camel cased name, but they could be summarized in a comment at their top. For example, getProfile5 checks for the modification of programming experience in the profile after a used has marked that (s)he has no programming experience. At the very least I could put those comments at the top of each afterFunction. In general, almost all of the code in those functions validates the bodies returned by the server. They also sometimes set state to be used by following functions.

Functionally speaking, I also believe it's a must to have a modular test suite, rather than a whole indivisible. I'm actually now in the process of converting a monolithic suite of e2e tests into a modular one, and it's been totally worth it. Some things I learned that we could apply here:

I think that, as you mention, it could vastly improve readability if we could group tests in functions that contain a set of requests. Since Python is synchronous by default, this is a much more palatable proposal than in Javascript. We would still need several functions to test (say) the auth functionality, but they could be grouped logically. Whether to consider a function as a test module, or to consider a bunch of functions a module instead, is a good question.

There's two existing features from the current test I do suggest we keep: 1) the stopping of the test suite at the first error, to ensure we don't ignore warnings or even serious errors; 2) a preservation of the exclusiveness of interaction with the API, without using any backdoors to put things in the database. Sometimes it's really annoying to do everything through the API, but in the end it ends up being worth it.

If you have a particular test runner in mind, I could start by creating an alternative suite of tests for a particular module of the app (perhaps auth would be a good one to start) and we could continue the conversation over some refactored code. Would we still be using the existing code for triggering HTTP requests, or do you envision replacing that as well?

Cheers!

rix0rrr commented 2 years ago

I agree with what you're saying.

1) the stopping of the test suite at the first error,

This is typically a feature of the test runner, and we'll get this for free. For example, if we use pytest as the test runner, we can pytest -x e2e_tests.py and the -x makes it stop at the first error.

2) a preservation of the exclusiveness of interaction with the API, without using any backdoors to put things in the database

Yep, I agree.

If you have a particular test runner in mind

We can separate out "test runner", "test definition framework" and "assertions library". The "test runner" is the least interesting of the bunch, since it is only a "UI" over the rest, and people can pick their own. They can skip the test runner in favor of their IDE or a different runner of their choice. The choice we'll make here will affect what the build server uses (and how it prints output and when it stops on error), but that's about it.

So I would say:

For test definition framework (how do you define a test to run) and assertions library (how do you write assertions): pytest does come with some new capabilities of its own, but I'm thinking that for maximum understandability with other people who might be working on this code base, we should probably just be sticking with the unittest library that comes with Python 3 by default. It's not the most succinct, and it doesn't have the most elaborate set of assertions, but it's definition good enough and it will be familiar to most people. So I would say:

The biggest challenge is going to be coming up with a set of helper functions that make it easy to express what is happening in a test, while hiding away irrelevant details. The goal should be: any function should take the bare minimum of arguments necessary to express an action from the USER's point of view, and implicitly take care of all the rest.

We should pay extra close attention to lifetime of state and variables. For example: a browser session is going to maintain cookies. We should take care to start with a clean cookie jar at the start of each test, so that for every new test it will look like a fresh browser session and they don't (accidentally or otherwise) depend on tests that have run before and left a cookie in there. If we don't pay attention, what's going to happen is that the test is going to succeed when run as part of the suite, and fail when run individually (or vice versa), and that's going to drive us insane.

Would we still be using the existing code for triggering HTTP requests, or do you envision replacing that as well?

The mechanisms in the code are good! It's just that there is now one API which needs to handle all cases, which makes it intimidating to work with. I'm thinking we should probably have different APIs for different cases.

I think for maximum understandability, we should encourage tests to be written in given/when/then style.

For example, a given-when-then test would look like this:

class TestLoggingIn(HedyTestSuite):
  def test_logged_in_user_can_load_my_programs_page(self):
    # GIVEN
    self.assume_logged_in()

    # WHEN
    webpage = self.load_page('/profile/programs')

    # THEN
    webpage.assert_success()

(* In this code, the lines are so trivial that the # GIVEN comment lines maybe are superfluous, and if so we can leave them out).

This is reads very clearly: "assume we are logged in. We can then successfully load the 'programs' page.", but how do we even implement this? Assumptions aren't executable. The simple trick is that even though assume_logged_in is named from the point of view of the test and describes an assumption, it would execute the necessary steps to make the assumption true. So in this case, it would go through a login action (and perhaps even a register_user action if necessary). After the function finishes execution, the system is now in the state that the function name describes, and the test can proceed.

I'm guessing the class that makes this possible looks somewhat like this:

class HedyHttpTestSuite:
  """Base class for API tests."""
  def setUp(self):
    self.username = None
    self.http = requests.Session()

  def assume_logged_in(self):
    self.username = 'testuser-' + str(random.random())
    self.register_user(self.username, 'password').assert_success()
    self.login(self.username, 'password').assert_success()

  def login(self, username, password);
    return self.send_form('/auth/login', {
      "username": username,
      "password": password,
    })

  def load_page(self, url):
    return Webpage(self.http.get(url), self)

  def send_form(self, url, form_data=None):
    return Webpage(self.http.post(url, data=form_encode(form_data))

class Webpage:
  """A wrapper around a requests.Response with convenience methods."""
  def __init__(self, response, test):
    self.response = response

    # self.test only exists so that we can access 'test.assertEquals()'
    self.test = test

  def assert_success(self):
    self.test.assertEquals(200, self.response.http_code)
rix0rrr commented 2 years ago

Obviously, the actual creation of users might be different: we might remember ones that we've already created and not make a new one for each teest.

rix0rrr commented 2 years ago

By the way, just found out that we can also test Flask apps without going over ACTUAL http: https://flask.palletsprojects.com/en/2.0.x/testing/

fpereiro commented 2 years ago

Hi @rix0rrr ! Thank you for the super clear guidelines. I'll be working on it and keep you and @Felienne updated. I'll start with the auth, since we need those functions anyway and try to open a PR as early as possible, to get feedback on the overall structure and approach. Cheers!

fpereiro commented 2 years ago

HI @rix0rrr & @Felienne ! I just committed an initial version of the refactored E2E tests, for early feedback. The code is here: https://github.com/Felienne/hedy/blob/e2e-refactor/tests_e2e.py

Some notes:

All feedback is appreciated!