uyar / calico

GNU General Public License v3.0

Generating test files automatically #9

Open sahinakkaya opened 5 years ago

sahinakkaya commented 5 years ago

I know it's not the main concern of calico to fulfill such a task, but most test files require the tester to do similar things:

  1. The tester generally writes their program and then runs it, noting down the program's inputs and outputs as they go.

  2. They then write "expect" and "send" actions for a test case, taking care to properly escape any metacharacters that appear in the program's output.

  3. They run the program again with different input; if the previous test case's scheme is similar to the new one, they copy-paste and modify it for the new values, otherwise they jump back to 2.

  4. They repeat 2 and 3 until they think the test case coverage is sufficient.

So it's a repetitive, time-consuming, attention-requiring process. But most of it can be automated. The only part where a human is really required is deciding the program's inputs. Writing the expect and send actions, setting return values, and escaping metacharacters can all be automated.

So I think it would be nice to have something like this:

$ ./eval_exp
Give me a simple expression to evaluate: 2 * 3
2 * 3 is 6.

$ calico --generate-test-file test.t "./eval_exp"
Enter the number of test cases you want: 1

Running for case_1...
Give me a simple expression to evaluate: 3 + 5
3 + 5 is 8.
Assign points for this run: 5

All test cases have been generated. Exiting.

$ cat test.t
## Placeholder for building the executable if required.
case_1:
    run: ./eval_exp
    script:
        - expect: "Give me a simple expression to evaluate: "
        - send: "3 + 5"
        - expect: "3 [+] 5 is 8[.][\r\n]"
        - expect: _EOF_
    return: 0
    points: 5

The prompts in the interactive part may be overwhelming, but I included them to illustrate my idea.
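The escaping in step 2 is mechanical, so it lends itself to automation. A minimal sketch of such an escaper, using the character-class style (`[+]`, `[.]`) of the sample spec above (the function name `escape_for_expect` is my own, not part of calico):

```python
# Regex metacharacters that would change the meaning of an "expect" pattern.
METACHARS = ".^$*+?{}[]()|\\"

def escape_for_expect(text: str) -> str:
    """Wrap each regex metacharacter in a character class, e.g. '.' -> '[.]',
    mirroring the style of the sample spec."""
    out = []
    for ch in text:
        if ch in METACHARS:
            # '\', '^' and ']' need a backslash even inside a character class
            out.append("[" + ("\\" + ch if ch in "\\^]" else ch) + "]")
        else:
            out.append(ch)
    return "".join(out)

print(escape_for_expect("3 + 5 is 8."))  # -> 3 [+] 5 is 8[.]
```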

And that's all, thanks for taking time to read.

sahinakkaya commented 5 years ago

This idea came to my mind last week. I worked on it a bit and tried the subprocess library to implement it, but it didn't work. Then I found this article on the internet. It was exactly what I needed. I implemented the same code in Python with some tweaks, and currently I'm able to produce this:

$ python3 capture_io.py "./circle"
~ Enter radius of circle: 3
Area: 28.274309
~ ./circle program ended.
('output', 'Enter radius of circle: ')
('input', '3\n')
('output', 'Area: 28.274309\n')
('return code', 0)

The tilde (~) signs are not part of the "./circle" program. They have a meaning in my implementation, and unfortunately I couldn't find a way to get rid of them. I'll finish it and send a PR when I have free time, but I believe there are more elegant ways to do this. (asyncio, multiprocessing, and threading are libraries that I think might help, but my knowledge of them is very limited.)

talha252 commented 5 years ago

I am not sure why you need a multitasking library to solve this?

Your first observation is good. However, rather than generating a verbose test file, I would prefer to keep it concise. I think the problem you mention can be solved with a new property, which I am going to call an algorithm block. Here is a possible algorithm block:

_define:
    algorithms:
       simple_alg: 
            - expect: "Give me a simple expression to evaluate: "
            - send: <<param1>>
            - expect: <<param2>>
            - expect: _EOF_
case_1:
    run: ./eval_exp
    script:
        - algorithm: simple_alg
            - param1: "3 + 5"
            - param2: "3 [+] 5 is 8[.][\r\n]"
    return: 0
    points: 5

case_2:
    run: ./eval_exp
    script:
        - algorithm: simple_alg
            - param1: "8 + 9"
            - param2: "8 [+] 9 is 17[.][\r\n]"
    return: 0
    points: 5

I haven't thought about the exact syntax of this attribute, but I think it would allow the tester to write much more concise test files. Also, it is possible to chain multiple algorithm blocks (or combine them with send and expect actions), which would make the test file much smaller. The only drawback is that this is much harder to parse; it requires a lot of error checking, and if it is not done correctly it's prone to unexpected errors.
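For what it's worth, expanding such a block could boil down to simple placeholder substitution. A sketch of the idea, assuming the algorithms section has been parsed into plain dicts (the syntax and all names here are hypothetical, mirroring the example above):

```python
import copy

# Hypothetical parsed form of the _define/algorithms section above.
ALGORITHMS = {
    "simple_alg": [
        {"expect": "Give me a simple expression to evaluate: "},
        {"send": "<<param1>>"},
        {"expect": "<<param2>>"},
        {"expect": "_EOF_"},
    ],
}

def expand_algorithm(name, params):
    """Return a copy of the named algorithm with <<key>> placeholders
    replaced by the given parameter values."""
    steps = copy.deepcopy(ALGORITHMS[name])
    for step in steps:
        for action, value in step.items():
            for key, val in params.items():
                value = value.replace(f"<<{key}>>", val)
            step[action] = value
    return steps
```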

sahinakkaya commented 5 years ago

I am not sure why you need a multitasking library to solve this?

I'm not either. I don't know much about these libraries, as I mentioned. Just by looking at their names or thinking about keywords related to the problem, I thought one of them might be useful here.

I'll give an example of why I thought asyncio might be used; you can skip this part if you like.

The biggest handicap in my current implementation is that it's based on a write/read loop: get input from the user, print output, and continue like this. I can turn it into a read/write loop by placing a "~", which acts like an escape symbol for my program. But not all programs fit a read/write or write/read loop. One of them may need write, write, write, read. So I need to escape from reading twice by placing a "~" after the first two write operations. This is ugly. It would be better to read and write asynchronously so that neither reading nor writing is a blocking operation (I think): read from the program's stdout as long as there are bytes to read, and write to its stdin as long as the user supplies bytes to write. So this looks like a job for... asyncio, of course (I think, again). I mentioned the other libraries for similar reasons; they sound like good fits for this problem to me.
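To illustrate the asynchronous idea, here is a sketch only (not how calico works; `interact` and the `cat` stand-in program are made up for the example). Reading and writing run as concurrent tasks, so neither side has to guess the other's ordering:

```python
import asyncio

async def interact(cmd, input_lines):
    """Write stdin and read stdout concurrently, so neither side blocks
    the other. Caveat: with pipes the child may still block-buffer its
    output, so this alone does not solve the prompt-capture problem."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )

    async def writer():
        for line in input_lines:
            proc.stdin.write(line.encode() + b"\n")
            await proc.stdin.drain()
        proc.stdin.close()

    async def reader():
        chunks = []
        while True:
            chunk = await proc.stdout.read(1024)
            if not chunk:  # EOF: the child closed its stdout
                break
            chunks.append(chunk)
        return b"".join(chunks)

    _, output = await asyncio.gather(writer(), reader())
    await proc.wait()
    return output

# Example: feed "cat" one line and collect everything it echoes back.
output = asyncio.run(interact(["cat"], ["hello"]))
print(output)  # b'hello\n'
```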

... rather than generating verbose test file, I would prefer to make it concise.

The motivation behind my idea is: if the tester runs their program anyway to make sure its outputs are consistent, why not create the test files from those runs? So it assumes there is a program that the tester uses to produce correct outputs, and I hope there is :) If this is the case, I wouldn't be concerned about the length of the test file, because it will be auto-generated. If you're still concerned about it, I wonder why?

... I think the problem you mention can be solved with a new property which I am going to call it algorithm block. ... Also, it is possible to chain multiple algorithm block

Surely this would help the tester a lot, especially with the ability to chain them. But you still have to escape metacharacters manually, which I think is a pain. The whole story started after I saw something like this. It's a test file that is used to evaluate one of our assignments, and I would definitely refuse to write something like that.

ghost commented 5 years ago

Interesting idea; it needs some exploring. I have my reservations, though. We use a regex to compensate for submissions that follow the specs, but not accurately. I don't see how the matching rules can be relaxed without human intervention. Also, you need a reference implementation of the solution, which is OK most of the time but probably shouldn't be a hard requirement.

ghost commented 5 years ago

I'm not sure whether you need the subprocess module or similar modules. Calico already uses pexpect, which is a higher-level module for the same job. You should only look at alternatives if you can't do it with pexpect, because otherwise we would have alternative ways of achieving the same goal in the same codebase.

ghost commented 5 years ago

And, I might be overlooking something here but I don't see how asyncio is relevant for this project. What calico does is a very synchronous operation. The program has nothing to do if the checked program isn't ready yet. multiprocessing might make sense if you want to parallelise the tests but that's not a priority. Using a task queue might be worthwhile to deal with long running submissions.

ghost commented 5 years ago

The sample test file you've pointed to might be simplified by predefined variables. Maybe integrating a templating system like jinja or mako would help improve the situation.
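To illustrate the templating idea, here is a dependency-free sketch using the stdlib's `string.Template` instead of jinja or mako (the spec layout follows the examples in this thread; the variable names are made up):

```python
from string import Template

# One test case as a template; $-placeholders become predefined variables.
CASE_TEMPLATE = Template('''\
- $name:
    run: $program
    script:
        - expect: "$prompt"
        - send: "$answer"
        - expect: _EOF_
    return: $ret
    points: $points
''')

case = CASE_TEMPLATE.substitute(
    name="case_1", program="./circle",
    prompt="Enter radius of circle: ",
    answer="5", ret=0, points=75,
)
print(case)
```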

sahinakkaya commented 5 years ago

... I don't see how the matching rules can be relaxed without human intervention.

Yes, of course, but my aim is to help the tester as much as possible. If the tester wants to accept any number of newlines, for example, they can just put a "*" where necessary. It's easier to modify the file than to create it from scratch.

... Also, you need a reference implementation of the solution -which is OK most of the time but probably shouldn't be a hard requirement.

If you are talking about my implementation, I'll share it when I have some free time. I'm basically running the tested program as a child process and interacting with its stdin and stdout. It looks like this, in case you missed it. The key thing is capturing the *stdin* of the child: what you've sent to the child, in which order? What were the outputs of the child, in which order?

... You should only look at alternatives if you can't do it with pexpect

I've already tried it with pexpect:

>>> import pexpect
>>> p = pexpect.spawn("./circle")
>>> p.readline() # hangs
>>> p.read() # hangs

But I should be able to read something, because when I run "./circle" normally I would see "Enter radius of circle: ".

>>> s = input() # tester *knows* his program is waiting for input
3 
>>> p.sendline(s)
>>> p.readline()
b'Enter radius of circle: 3\r\n'

This is not what I want. I want the I/O part to be exactly the same as when the tester runs the program from the command line. There shouldn't be any difference. I may be missing something, but this is why I don't think pexpect is useful here.

It's not easy to explain the other things you mentioned without you seeing my implementation. I'll try to share it as soon as I can but it's not possible before this weekend.

ghost commented 5 years ago

I have developed code that captures stdin-stdout of a child process, I know how it works. Any particular reason you are taking C code as your basis instead of python subprocess? You may be right that pexpect won't be suitable if your program doesn't know what the scenario is in advance. In that case "non-blocking" might be a better starting point concept than "async" for this problem. But I'm not really sure. I don't have much time to spend on calico at the moment (PRs are always welcome, though).
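On the "non-blocking" idea mentioned above, the core primitive is making a read return immediately instead of hanging. A minimal stdlib sketch, illustrative only, using a plain pipe rather than a child process (`read_available` is a made-up helper name):

```python
import os

def read_available(fd, size=1024):
    """Return whatever bytes are available on fd right now, or b''
    if reading would block (fd must be in non-blocking mode)."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return b""

r, w = os.pipe()
os.set_blocking(r, False)  # reads on r no longer hang

print(read_available(r))   # b'' -> nothing written yet, but no hang
os.write(w, b"hi")
print(read_available(r))   # b'hi'
```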

ghost commented 5 years ago

By "reference implementation of the solution" I mean that the instructor has to implement the assignment solution beforehand, if I understand correctly. For small problems this is OK but it will hurt usability for larger assignments where the instructor might not have the means to implement everything. Of course, in that case manual test case writing is still available as a fallback, so nothing lost.

sahinakkaya commented 5 years ago

Any particular reason you are taking C code as your basis instead of python subprocess?

I tried to do it with subprocess but couldn't manage to make it work the way I wanted. That is the only reason I'm trying different things.

sahinakkaya commented 4 years ago

It has been a year since I opened this. When I read the discussion again, one line written by me caught my attention:

I'll try to share it as soon as I can but it's not possible before this weekend.

Well... I kept my promise, technically: I didn't post before that weekend, and I never said exactly when I would share it, so no problem :) Jokes aside, I came back here 4-5 times over the past year, but I couldn't find a good solution worth sharing. However, I kept improving my existing code, searched for more elegant ways to achieve the same thing, and I finally think it is as simple as it can be. Here is my existing implementation, and an example program for you to try it with. An example run looks like this:

$ python3 generate_test_spec.py ./circle
Enter the number of test cases you want: 2
-------------------------
Running for case_1...
Enter radius of circle: 5
Area: 78.539749
case_1 ended
Assign points for this run: 75
-------------------------
Running for case_2...
Enter radius of circle: -1
Negative radius values are not allowed.
case_2 ended
Assign points for this run: 25
-------------------------
- case_1:
    run: ./circle
    script: 
        - expect: "Enter radius of circle: "
        - send: "5"
        - expect: "Area: 78[.]539749[\r\n]"
        - expect: _EOF_
    return: 0
    points: 75
- case_2:
    run: ./circle
    script:
        - expect: "Enter radius of circle: "
        - send: "-1"
        - expect: "Negative radius values are not allowed[.][\r\n]"
        - expect: _EOF_
    return: 1
    points: 25

I have used the built-in pty module to spawn and control child processes. subprocess doesn't work because connecting the child's stdout to a pipe makes it block-buffered instead of line-buffered. You can find a more detailed explanation in pexpect's FAQ. And here is a question that I asked on StackOverflow. You can read it if you are curious about why I didn't use pexpect.
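For reference, the heart of the pty approach can be sketched with the module's spawn hooks. This is a sketch in the spirit of the implementation described here, not the actual code; "echo" stands in for the tested program (e.g. ./circle):

```python
import os
import pty

events = []

def master_read(fd):
    # Everything the child writes arrives here through the pty master.
    data = os.read(fd, 1024)
    if data:
        events.append(("output", data.decode(errors="replace")))
    return data  # returned data is still echoed to our own stdout

def stdin_read(fd):
    # Everything the user types is recorded, then forwarded to the child.
    data = os.read(fd, 1024)
    if data:
        events.append(("input", data.decode(errors="replace")))
    return data

# The child sees a real terminal, so its stdout stays line-buffered.
pty.spawn(["echo", "hello"], master_read, stdin_read)
print(events)
```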

Finally, when I thought that my current implementation is OK, I decided to share it via pull request. I tried to add this feature to calico but couldn't find a proper way to handle command line arguments. I think the correct way to use this feature would be something like this:

$ calico --generate-test-spec ./circle

So it will not require an input file (the spec) to run. And when an input file is present, that means the user is testing, so the --generate-test-spec option shouldn't be allowed in that case. This could probably be solved with a simple if statement, but then the "usage" message automatically generated by argparse would also need to be written manually. I think this is a very strong sign that the thing I'm trying to achieve is not a good idea.
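The conflict check itself can be done with a plain if plus `parser.error`, which at least keeps error reporting inside argparse. A hypothetical sketch (the flag and argument names are made up for illustration; calico's real CLI differs):

```python
import argparse

parser = argparse.ArgumentParser(prog="calico")
# The spec file becomes optional so --generate-test-spec can run without it.
parser.add_argument("spec", nargs="?", help="test specification file")
parser.add_argument("--generate-test-spec", metavar="PROG",
                    help="interactively generate a spec for PROG")

args = parser.parse_args(["--generate-test-spec", "./circle"])

# The two modes are mutually exclusive: generating needs no spec file.
if args.spec and args.generate_test_spec:
    parser.error("--generate-test-spec cannot be combined with a spec file")
if not args.spec and not args.generate_test_spec:
    parser.error("either a spec file or --generate-test-spec is required")
```

The drawback the comment points out remains: the auto-generated usage line still shows both options as independently optional, so a fully accurate usage string would have to be written by hand.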

The very first sentence in this discussion:

I know it's not the main concern of calico to fulfill such a task

You were right, one-year-ago me. You proved it the hard way, and I like it :)

And that was all from me. I'll keep this open in case you have something to say, but I'm convinced that adding this feature to calico is not a good idea, so I think this can be closed. Thank you for taking the time to read.

ghost commented 4 years ago

Very nice work. I didn't know about pty, thanks. A few notes:

As a side note, it would be very interesting to integrate all this with hypothesis. Somebody did something like that already

sahinakkaya commented 4 years ago

OK, I just realized your comment. I was already done with my implementation, so it doesn't have the things you've mentioned. I don't know ruamel.yaml too well, but handling the indentation would definitely be better with it.

I think this would be better as a standalone script rather than being integrated into the calico command-line.

I can also do this. Let me know what you think about the current PR :)