[SQLite Runner] Refactoring YAML Code for Tests Input

laendoor commented 7 years ago

For now, YAML Tests should be like this:

solution_type: datasets
examples:
- data: "INSERT INTO ... " # or empty string -> ""
  solution_dataset: |
    id|name
    1|Mumuki

# or (but not both)

solution_type: query
examples:
- data: "INSERT INTO ... " # or empty string -> ""
  solution_query: select * from table;

I think it would be better to write YAML Tests like this:

- type: query
  seed: "INSERT INTO ... " # optional
  expected: select * from table;
- type: dataset
  seed: "INSERT INTO ... " # optional
  expected: |
    id|name
    1|Mumuki

I think this proposal is clearer and it could mix query and dataset types in the same test code.

Now, once you choose solution_type, you have to make all examples in the same type.

In the other hand, I think I can add the "final query" feature in order to resolve non-select exercise. For example, if you have an ALTER TABLE exercise, you need to require that User write a SELECT sentence in the last line. But that work could be done by the teacher with a YAML like this

# Exercise type: ALTER & UPDATE
# Add coumn 'descrip'
# Expected solution: ALTER TABLE names ADD COLUMN description VARCHAR(255);
# UPDATE names SET description = 'Aprendé a Programar' WHERE name = 'Mumuki';;
- type: dataset
  expected: |
    id|name|description
    1|Mumuki|Aprendé a Programar
  final_query: SELECT * FROM names;

@flbulgarelli tell me what do you think

If you like you can assign me this issue

flbulgarelli commented 7 years ago

Looks good to me!

Just there is something I am not getting: which is the purpose of the final query? Is it going to be the output displayed to the user, the dataset that is going to be compared or both?

laendoor commented 7 years ago

The thing is this:

In a non-select exercise, user should write sql code like this

ALTER TABLE names ADD COLUMN description VARCHAR(255);

SELECT * FROM names;

That last SELECT is needed for result comparison. One option is put that code in the Extra Code field. But user could accidentally delete the code. And I think is nicer for the user to make "hidden" that extra code.

With this proposal way, when teachers make a non-select exercise, they can put expected dataset and the final query that should retrieve that dataset. And the user just focuses on resolve excersice.

In the runner behavior, the final query sentences is appended to the student code. So, in fact, final user code will be the same as the example, but is provided by the teacher as "hidden extra code".

Maybe final query is ambiguous. It could be verification query (too long, right?) or something else (if short & descriptive, far better)

flbulgarelli commented 7 years ago

Ok @leandrojdl, sounds good.

We have recently implemented something very similar to your final_query in the new try_hook:

We have called it just query - https://github.com/mumuki/mumukit/issues/58#issuecomment-335513894 - but I think final_query is just fine.

flbulgarelli commented 7 years ago

Hi @leandrojdl! Could you have a look at this issue?

In the other hand, I think I can add the "final query" feature in order to resolve non-select exercise. For example, if you have an ALTER TABLE exercise, you need to require that User write a SELECT sentence in the last line. But that work could be done by the teacher with a YAML like this
- type: dataset
 expected: ALTER TABLE names ADD COLUMN description VARCHAR(255);
 final_query: SELECT * FROM names;

I was just reading the description again and I think there is an odd line: expected: ALTER TABLE names ADD COLUMN description VARCHAR(255);. It should be like this, isn't?

- type: dataset
  final_query: SELECT * FROM names;
  expected: |
    id|name
    1|Mumuki

flbulgarelli commented 7 years ago

Also, I have an extra idea: perhaps we could add more structures and have four different kind of validations - instead of datasets and query:

exact_query: same as query, just I think that perhaps that name is more intention revealing, right? :thinking:
query_results: like current datasets, but only covers the no "final query" scenario.
table_contents: similar to the "final query" scenario, but you specify a table name only, instad of a select
- custom_select_results: like current datasets, but only covers the "final query" scenario

Examples:

# checks that the user enters 'select * from FOO'
# 
# Example passing submission:
#
# {content: 'select * from FOO'}
- type: exact_query
  expected: 'select * from FOO'

# check that the user enters a query that outputs the expected dataset  
#
# Example passing submission:
#
# {content: 'select * from FOO'}
- type: query_results
  expected: |
    id|name
    1|Mumuki

# check that the user enters a query that outputs the expected dataset  
#
# Example passing submission:
#
# {content: "insert into (id, name) values (1, 'mumuki');" }
# check that the given table contains the expected data
- type: table_contents
  table: FOO
  expected: |
    id|name
    1|Mumuki

# check that the custom given query outputs the expected dataset
#
# Example passing submission:
#
# {content: "insert into (id, name) values (1, 'mumuki');" }
- type: custom_select_results
  select: select * from FOO
  expected: |
    id|name
    1|Mumuki

What do you think @leandrojdl?

laendoor commented 6 years ago

I like some validations. I'll try to pass in clear both ideas. As "precondition" I'll say that the way to know if an excercise is correct is when returned data is the same as expected data.

First scenario: Select exercise

That is the "simplest". We only need to compare student-results with teacher-results by "executing" both codes. But teacher could have different types to build his results:

SQL explicit code (type: query aka exact_query)
RAW DATA TABLE result as raw (type: dataset aka query_results)
TABLE NAME from where obtain results (type: table aka table_contents)

extra: |
    CREATE TABLE colors (
      id integer primary key,
      name varchar(200) NOT NULL
    );
    INSERT INTO table (name) values ('Celeste');
    INSERT INTO table (name) values ('Blanco');
content: SELECT * FROM colors; # student's code
test:
    # compare both execution results directly
    - type: query
      expected: SELECT id, name FROM colors;

    # create 'SELECT * FROM colors;' and then compare as type:query
    - type: table
      expected: colors

    # compare content execution results with this raw dataset
    - type: dataset
      expected: |
        id|name
        1|Celeste
        2|Blanco

Maybe we can include type: exact_query as if teacher expected SELECT id, name ..., it's not valid to write SELECT * .... But I think that is more an extra-after validation.

Maybe we could add some more flexibility in the future, but I think these examples are the basics. What do you think?

Second scenario: Non-Select exercise (CREATE, ALTER, INSERT, DELETE, UPDATE, ...)

Here is where we need to add some extras ideas to obtain data to be compared, because as precondition we need to check exercise comparing data rows.

So we need to add a final_query, final, select or another-good key to "append" SELECT sentence to both codes: student's and teacher's:

## Add 'hex' column to table 'colors' and update 'Blanco' with '#FFFFFF'
extra: |
    CREATE TABLE colors (
      id integer primary key,
      name varchar(200) NOT NULL
    );
    INSERT INTO table (name) values ('Celeste');
    INSERT INTO table (name) values ('Blanco');
content: | # student's code
  ALTER TABLE colors ADD COLUMN hex VARCHAR(10);
  UPDATE color SET hex = '#FFFFFF' WHERE name = 'Blanco';
test:
    # query
    # append `final_query` to both codes then compare both execution results directly
    - type: query
      expected: |
        ALTER TABLE colors ADD COLUMN hex VARCHAR(10);
        UPDATE color SET hex = '#FFFFFF' WHERE name = 'Blanco';
      final_query: SELECT id, name FROM colors;

    # table
    # Here maybe has no sense to use 'final_query' unless teacher write 'expected' code
    # as in 'query' and pass 'colors' table to 'final_query'
    - type: table
      expected: colors

    # table*
    # this is how it would be 'type:table', but it could be confusing
    - type: table
      expected: |
        ALTER TABLE colors ADD COLUMN hex VARCHAR(10);
        UPDATE color SET hex = '#FFFFFF' WHERE name = 'Blanco';
      final_query: colors # this is converted into 'SELECT * FROM colors;'

    # dataset
    # this is the best way to use 'final_query' because is no needed to write
    # SQL code in 'expected'
    - type: dataset
      expected: |
        id|name|hex
        1|Celeste|#FFFFFF
        2|Blanco|
      final_query: SELECT id, name, hex FROM colors;

I'm trying to keep the same keys for all types because I think is easier to remember.

So, what do you think?

flbulgarelli commented 6 years ago

I like some validations. I'll try to pass in clear both ideas. As "precondition" I'll say that the way to know if an excercise is correct is when returned data is the same as expected data.

OK.

That is the "simplest". We only need to compare student-results with teacher-results by "executing" both codes. But teacher could have different types to build his results:

SQL explicit code (type: query aka exact_query) ... compare both execution results directly

Now I notice that my exact_query is not the same as your query.

your query compares the results of both querys
my exact_query checks just that the student entered some specific query, regardless of its results.

So, for example, if the user enters select count(*) from colors...

if you use exact_query with expected value select count(1) from colors, it will fail - although both queries are equivalent, they are not the same exact query.
but if you use query with expected value select count(1) from colors it will pass - both queries return the same.

I think both are useful and that we could support both. But I like your query more than my exact_query, so now I think that your proposal should had higher priority.

Maybe we can include type: exact_query as if teacher expected SELECT id, name ..., it's not valid to write SELECT * .... But I think that is more an extra-after validation.

OK.

flbulgarelli commented 6 years ago

Regarding the other scenarios, I see that you are proposing 3 + 4 use cases, and I am proposing 4, some of them are the same with different names and some others aren't.

Please give me some minutes to better understand them.

flbulgarelli commented 6 years ago

1. Plain `query`

- type: query
  expected: SELECT id, name FROM colors;

-> OK. It was not in my proposal, but as explained in comment, I prefer your's.

2. Plain `dataset`

- type: dataset
  expected: |
    id|name
    1|Celeste
    2|Blanco

-> OK. It is what I called query_results

3. `dataset` with `final_query`

# dataset
# this is the best way to use 'final_query' because is no needed to write
# SQL code in 'expected'
- type: dataset
  expected: |
    id|name|hex
    1|Celeste|#FFFFFF
    2|Blanco|
  final_query: SELECT id, name, hex FROM colors;

-> OK. This is what I called custom_select_results

As a side note, I'd prefer to have different types for this two scenarios, instead of haven a single type with an optional final_query field. I think it allows both teachers and developers to keep things simple, structured and easy to distinguish.

I'm trying to keep the same keys for all types because I think is easier to remember.

Another reason is that I'd like to add a graphical interface in the future for entering those specs, instead of writing yaml on hand. And a type combo + only mandatory fields is easier to implement and easier to understand visually, since it cuts alternatives.

4. `query` + `expected` + `final_query` and 5. `table` + `expected` + `final_query`

# query
# append `final_query` to both codes then compare both execution results directly
- type: query
  expected: |
    ALTER TABLE colors ADD COLUMN hex VARCHAR(10);
    UPDATE color SET hex = '#FFFFFF' WHERE name = 'Blanco';
  final_query: SELECT id, name FROM colors;

# table*
# this is how it would be 'type:table', but it could be confusing
- type: table
  expected: |
    ALTER TABLE colors ADD COLUMN hex VARCHAR(10);
    UPDATE color SET hex = '#FFFFFF' WHERE name = 'Blanco';
  final_query: colors # this is converted into 'SELECT * FROM colors;'

-> The problem with this way of checking is that it requires running the code twice, and I'd prefer to minimize the amount of code executed in the server. I think it's implementation is very different to the others - run twice from start then check results vs run from start and check in the same step - and I am not sure it delivers a great value, since as you already said dataset + final_query is probably the best way of checking.

6. Plain `table`

# create 'SELECT * FROM colors;' and then compare as type:query
- type: table
  expected: colors

-> I think this assertion is OK, but perhaps is too specific. I'd not go on with it. It is not exactly the same as table_contents, since it checks a query result agains the contents of a table, instead of a table agains a teacher's provided query result. Put in other words, one checks that the student has queried a table, but the other checks that the student has put some data into a table.

However, now I think we neither should implement table_contents right now, since it doesn't add to much value compared to dataset+final_query aka custom_query_results

7. Plain `table` (again)

# table
# Here maybe has no sense to use 'final_query' unless teacher write 'expected' code
# as in 'query' and pass 'colors' table to 'final_query'
- type: table
  expected: colors

-> I think I not understing this scenario :confused:. Could you provide some other example, please?

Conclusion

To sum up, I'd go with only three assertions by now that is somewhat a marge of all those proposals:

query with no final_query, in order to check selects using a query
dataset with no final_query, in order to check selects using a dataset
dataset with final_query, aka custom_select_results, in order to check inserts, updates, etc, using a dataset. We should pick a better name :smile:, too. Perhaps final_dataset or resulting_dataset?

@leandrojdl please provide your thoughts.

laendoor commented 6 years ago

Plain table (again)

I was trying to said that maybe that scenario has no sense with final_query, except if was wrote as # table*; but is confused.

laendoor commented 6 years ago

Conclusion

To sum up, I'd go with only three assertions by now that is somewhat a marge of all those proposals:

query with no final_query, in order to check selects using a query

Agree. Then:

type: query
expected: SELECT ...

It's ok?

dataset with no final_query, in order to check selects using a dataset

Agree. Then:

type: dataset      # or another better name, I don't care, but I prefer a single word
expected: |
  id|name
  1|Foo

It's ok?

dataset with final_query, aka custom_select_results, in order to check inserts, updates, etc, using a dataset. We should pick a better name :smile:, too. Perhaps final_dataset or resulting_dataset?

Yes, it's difficult to get a good name for this :smile:

Ok, I have no problem to make this only one enabled to "append" query to not select scenarios. So, for example, it could be:

type: final_dataset       # or `resulting_dataset`, `dataset_with_query`, `dataset_with_final_query`, `statement`, just `final`... i don't know... for know let's use `final_dataset`
query: SELECT ...
expected: |
  id|name
  1|Foo

# or...
type: final_dataset
final: SELECT ...
dataset: |
  id|name
  1|Foo
# to make it as 'mnemotechnic' =P

This unique type could have an extra/final query to append to student code.

--

It's all a bit confusing, yes :smile:

flbulgarelli commented 6 years ago

Agree. Then:
type: query
expected: SELECT ...
It's ok?

:+1:

type: dataset      # or another better name, I don't care, but I prefer a single word
expected: |
  id|name
  1|Foo

It's ok?

:+1:

Ok, I have no problem to make this only one enabled to "append" query to not select scenarios. So, for example, it could be:
type: final_dataset       # or `resulting_dataset`, `dataset_with_query`, `dataset_with_final_query`, >`statement`, just `final`... i don't know... for know let's use `final_dataset`
query: SELECT ...
expected: |
  id|name
  1|Foo

# or...
type: final_dataset
final: SELECT ...
dataset: |
  id|name
  1|Foo
# to make it as 'mnemotechnic' =P
This unique type could have an extra/final query to append to student code.

:+1: too. And yes, let's use final_dataset and both final and query:wink:

So let's go with this spec by now.

mumuki / mumuki-platform