Closed cwmeijer closed 3 years ago
I'll have a look at the conflicts now, so I'll turn this into a draft again.
All experiment functions now return sensible results that can be useful to the user. These consist of performance metrics for every step. Because the tests usually perform only a single update step with minimal input data, the performance metrics are often trivial; for instance, rank.10 is always 1 because the tests never contain more than 10 instances. I therefore added the training loss to the results, so that they include a measure that is very sensitive to logic/code changes. Because different machines produced slightly different roundings, I had to use an approximate checker instead of an exact one, which is why I chose to include pandas in the test environment.
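To illustrate, here is a minimal sketch of what such an approximate check could look like in a test. The `run_experiment` helper, the metric names, and the expected values are assumptions made for the example; `pandas.testing.assert_frame_equal` with `check_exact=False` is the standard pandas mechanism for tolerant comparisons:

```python
import pandas as pd
import pandas.testing as pdt

def test_single_step_experiment_metrics():
    # Hypothetical helper: runs one update step on minimal input data
    # and returns the per-step performance metrics as a DataFrame.
    result = run_experiment(max_steps=1)

    expected = pd.DataFrame({
        "rank.10": [1.0],     # trivial, since the test data has fewer than 10 instances
        "step_loss": [2.31],  # illustrative value only
    })

    # Compare approximately, because roundings differ slightly between machines.
    pdt.assert_frame_equal(result, expected, check_exact=False, atol=1e-3)
```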
The step_loss is the training loss at that time step, so it is just another performance metric for the current step. It is not the most useful performance metric ever, but it is somewhat informative for a user and, of course, useful for the test. If you don't agree, we can look for other solutions.
No, I agree. I didn't really think about the degenerate scores we get when testing with just one batch. With that in mind, it makes sense to add the last training loss.
Is there a specific reason why you don't add it in asr.py though?
Checking the code again, I think step_loss is a bit misleading. It is actually the mean training loss over the whole epoch, right?
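A small, self-contained sketch of the distinction being raised (the values are made up, purely for illustration):

```python
# Hypothetical per-step training losses collected during one epoch.
losses = [2.31, 1.98, 1.75]

# A true "step loss": the loss of the most recent update step.
last_step_loss = losses[-1]

# What the reported value appears to be: the mean loss over the whole epoch.
mean_epoch_loss = sum(losses) / len(losses)

print(last_step_loss, mean_epoch_loss)  # 1.75 vs. ~2.01
```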
This PR adds an assert statement to every existing test aimed at experiments. The assert checks the result, for instance the loss or the rank. I didn't discuss this kind of assertion with anyone, so it's definitely worth looking at these specific assertions and seeing whether you agree with what I did. I also had to change the experiments and scripts to be able to read the results from the tests.
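As a rough sketch of the kind of change meant here (the function name, parameters, and metric keys are assumptions for the example, not the project's actual API): the experiment entry point returns its metrics instead of only logging them, so the test can assert on them directly.

```python
from typing import Dict

def run_single_experiment(num_steps: int = 1) -> Dict[str, float]:
    """Hypothetical experiment entry point that returns its metrics
    instead of only writing them to a log, so tests can inspect them."""
    # ... a real implementation would train/evaluate here ...
    return {"rank.10": 1.0, "step_loss": 2.31}  # illustrative values

def test_experiment_result():
    result = run_single_experiment(num_steps=1)
    assert result["rank.10"] == 1
    # Approximate check, since roundings differ slightly between machines.
    assert abs(result["step_loss"] - 2.31) < 1e-3
```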