Fixes for import/export and unit tests for local API

douglasbagnall commented 2 years ago

This is to address issue #49.

We change the import/export format to capture all the information needed to successfully import and export models, and to follow industry conventions more closely (the core model is saved in the TensorFlow SavedModel format, and a few bits of moodle-mlbackend metadata are saved in a Python pickle).

The import and export are tested with a round-trip, and an imported model is tested with a prediction run.

These tests do not exactly match the Moodle PHP tests which I have not run. They do more in regards to testing the accuracy of the models, but they might also miss something.

danmarsden commented 2 years ago

cool thanks @douglasbagnall !! @ilyatregubov are you able to do a PR on this one and see if it fixes the Moodle unit tests?

ilyatregubov commented 2 years ago

Yay! I defintely do :) Thanks will run unit tests soon

ilyatregubov commented 2 years ago

Hi @douglasbagnall and @danmarsden ,

I have run unit tests. There are few failures and error.

The failures - I rerun several times. And sometimes I have 3 failing tests, sometimes 2 and sometime 1. Thats .... weird. Thats what I have when there are 3: `There were 3 failures:

1) core_analytics_prediction_testcase::test_ml_training_and_prediction with data set "no_splitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', 0, 1, '\mlbackend_python\processor', null) Failed asserting that '1' matches expected 0. /home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:254 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80

2) core_analytics_prediction_testcase::test_ml_training_and_prediction with data set "quarters-\mlbackend_python\processor" ('\core\analytics\time_splittin...arters', 3, 4, '\mlbackend_python\processor', null) Failed asserting that '1' matches expected 0. /home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:254 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80

3) core_analytics_prediction_testcase::test_ml_multi_classifier with data set "notimesplitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', '\mlbackend_python\processor', null) Failed asserting that '2' matches expected 0. /home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:578 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 ERRORS! Tests: 26, Assertions: 184, Errors: 2, Failures: 3.`

The errors - I am getting file not exist for import/export unit tests: Traceback (most recent call last): File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/import.py", line 24, in <module> import_classifier() File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/import.py", line 17, in import_classifier classifier.import_classifier(sys.argv[3]) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/processor/estimator.py", line 609, in import_classifier classifier = self.load_classifier(importdir) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/processor/estimator.py", line 586, in load_classifier classifier = super(Classifier, self).load_classifier(model_dir) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/processor/estimator.py", line 102, in load_classifier classifier = joblib.load(classifier_filepath) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 590, in load with open(filename, 'rb') as f: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/requestdir/kAwF/623becdc67326/623bed073df93/mlbackend/classifier.pkl'

ilyatregubov commented 2 years ago

Another thing I notced. When I run python3 -mpytest locally I get few errors

https://pastebin.com/FEJmyyT6

But I see that in here https://github.com/moodlehq/moodle-mlbackend-python/runs/5669722452?check_suite_focus=true everything passes. It seems I have same versions of python

douglasbagnall commented 2 years ago

Well, that's worse than I expected.

For the local failures, maybe you have different versions of dependencies? The CI uses requirements.txt, which is not entirely aligned with setup.py. Paraphrasing https://github.com/moodlehq/moodle-mlbackend-python/blob/master/.github/workflows/tests.yml (with the first two lines added to avoid mucking up your system):

python3 -m venv env
. env/bin/activate
python3 -m pip install --upgrade pip
pip install -r requirements.txt
pip install pytest
PYTHONPATH=. pytest

Next comment will be about the php import failures.

douglasbagnall commented 2 years ago

OK, the import/export test is not working because the import format has changed. I didn't notice that the php tests had a "known good" import to test with.

The old format is no longer usable for two reasons:

All the improvements in performance since 2.3 (2019-08 or thereabouts), due to tweaks in the algorithm, changes to the activation function, filtering out invariant columns, adding more layers, etc, are not captured by the old import format. It implicitly assumes a whole lot of parameters that were in the code.
The methods used are deprecated.

So I switched the format to use Python and Tensorflow's own formats, which are well proven to capture everything, BUT I neglected migration! However, migration is impossible in the 2.3/2.4 -> now case, regardless of the saved format.

I'd suggest have 2.x on the old format, and 3.x on the new, and updating the Moodle tests with the new format.

douglasbagnall commented 2 years ago

As I read it, these tests are asserting that the ML backend gets the results 100% correct every time. That seems brave, even with an artificial dataset.

core_analytics_prediction_testcase::test_ml_training_and_prediction with data set "no_splitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', 0, 1, '\mlbackend_python\processor', null)
Failed asserting that '1' matches expected 0.
/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:254
/home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80

The one that fails on line 578 is probably the one that fails every time? That involves "multi-classification" meaning 3 classes instead of 2.

ilyatregubov commented 2 years ago

Hi @douglasbagnall ,

So what is a old and what is new format? I can try changing the test, but need more info.

Re the failures. The one that is failing consistantly is this one `1) core_analytics_prediction_testcase::test_ml_multi_classifier with data set "notimesplitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', '\mlbackend_python\processor', null) Failed asserting that '0' matches expected 2.

/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:578 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97

ERRORS! Tests: 26, Assertions: 244, Errors: 2, Failures: 1. `

And there 2 others that sometimes fail sometimes pass.

What worries me is that it used to pass all the time....

douglasbagnall commented 2 years ago

What worries me is that it used to pass all the time....

Me too!

There are many things I do not understand. The tests get their models from the function add_perfect_model(), and I am finding it hard to work out what training they get after that (i.e. how many rounds, from what data).

I suspect that the tests were too fragile, and the change in the model format means the models are no longer perfect. It is possible that the changed model format means the function no longer adds perfect models.

But, of course, it is also very possible that the tests are revealing a problem with the new code.

I do think that the idea of a perfect model with 100% accuracy is wrong. With the real talks we are dealing with people's lives using constrained data. A model can't be always correct about, say, the likelihood of dropping out.

I am not saying "the test is wrong, my code is right!", I am just saying "the test is [probably] wrong".

So what is a old and what is new format? I can try changing the test, but need more info.

From the point of view of moodle-mlbackend:

the old format was a json file and a Python pickle (an object serialisation format).
the new format is a Tensorflow SavedModel directory and a Python pickle.

For the web interface, these are transmitted as zips.

How they manifest on the Moodle side, I am not clear.

This is an example used in tests: https://github.com/catalyst/moodle-mlbackend-python/tree/catalyst-fix-tests/test/test-models/split-evaluation

ilyatregubov commented 2 years ago

Hi @douglasbagnall ,

Ok lets do 1 step at a time and at least fix export/import. I did a bit of debugging and I see that when model is trained I have this in datafolder:

ls -l /home/ilyatregubov/moodles/integration_master/moodledata_phpu/models/168000/1648199166/execution/classifier/ -rw-rw-rw- 1 ilyatregubov ilyatregubov 436 мар 25 15:06 classifier.pkl

drwxr-xr-x 4 ilyatregubov ilyatregubov 4096 мар 25 15:06 model.ckpt

And inside model.ckpt I have assets, saved_model.pb, variables. So I guess that classifier.pkl is what is missed during impott later.

Then it tries /usr/local/bin/python3.7 -m moodlemlbackend.export 'b42774f497f5a1cc536bf3d40e81cd8c62f601a0' '/home/ilyatregubov/moodles/integration_master/moodledata_phpu/models/168000/1648199166/execution' '/tmp/requestdir/nFBI/623d85e9225ee/623d8638821f9'

And this creates a folder with content of mode.ckpt only. So it is like classifier.pkl is in model but python don't export it?! So by the look of things its on python side

douglasbagnall commented 2 years ago

Thanks @ilyatregubov. That is useful. I'll look into it.

douglasbagnall commented 2 years ago

And this creates a folder with content of mode.ckpt only. So it is like classifier.pkl is in model but python don't export it?! So by the look of things its on python side

Should be better now.

ilyatregubov commented 2 years ago

Checking now

ilyatregubov commented 2 years ago

Thanks @douglasbagnall ,

I confirm export/import tests are passing now. So we need to decide what to do with those remaining failures that are not consistant. May be it is a bad test, may be not. Will try to find someone more knowledgable in HQ, unfortunately its beyond my area atm :(

ilyatregubov commented 2 years ago

Also one thing I noticed. There is another failing test (also sometimes passing) that I haven't seen before:

4) core_analytics_prediction_testcase::test_ml_evaluation_trained_model with data set "case-\mlbackend_python\processor" ('\mlbackend_python\processor', null) Failed asserting that 0.75 matches expected 1.

/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:689 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97

May be it is due to latest changes. May be I was just 'lucky' it was passing before. Will play around

douglasbagnall commented 2 years ago

My suspicion for this one (as for the others) is that it involves a model trained under the old system that can't be imported in the new one.

I am largely basing my opinion on the comment Tests the evaluation of already trained models.

danmarsden commented 2 years ago

I've been having a look at this with Douglas today and we're suspicous of the export/import process not keeping the directory structure correctly.

the keras_metadata.pb saved_model.pb variables.data-00000-of-00001 variables.index are not being nested properly in the zip within the model.ckpt folder on creation - I'm currently suspicous of this chunk of code: https://github.com/moodle/moodle/blob/master/analytics/classes/model_config.php#L92

I'm finishing up for the day here, but will try to look closer tomorrow.

douglasbagnall commented 2 years ago

I'veI've been having a look at this with Douglas today and we're suspicous of the export/import process not keeping the directory structure correctly.g been having a look at this with Douglas today and we're suspicous of the export/import process not keeping the directory structure correctly.

It was a issue on the Python side, fixed in the latest commit. I think this would have affect the web interface only, not the local API.

ilyatregubov commented 2 years ago

Thanks for update @douglasbagnall and @danmarsden ,

Not sure if export/import is what causing the failures - I mean if process would be broken it would be always broken so we would see same number of tests failing all the time. And yeah - I see latest comment that it was on python side and only affected web interface.

So with latest of version of code still same 4 tests failing (well up to 4). I see 3 of them passing from time to time. And 1 always failing, but on different steps. I.e. sometimes it is this failure (failing asserting status):

3) core_analytics_prediction_testcase::test_ml_evaluation_trained_model with data set "case-\mlbackend_python\processor" ('\mlbackend_python\processor', null) Failed asserting that 4 matches expected 0.

/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:688 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97

And sometimes it is this failures (asserting a score):

3) core_analytics_prediction_testcase::test_ml_evaluation_trained_model with data set "case-\mlbackend_python\processor" ('\mlbackend_python\processor', null) Failed asserting that 0.73333333333333328 matches expected 1.

/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:689 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97

danmarsden commented 2 years ago

Thanks Ilya- yeah with Douglas's latest code I'm now seeing the same errors as you. Hoping to look into them further tomorrow.

ilyatregubov commented 2 years ago

Thanks Dan!

douglasbagnall commented 2 years ago

There are 16 new commits, but only 961c73c70108ba8bdc8efceb45196662a81ba5ee really matters.

The others are mostly test utils for debugging and replaying commits, and little test/debug backports from a branch we run in production.

The issue is the mlbackend was not really expecting tiny datasets, and was trying to treat them with care, so as not to overfit the data. The moodle test datasets are really tiny. For example, the multiclass dataset has 3 unique rows. The tests repeat these these rows multiple times (using "number of courses") but don't add any information; it is still 3 data points.

From a data science perspective, it is wise not to overtrain on these data, butfor testing we need to, and probably, in the real world, people want the machine to give them a definite answer even if it is over dubious value. So I increased the training time for small datasets. Most of the time it is enough to get 100% categorical accuracy in the multiclass test (i.e. the highest scoring category is the correct category).

This is probably the same issue with the non-multiclass ones, but I haven't actually tested those. There is a new test for a small binary dataset in the gitlab CI, but it doesn't exactly replicate the Moodle ones (unless by astonishing coincidence).

douglasbagnall commented 2 years ago

The others are mostly test utils for debugging and replaying commits, and little test/debug backports from a branch we run in production.

So, I was meaning to add, I am happy to leave these out, but on the other hand, it is safe to leave them in, and they are useful.

ilyatregubov commented 2 years ago

Thanks @douglasbagnall and @danmarsden,

I am checking latest code and I had several successful runs. Yay! But I am still having one test failing intermittently - test_ml_multi_classifier:

There was 1 failure:

1) core_analytics_prediction_testcase::test_ml_multi_classifier with data set "notimesplitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', '\mlbackend_python\processor', null) Failed asserting that '2' matches expected 0.

/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:578 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97

FAILURES! Tests: 2, Assertions: 6, Failures: 1.

I am running just test_ml_multi_classifier and I can reproduce the failure

Thanks again for your efforts. Getting really close!

douglasbagnall commented 2 years ago

Latest commits add a test that replays the multiclass test 10 times (as a /evaluation request), and asserts it is 100% right every time. Then there are tweaks that make the network different (simpler) for datasets that have very few unique training examples.

It passed locally with the number set to 100, and seems to have passed again in the github CI run.

the lingering doubts are: is it really the same test? and do the changes break something else?

douglasbagnall commented 2 years ago

do the changes break something else?

OK, it seems this might have happened. I'll look further.

danmarsden commented 2 years ago

thansk @douglasbagnall I've run this several times here and no failures yet - @ilyatregubov would be good if you could run it too and let us know if you're getting any intermittent failures.

ilyatregubov commented 2 years ago

Thanks! Will run tests bunch of times soon!

ilyatregubov commented 2 years ago

Hi all,

All unit tests are passing \o/ Now I am gonna run prediction models on big database to check accuracy just in case. Things are looking good! Thanks for your efforts

ilyatregubov commented 2 years ago

Accuracy seems fine too!

ilyatregubov commented 2 years ago

Will start on packaging a new version tomorrow morning, then create new docker images and test those

ilyatregubov commented 2 years ago

Hi @douglasbagnall and @danmarsden ,

Merged. New docker images are available at https://hub.docker.com/r/moodlehq/moodle-mlbackend-python/tags. Thanks again for fixing this!!!

stronk7 commented 2 years ago

Awesome work, @douglasbagnall , @ilyatregubov and @danmarsden ! Thanks!

moodlehq / moodle-mlbackend-python

Fixes for import/export and unit tests for local API #51