Closed douglasbagnall closed 2 years ago
cool thanks @douglasbagnall !! @ilyatregubov are you able to do a PR on this one and see if it fixes the Moodle unit tests?
Yay! I defintely do :) Thanks will run unit tests soon
Hi @douglasbagnall and @danmarsden ,
I have run unit tests. There are few failures and error.
1) core_analytics_prediction_testcase::test_ml_training_and_prediction with data set "no_splitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', 0, 1, '\mlbackend_python\processor', null) Failed asserting that '1' matches expected 0. /home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:254 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80
2) core_analytics_prediction_testcase::test_ml_training_and_prediction with data set "quarters-\mlbackend_python\processor" ('\core\analytics\time_splittin...arters', 3, 4, '\mlbackend_python\processor', null) Failed asserting that '1' matches expected 0. /home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:254 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80
3) core_analytics_prediction_testcase::test_ml_multi_classifier with data set "notimesplitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', '\mlbackend_python\processor', null) Failed asserting that '2' matches expected 0. /home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:578 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 ERRORS! Tests: 26, Assertions: 184, Errors: 2, Failures: 3.`
Traceback (most recent call last): File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/import.py", line 24, in <module> import_classifier() File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/import.py", line 17, in import_classifier classifier.import_classifier(sys.argv[3]) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/processor/estimator.py", line 609, in import_classifier classifier = self.load_classifier(importdir) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/processor/estimator.py", line 586, in load_classifier classifier = super(Classifier, self).load_classifier(model_dir) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/moodlemlbackend/processor/estimator.py", line 102, in load_classifier classifier = joblib.load(classifier_filepath) File "/home/ilyatregubov/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 590, in load with open(filename, 'rb') as f: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/requestdir/kAwF/623becdc67326/623bed073df93/mlbackend/classifier.pkl'
Another thing I notced. When I run python3 -mpytest
locally I get few errors
But I see that in here https://github.com/moodlehq/moodle-mlbackend-python/runs/5669722452?check_suite_focus=true everything passes. It seems I have same versions of python
Well, that's worse than I expected.
For the local failures, maybe you have different versions of dependencies? The CI uses requirements.txt
, which is not entirely aligned with setup.py
. Paraphrasing https://github.com/moodlehq/moodle-mlbackend-python/blob/master/.github/workflows/tests.yml (with the first two lines added to avoid mucking up your system):
python3 -m venv env
. env/bin/activate
python3 -m pip install --upgrade pip
pip install -r requirements.txt
pip install pytest
PYTHONPATH=. pytest
Next comment will be about the php import failures.
OK, the import/export test is not working because the import format has changed. I didn't notice that the php tests had a "known good" import to test with.
The old format is no longer usable for two reasons:
So I switched the format to use Python and Tensorflow's own formats, which are well proven to capture everything, BUT I neglected migration! However, migration is impossible in the 2.3/2.4 -> now case, regardless of the saved format.
I'd suggest have 2.x on the old format, and 3.x on the new, and updating the Moodle tests with the new format.
As I read it, these tests are asserting that the ML backend gets the results 100% correct every time. That seems brave, even with an artificial dataset.
core_analytics_prediction_testcase::test_ml_training_and_prediction with data set "no_splitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', 0, 1, '\mlbackend_python\processor', null)
Failed asserting that '1' matches expected 0.
/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:254
/home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80
The one that fails on line 578 is probably the one that fails every time? That involves "multi-classification" meaning 3 classes instead of 2.
Hi @douglasbagnall ,
So what is a old and what is new format? I can try changing the test, but need more info.
Re the failures. The one that is failing consistantly is this one `1) core_analytics_prediction_testcase::test_ml_multi_classifier with data set "notimesplitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', '\mlbackend_python\processor', null) Failed asserting that '0' matches expected 2.
/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:578 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97
ERRORS! Tests: 26, Assertions: 244, Errors: 2, Failures: 1. `
And there 2 others that sometimes fail sometimes pass.
What worries me is that it used to pass all the time....
What worries me is that it used to pass all the time....
Me too!
There are many things I do not understand. The tests get their models from the function add_perfect_model()
, and I am finding it hard to work out what training they get after that (i.e. how many rounds, from what data).
I suspect that the tests were too fragile, and the change in the model format means the models are no longer perfect. It is possible that the changed model format means the function no longer adds perfect models.
But, of course, it is also very possible that the tests are revealing a problem with the new code.
I do think that the idea of a perfect model with 100% accuracy is wrong. With the real talks we are dealing with people's lives using constrained data. A model can't be always correct about, say, the likelihood of dropping out.
I am not saying "the test is wrong, my code is right!", I am just saying "the test is [probably] wrong".
So what is a old and what is new format? I can try changing the test, but need more info.
From the point of view of moodle-mlbackend:
For the web interface, these are transmitted as zips.
How they manifest on the Moodle side, I am not clear.
This is an example used in tests: https://github.com/catalyst/moodle-mlbackend-python/tree/catalyst-fix-tests/test/test-models/split-evaluation
Hi @douglasbagnall ,
Ok lets do 1 step at a time and at least fix export/import. I did a bit of debugging and I see that when model is trained I have this in datafolder:
ls -l /home/ilyatregubov/moodles/integration_master/moodledata_phpu/models/168000/1648199166/execution/classifier/
-rw-rw-rw- 1 ilyatregubov ilyatregubov 436 мар 25 15:06 classifier.pkl
drwxr-xr-x 4 ilyatregubov ilyatregubov 4096 мар 25 15:06 model.ckpt
And inside model.ckpt I have assets, saved_model.pb, variables
. So I guess that classifier.pkl is what is missed during impott later.
Then it tries /usr/local/bin/python3.7 -m moodlemlbackend.export 'b42774f497f5a1cc536bf3d40e81cd8c62f601a0' '/home/ilyatregubov/moodles/integration_master/moodledata_phpu/models/168000/1648199166/execution' '/tmp/requestdir/nFBI/623d85e9225ee/623d8638821f9'
And this creates a folder with content of mode.ckpt only. So it is like classifier.pkl is in model but python don't export it?! So by the look of things its on python side
Thanks @ilyatregubov. That is useful. I'll look into it.
And this creates a folder with content of mode.ckpt only. So it is like classifier.pkl is in model but python don't export it?! So by the look of things its on python side
Should be better now.
Checking now
Thanks @douglasbagnall ,
I confirm export/import tests are passing now. So we need to decide what to do with those remaining failures that are not consistant. May be it is a bad test, may be not. Will try to find someone more knowledgable in HQ, unfortunately its beyond my area atm :(
Also one thing I noticed. There is another failing test (also sometimes passing) that I haven't seen before:
4) core_analytics_prediction_testcase::test_ml_evaluation_trained_model with data set "case-\mlbackend_python\processor" ('\mlbackend_python\processor', null) Failed asserting that 0.75 matches expected 1.
/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:689 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97
May be it is due to latest changes. May be I was just 'lucky' it was passing before. Will play around
My suspicion for this one (as for the others) is that it involves a model trained under the old system that can't be imported in the new one.
I am largely basing my opinion on the comment Tests the evaluation of already trained models
.
I've been having a look at this with Douglas today and we're suspicous of the export/import process not keeping the directory structure correctly.
the keras_metadata.pb saved_model.pb variables.data-00000-of-00001 variables.index are not being nested properly in the zip within the model.ckpt folder on creation - I'm currently suspicous of this chunk of code: https://github.com/moodle/moodle/blob/master/analytics/classes/model_config.php#L92
I'm finishing up for the day here, but will try to look closer tomorrow.
I'veI've been having a look at this with Douglas today and we're suspicous of the export/import process not keeping the directory structure correctly.g been having a look at this with Douglas today and we're suspicous of the export/import process not keeping the directory structure correctly.
It was a issue on the Python side, fixed in the latest commit. I think this would have affect the web interface only, not the local API.
Thanks for update @douglasbagnall and @danmarsden ,
Not sure if export/import is what causing the failures - I mean if process would be broken it would be always broken so we would see same number of tests failing all the time. And yeah - I see latest comment that it was on python side and only affected web interface.
So with latest of version of code still same 4 tests failing (well up to 4). I see 3 of them passing from time to time. And 1 always failing, but on different steps. I.e. sometimes it is this failure (failing asserting status):
3) core_analytics_prediction_testcase::test_ml_evaluation_trained_model with data set "case-\mlbackend_python\processor" ('\mlbackend_python\processor', null) Failed asserting that 4 matches expected 0.
/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:688 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97
And sometimes it is this failures (asserting a score):
3) core_analytics_prediction_testcase::test_ml_evaluation_trained_model with data set "case-\mlbackend_python\processor" ('\mlbackend_python\processor', null) Failed asserting that 0.73333333333333328 matches expected 1.
/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:689 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97
Thanks Ilya- yeah with Douglas's latest code I'm now seeing the same errors as you. Hoping to look into them further tomorrow.
Thanks Dan!
There are 16 new commits, but only 961c73c70108ba8bdc8efceb45196662a81ba5ee really matters.
The others are mostly test utils for debugging and replaying commits, and little test/debug backports from a branch we run in production.
The issue is the mlbackend was not really expecting tiny datasets, and was trying to treat them with care, so as not to overfit the data. The moodle test datasets are really tiny. For example, the multiclass dataset has 3 unique rows. The tests repeat these these rows multiple times (using "number of courses") but don't add any information; it is still 3 data points.
From a data science perspective, it is wise not to overtrain on these data, butfor testing we need to, and probably, in the real world, people want the machine to give them a definite answer even if it is over dubious value. So I increased the training time for small datasets. Most of the time it is enough to get 100% categorical accuracy in the multiclass test (i.e. the highest scoring category is the correct category).
This is probably the same issue with the non-multiclass ones, but I haven't actually tested those. There is a new test for a small binary dataset in the gitlab CI, but it doesn't exactly replicate the Moodle ones (unless by astonishing coincidence).
The others are mostly test utils for debugging and replaying commits, and little test/debug backports from a branch we run in production.
So, I was meaning to add, I am happy to leave these out, but on the other hand, it is safe to leave them in, and they are useful.
Thanks @douglasbagnall and @danmarsden,
I am checking latest code and I had several successful runs. Yay! But I am still having one test failing intermittently - test_ml_multi_classifier:
There was 1 failure:
1) core_analytics_prediction_testcase::test_ml_multi_classifier with data set "notimesplitting-\mlbackend_python\processor" ('\core\analytics\time_splittin...itting', '\mlbackend_python\processor', null) Failed asserting that '2' matches expected 0.
/home/ilyatregubov/moodles/integration_master/moodle/analytics/tests/prediction_test.php:578 /home/ilyatregubov/moodles/integration_master/moodle/lib/phpunit/classes/advanced_testcase.php:80 phpvfscomposer:///home/ilyatregubov/moodles/integration_master/moodle/vendor/phpunit/phpunit/phpunit:97
FAILURES! Tests: 2, Assertions: 6, Failures: 1.
I am running just test_ml_multi_classifier and I can reproduce the failure
Thanks again for your efforts. Getting really close!
Latest commits add a test that replays the multiclass test 10 times (as a /evaluation request), and asserts it is 100% right every time. Then there are tweaks that make the network different (simpler) for datasets that have very few unique training examples.
It passed locally with the number set to 100, and seems to have passed again in the github CI run.
the lingering doubts are: is it really the same test? and do the changes break something else?
do the changes break something else?
OK, it seems this might have happened. I'll look further.
thansk @douglasbagnall I've run this several times here and no failures yet - @ilyatregubov would be good if you could run it too and let us know if you're getting any intermittent failures.
Thanks! Will run tests bunch of times soon!
Hi all,
All unit tests are passing \o/ Now I am gonna run prediction models on big database to check accuracy just in case. Things are looking good! Thanks for your efforts
Accuracy seems fine too!
Will start on packaging a new version tomorrow morning, then create new docker images and test those
Hi @douglasbagnall and @danmarsden ,
Merged. New docker images are available at https://hub.docker.com/r/moodlehq/moodle-mlbackend-python/tags. Thanks again for fixing this!!!
Awesome work, @douglasbagnall , @ilyatregubov and @danmarsden ! Thanks!
This is to address issue #49.
We change the import/export format to capture all the information needed to successfully import and export models, and to follow industry conventions more closely (the core model is saved in the TensorFlow SavedModel format, and a few bits of moodle-mlbackend metadata are saved in a Python pickle).
The import and export are tested with a round-trip, and an imported model is tested with a prediction run.
These tests do not exactly match the Moodle PHP tests which I have not run. They do more in regards to testing the accuracy of the models, but they might also miss something.