The extra precision of 64-bit floats adds little value in machine learning, and this is especially true for Moodle's use case: most of the inputs are effectively one bit, the networks are small and shallow, and the functions being learned are inherently imprecise.
Because 64-bit is rarely used for this kind of work, the 64-bit routines are much less well optimised than the 32-bit ones; combined with the inherent throughput and cache benefits of smaller values, 32-bit training ends up many times faster than 64-bit.
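As a rough sketch of what selecting the lower precision looks like (assuming a TensorFlow/Keras backend and a small feed-forward network over binary indicator features; this is illustrative, not Moodle's actual code):

```python
import numpy as np
import tensorflow as tf

# Use float32 for all layer weights and activations rather than float64,
# which keeps memory traffic and cache pressure low.
tf.keras.backend.set_floatx('float32')

# Hypothetical inputs: each feature is effectively one bit.
x = np.random.randint(0, 2, size=(1000, 20)).astype(np.float32)
y = np.random.randint(0, 2, size=(1000,)).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(1),            # linear output: raw logit
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
model.fit(x, y, epochs=5, verbose=0)
```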
The other significant change here is to report actual probabilities rather than raw logit values, which are not meaningful on their own.
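For a binary classifier with a single logit output like the sketch above, the conversion is just the sigmoid function (again illustrative, not the actual implementation):

```python
import numpy as np

def logit_to_probability(logit):
    """Map a raw logit to a probability in [0, 1] via the sigmoid."""
    return 1.0 / (1.0 + np.exp(-logit))

print(logit_to_probability(0.0))   # 0.5   -> completely uncertain
print(logit_to_probability(2.0))   # ~0.88 -> fairly confident positive
```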