Closed tommylau-exe closed 4 years ago
Just realized this function isn't really testable: it takes a file path and returns a DataFrame. Will be re-thinking this one.
Fixed by splitting into two different functions:
This generalizes the functionality that was implicit in the original function and makes it more reusable, e.g. with DataFrames read from other file types (not just CSV). As a side effect, actually reading the data is now the caller's responsibility, but these functions can help format that data, which was the true intent all along.
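To illustrate the split described above, here is a minimal sketch of the pattern, not the actual functions from this PR. The helper name select_complete_rows(), its behavior, and the sample columns are all hypothetical; the only point is that the helper takes a DataFrame rather than a file path, so it can be tested with in-memory data.

```python
import io

import pandas as pd


def select_complete_rows(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: keep only `columns` and drop rows missing any of them."""
    return df[columns].dropna().reset_index(drop=True)


# The caller decides how the data is read (CSV text here, but any source works).
csv_text = "name,calories,notes\napple,95,\nbread,,sliced\nrice,206,cooked\n"
raw = pd.read_csv(io.StringIO(csv_text))

# The helper only formats the data it is given; it never touches the file system.
clean = select_complete_rows(raw, ["name", "calories"])
```

Because the helper never opens a file, a unit test can build a small DataFrame by hand and pass it in directly.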
This function feels similar to read_calorie_data() from above: it's not straightforward to test. It does a lot, mostly with the help of Keras. Will try to write some tests for it, though.
Realized that this line is particularly problematic:
The keras.preprocessing.text.one_hot() function doesn't guarantee "unicity" (i.e. uniqueness) of its mappings, since it relies on hashing. This means its result can contain collisions, particularly with a small vocabulary size, as you might use in a unit test. I can't say I'll completely stop using this function (it's quite useful and encodes in O(1) time), but I'll have to refactor it out of the helper functions library to be able to test properly.
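A rough sketch of why hashing-based encoding can collide. This is not Keras's implementation: hashed_index() and encode() are hypothetical stand-ins that use MD5 so the result is deterministic, and the word list is made up. With 5 distinct words and only 3 available indices, at least two words must share an index.

```python
import hashlib


def hashed_index(word: str, n: int) -> int:
    """Map a word to an index in [1, n - 1] by hashing (index 0 is left unused)."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (n - 1) + 1


def encode(words, n):
    """Encode each word independently; no vocabulary lookup table is kept."""
    return [hashed_index(word, n) for word in words]


vocab = ["apple", "bread", "rice", "milk", "egg"]
codes = encode(vocab, n=4)  # only 3 possible indices for 5 distinct words

# Fewer distinct codes than words means at least one collision occurred.
has_collision = len(set(codes)) < len(vocab)
```

With a larger n collisions become less likely but are still possible, which makes exact expected values hard to assert in a unit test.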
Fixed by replacing add_input_labels() with this function instead:
Padding a list was an important part of the original function, and it's easily testable. Encoding strings to integers is now also the responsibility of the caller, so we keep both speed and testability.
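For reference, a minimal sketch of what a testable padding helper could look like. The name pad_list(), its parameters, and the choice to truncate over-long lists are assumptions for illustration, not the function added in this PR.

```python
def pad_list(items, length, pad_value=0):
    """Return a new list with exactly `length` entries.

    Hypothetical behavior: shorter lists are right-padded with `pad_value`,
    longer lists are truncated. The input list is not modified.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    trimmed = list(items[:length])
    return trimmed + [pad_value] * (length - len(trimmed))


# The caller encodes strings to integers however it likes, then pads the result.
encoded = [7, 3, 12]
padded = pad_list(encoded, 5)
```

The output depends only on the arguments, so exact expected values can be asserted without any hashing or Keras involved.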
Fixes #3
Creates new file: ml/amaranth_lib.py. Also includes tests for these helper functions in ml/test_amaranth_lib.py.