Open jmcrey opened 5 years ago
@jmcrey
If you can go to my build_frequency_table
function, you would see why I am not doing the prepare_text
for the input, I've left a comment in there.
Please look at the conditions at the beginning of the function. With that being said, do you still think I should do the changes above? Strings inputs (raw inputs), will get "prepared", and list corpus inputs mean they already have been prepared, and so we skip that step.
@yalqaoud
Right, so I think that the work-around in the build_frequency_table
function is done quite well. It basically adds a fault tolerance feature and allows the function to accept multiple inputs. For the purposes of this project, I think that is perfectly fine and it allowed the project to be successfully executed. However, this is not always wanted.
In my opinion, if a function is not getting its expected input, then it should fail. There are actually many reasons for this, but the two most prominent that come to mind are: 1) separation of functionality and 2) it is a best practice. First, if we have build_frequency_table
handle both the preparation and the building, then it is actually doing two things (namely, preparing the text and building the table). This ignores the practice of having a function do one thing and one thing only. Instead, we should have a separate function use prepare_text
on the raw input, then pass that output to the build_frequency_table
. If it is not passed this way, then we should simply have build_frequency_table
fail because it is not meant to handle both preparation and building. Similarly, if prepare_text
is not being passed a string, then it should fail because it is not being used properly.
Second, it is a best practice for a function to fail if it gets something unexpected. Again, this is for many reasons, but one of the most prominent reasons is that our function simply does not know how to handle the data. If unexpected data is passed to a function and our function executes anyway, this can lead to unexpected behavior and could break our application. This is actually one reason why many languages are strongly type-casted (e.g. in Java, you must declare the type of a variable, the type of an input, etc.). In fact, Python is one of the only language that is not strongly type-casted. But, I digress.
In any case, this is a long explanation for essentially saying: I still believe it would be better to pass the self.prepared_text
to build_frequency_table
because it ensures separation of functionality and highlights best practices. However, for the purposes of this project, I think the work-around is completely fine and the implementation was clever.
Please let me know if you have any more questions!
This is really close to being 100% correct. The only thing we have to change is to use the
build_frequency_table
function on theprepared_text
variable:self.word_count_frequency = build_frequency_table(self.prepared_text)
Otherwise, really great job!