yalqaoud / pwp-capstones

0 stars 0 forks source link

Make sure to use build_frequency_table on the prepared text #2

Open jmcrey opened 5 years ago

jmcrey commented 5 years ago
class TextSample:
    def __init__(self, text, author):
        self.raw_text = text
        self.author = author
        self.average_sentence_length = get_average_sentence_length(self.raw_text)
        self.prepared_text = prepare_text(text)
        self.word_count_frequency = build_frequency_table(text)
        self.ngram_frequency = build_frequency_table(ngram_creator(prepare_text(text)))

This is really close to being 100% correct. The only thing we have to change is to use the build_frequency_table function on the prepared_text variable:

self.word_count_frequency = build_frequency_table(self.prepared_text)

Otherwise, really great job!

yalqaoud commented 5 years ago

@jmcrey If you can go to my build_frequency_table function, you would see why I am not doing the prepare_text for the input, I've left a comment in there. Please look at the conditions at the beginning of the function. With that being said, do you still think I should do the changes above? Strings inputs (raw inputs), will get "prepared", and list corpus inputs mean they already have been prepared, and so we skip that step.

jmcrey commented 5 years ago

@yalqaoud Right, so I think that the work-around in the build_frequency_table function is done quite well. It basically adds a fault tolerance feature and allows the function to accept multiple inputs. For the purposes of this project, I think that is perfectly fine and it allowed the project to be successfully executed. However, this is not always wanted.

In my opinion, if a function is not getting its expected input, then it should fail. There are actually many reasons for this, but the two most prominent that come to mind are: 1) separation of functionality and 2) it is a best practice. First, if we have build_frequency_table handle both the preparation and the building, then it is actually doing two things (namely, preparing the text and building the table). This ignores the practice of having a function do one thing and one thing only. Instead, we should have a separate function use prepare_text on the raw input, then pass that output to the build_frequency_table. If it is not passed this way, then we should simply have build_frequency_table fail because it is not meant to handle both preparation and building. Similarly, if prepare_text is not being passed a string, then it should fail because it is not being used properly.

Second, it is a best practice for a function to fail if it gets something unexpected. Again, this is for many reasons, but one of the most prominent reasons is that our function simply does not know how to handle the data. If unexpected data is passed to a function and our function executes anyway, this can lead to unexpected behavior and could break our application. This is actually one reason why many languages are strongly type-casted (e.g. in Java, you must declare the type of a variable, the type of an input, etc.). In fact, Python is one of the only language that is not strongly type-casted. But, I digress.

In any case, this is a long explanation for essentially saying: I still believe it would be better to pass the self.prepared_text to build_frequency_table because it ensures separation of functionality and highlights best practices. However, for the purposes of this project, I think the work-around is completely fine and the implementation was clever.

Please let me know if you have any more questions!