karl-project-review (opened 6 years ago)
Hello, thank you for your review!
I changed get_average_sentence_length() to:
```python
def get_average_sentence_length(text):
    # Protect initials (a capital letter followed by '.', e.g. the "T." in
    # "Gregg T. Fishy") so their periods are not treated as sentence ends.
    # The '.' is swapped for '^' so character counts stay the same.
    text_split_by_words = text.split(" ")
    for i, word in enumerate(text_split_by_words):
        if word.istitle() and len(word) == 2 and word[1] == '.':
            text_split_by_words[i] = word[0] + '^'
    text = " ".join(text_split_by_words)

    # Back to the whole text: treat '?' and '!' as sentence ends too.
    text = text.replace('?', '.').replace('!', '.')
    text_split_by_sentence = text.strip().split(".")

    sentence_lengths = []
    for sentence in text_split_by_sentence:
        sentence_words = sentence.split()
        # Skip empty strings left over from .split(), e.g. after the final '.'
        if not sentence_words:
            continue
        sentence_lengths.append(len(sentence_words))

    # Guard against text that contains no sentences at all.
    if not sentence_lengths:
        return 0
    return sum(sentence_lengths) / len(sentence_lengths)
```
Do you see any downsides to this implementation? I tried to find a character that wouldn't appear in ordinary text and chose '^', because I think it's rare enough in everyday writing. I wanted to replace the period with a single character so that, if anybody needs the average length measured in characters, the substitution wouldn't interfere.
That works pretty well. Nice work! The only potential issues I see are in handling (1) sentences ending with the word I or (2) double initials without a space in between (e.g., P.T. Barnum). I think the double-initials issue can be handled in a similar manner to how you handled single initials, but sentences ending in I (or something like 'I followed the instructions to a T.' or 'I got an A.') are more complicated. We could add special cases that check whether the preceding word suggests the letter is being used as a word rather than as an initial (e.g., 'than' or 'and' before an 'I', 'a' before a 'T', etc.), but coming up with anything close to an exhaustive list is no easy feat, and this method would still have trouble telling 'I got an A.' apart from 'Do we know an A. Smith?' Realistically, I'm not sure it's feasible to make this work correctly 100% of the time, which is part of what makes thinking about it a useful exercise. Again, well done on this!
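A minimal sketch of the two ideas above, assuming whitespace-separated tokens: a regular expression that treats one or more capital-letter-plus-period pairs with no spaces (`T.`, `P.T.`) as initials, and a small, deliberately incomplete cue list (the hypothetical `LETTER_AS_WORD_CUES`) of preceding words that suggest the letter is a word instead. `protect_initials` and `average_sentence_length` are illustrative names, not part of the project code.

```python
import re

# Hypothetical, deliberately incomplete cue list: if one of these words comes
# right before a lone capital letter, treat the letter as a word ("than I.",
# "to a T.", "an A.") rather than as an initial.
LETTER_AS_WORD_CUES = {"than", "and", "a", "an"}

# One or more "capital letter + period" pairs with no spaces: "T.", "P.T."
INITIALS = re.compile(r"(?:[A-Z]\.)+")


def protect_initials(text, marker="^"):
    """Replace the periods in initials with a one-character marker."""
    words = text.split()
    for i, word in enumerate(words):
        if not INITIALS.fullmatch(word):
            continue
        previous = words[i - 1].lower() if i > 0 else ""
        if previous in LETTER_AS_WORD_CUES:
            continue  # likely a sentence ending such as "than I." or "an A."
        words[i] = word.replace(".", marker)
    return " ".join(words)


def average_sentence_length(text):
    text = protect_initials(text).replace("?", ".").replace("!", ".")
    lengths = [len(s.split()) for s in text.split(".") if s.split()]
    return sum(lengths) / len(lengths) if lengths else 0.0


print(protect_initials("P.T. Barnum ran a circus."))
# P^T^ Barnum ran a circus.
print(protect_initials("She is taller than I. He is not."))
# She is taller than I. He is not.
print(protect_initials("Do we know an A. Smith?"))
# Do we know an A. Smith?   (the cue list misfires: this "A." is an initial)
```

The last call shows the trade-off: any word added to the cue list fixes some sentence endings and breaks some genuine initials, so a longer list shifts the errors rather than removing them.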
Rubric Score
Criterion 1: Valid Python Code
Criterion 2: Implementation of Project Requirements
Criterion 3: Software Architecture
Criterion 4: Uses Python Language Features
Criterion 5: Produces Accurate Output
Overall Score: 20/20 (Exceeds Expectations)
Great work! Your code works just as it should.

With regard to your comment that the ordering of the cells in the Jupyter Notebook makes it impossible to run properly: keep in mind that in Jupyter Notebooks we can select and run cells one at a time in whatever order we like, regardless of the order in which they appear in the notebook. If we first run the cell that initializes the text strings, then run each of the function definitions and the TextSample class definition, then run the cell that instantiates a TextSample object for each note, and finally run the calls to find_text_similarity at the end, everything will work (although this can be a little tedious). Sorry for the confusion on that!

If you are interested in an added challenge, think about how we might modify the get_average_sentence_length function to account for periods used after initials. For example, if the period after the T in Gregg T. Fishy were not manually removed, how could our function detect that and avoid splitting the sentence there?

Again, nice job with this!
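One possible answer to this challenge, sketched with Python's `re` module instead of token substitution: split only on a run of `.`, `!` or `?` that is not directly preceded by a lone capital letter. The pattern and the name `average_sentence_length` are illustrative assumptions, not part of the project code.

```python
import re

# Split on runs of '.', '!' or '?' unless the character just before the run
# is a single capital letter standing on its own (an initial such as the "T"
# in "Gregg T. Fishy", or each letter of "P.T."). "\b[A-Z]" has a fixed width
# of one character, so Python accepts it inside a lookbehind.
SENTENCE_END = re.compile(r"(?<!\b[A-Z])[.!?]+")


def average_sentence_length(text):
    sentences = SENTENCE_END.split(text)
    # Ignore empty pieces, such as the one after the final period.
    lengths = [len(s.split()) for s in sentences if s.split()]
    return sum(lengths) / len(lengths) if lengths else 0.0


print(average_sentence_length("Gregg T. Fishy wrote this. It is short."))  # 4.0
```

Like the token-based approach, this keeps a sentence ending in "than I." or "an A." joined to the sentence after it, so it shares the ambiguity discussed in the reply about sentences ending in I.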