r0b50n / pwp-capstones


Murder Mystery Summary & Rubric Score #1

Open karl-project-review opened 6 years ago

karl-project-review commented 6 years ago

Rubric Score

Criterion 1: Valid Python Code

r0b50n commented 6 years ago

Hello, thank you for your review! I changed get_average_sentence_length() to:

def get_average_sentence_length(text):
    # Protect initials such as "J." (a single capital letter followed by a period)
    # so that their period is not mistaken for the end of a sentence. The period
    # is swapped for '^', a character assumed to be rare in ordinary prose.
    words = text.split(" ")
    for i, word in enumerate(words):
        if word.istitle() and len(word) == 2 and word[1] == ".":
            words[i] = word[0] + "^"
    text = " ".join(words)

    # Treat '?' and '!' as sentence terminators as well.
    text = text.replace("?", ".").replace("!", ".")

    sentence_lengths = []
    for sentence in text.strip().split("."):
        sentence_words = sentence.split()
        # Skip empty fragments, e.g. the one after the final period.
        if not sentence_words:
            continue
        sentence_lengths.append(len(sentence_words))

    # Avoid dividing by zero when the text contains no words at all.
    if not sentence_lengths:
        return 0
    return sum(sentence_lengths) / len(sentence_lengths)

Do you see any downsides to this implementation? I looked for a character that is unlikely to appear in ordinary text and chose '^', because I think it is rare enough in everyday writing. I deliberately swapped each period for a single character, so that if anybody needs the average sentence length in characters, the counts aren't affected.
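To illustrate the single-character point: because each protected period is replaced one-for-one, the protected text has exactly the same length as the original, so a character-based average is unaffected as long as the placeholder is turned back into a period before counting. The sketch below assumes the same '^' placeholder; `protect_initials` and `average_sentence_length_in_chars` are hypothetical helper names, and characters are counted per sentence excluding the sentence-ending punctuation and surrounding spaces. It also exposes one downside: any '^' already present in the text would be turned into a period on restore.

```python
# Assumption: '^' is rare enough in ordinary prose to serve as a placeholder.
PLACEHOLDER = "^"


def protect_initials(text):
    """Swap the period of a single-letter initial such as 'J.' for PLACEHOLDER."""
    words = text.split(" ")
    for i, word in enumerate(words):
        if word.istitle() and len(word) == 2 and word[1] == ".":
            words[i] = word[0] + PLACEHOLDER
    return " ".join(words)


def average_sentence_length_in_chars(text):
    """Hypothetical helper: average sentence length counted in characters."""
    # The swap is one-for-one, so the protected text keeps the original length.
    protected = protect_initials(text)
    protected = protected.replace("?", ".").replace("!", ".")
    sentences = [s.strip() for s in protected.split(".") if s.strip()]
    # Turn placeholders back into periods so each initial counts as 2 characters.
    # Downside: any '^' that was already in the text also becomes a period.
    sentences = [s.replace(PLACEHOLDER, ".") for s in sentences]
    if not sentences:
        return 0
    return sum(len(s) for s in sentences) / len(sentences)
```

For example, "I met J. Smith today. He was kind." splits into sentences of 20 and 11 characters, giving an average of 15.5, with "J." counted as two characters just as in the original text.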

karl-project-review commented 6 years ago

That works pretty well, nice work! The only potential issues I see are in handling (1) sentences ending with the word I, or (2) double initials without a space in between (e.g. P.T. Barnum). I think the double-initials issue can be handled in a similar manner to how you handled single initials, but sentences ending in I (or something like 'I followed the instructions to a T.' or 'I got an A.') are more complicated. We could add special cases that check whether the preceding word suggests the letter is being used as a word rather than as an initial (e.g. 'than' or 'and' before an 'I', 'a' before a 'T', etc.), but coming up with anything close to an exhaustive list is no easy feat, and this method would still run into trouble with sentences like 'I got an A.' vs. 'Do we know an A. Smith?' Realistically, I'm not sure how feasible it is to make this work 100% of the time, which is part of what makes thinking about it a useful exercise. Again, well done on this!
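A rough sketch of both suggestions, under stated assumptions: it swaps the word-by-word loop for a regular expression (a different technique from the original code) that protects runs of capital-letter initials such as "J." and "P.T.", and it adds a deliberately tiny preceding-word check so that a lone letter after "than", "and", "a" or "an" keeps its sentence-ending period. The names `protect_initial_runs`, `average_sentence_length_with_initials` and `LETTER_AS_WORD_AFTER` are hypothetical, and the word list is nowhere near exhaustive.

```python
import re

# Assumption: '^' is rare enough in ordinary prose to serve as a placeholder.
PLACEHOLDER = "^"

# Hypothetical and far from exhaustive: words after which a lone capital letter
# is more likely a word ("than I.", "to a T.", "an A.") than an initial.
LETTER_AS_WORD_AFTER = {"than", "and", "a", "an"}

# One or more capital-letter initials in a row, e.g. "J." or "P.T.".
INITIALS = re.compile(r"\b(?:[A-Z]\.)+")


def protect_initial_runs(text):
    """Swap the periods inside initials for PLACEHOLDER, with a word heuristic."""

    def replace(match):
        initials = match.group(0)
        preceding_words = text[: match.start()].split()
        previous = preceding_words[-1].lower() if preceding_words else ""
        # A single letter after a listed word is treated as an ordinary word,
        # so its period is left alone and still ends the sentence.
        if len(initials) == 2 and previous in LETTER_AS_WORD_AFTER:
            return initials
        return initials.replace(".", PLACEHOLDER)

    return INITIALS.sub(replace, text)


def average_sentence_length_with_initials(text):
    """Hypothetical variant of get_average_sentence_length() using the regex."""
    text = protect_initial_runs(text).replace("?", ".").replace("!", ".")
    lengths = [len(s.split()) for s in text.split(".") if s.split()]
    return sum(lengths) / len(lengths) if lengths else 0
```

With this sketch, "He met P.T. Barnum. I got an A." is counted as two four-word sentences, but "Do we know an A. Smith?" is still split into two sentences (lengths 5 and 1), which is exactly the ambiguity the review points out.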