Murder Mystery Summary & Rubric Score

karl-project-review commented 6 years ago

Rubric Score

Criteria 1: Valid Python Code
Score Level: 4/4 (Exceeds Expectations)
Comment(s): All of your code is valid and runs without issue.
Criteria 2: Implementation of Project Requirements
Score Level: 4/4 (Exceeds Expectations)
Comment(s): You implemented all of the necessary functions and the TextSample class.
Criteria 3: Software Architecture
Score Level: 4/4 (Exceeds Expectations)
Comment(s): Your code is separated into functions appropriately. Ordinarily, having build_frequency_table be a thin wrapper around a call to Counter would seem unnecessary, since we could call Counter directly from the TextSample constructor. Because the project specifications require a build_frequency_table function, though, this case is a reasonable exception.
Criteria 4: Uses Python Language Features
Score Level: 4/4 (Exceeds Expectations)
Comment(s): Your code consistently uses Python language features where appropriate. Using Counter to do all the work of the build_frequency_table function was a particularly nice touch.
Criteria 5: Produces Accurate Output
Score Level: 4/4 (Exceeds Expectations)
Comment(s): Your output is accurate, and you either helped bring the true killer to justice or helped a particularly clever murderer to frame one of the other contestants.
Overall Score: 20/20 (Exceeds Expectations)
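
To illustrate the point made under Criteria 3 and 4, here is a minimal sketch of a build_frequency_table that delegates to collections.Counter. The parameter name corpus and the assumption that it is a list of word strings are illustrative only; the submitted code may differ.

```python
from collections import Counter


def build_frequency_table(corpus):
    # Assumption: corpus is a list of word strings.
    # Counter maps each distinct word to the number of times it appears.
    return Counter(corpus)


# Example usage with a tiny hypothetical word list.
table = build_frequency_table(["the", "cat", "saw", "the", "dog"])
```

Because Counter is a dict subclass, the result can be used anywhere a plain frequency dictionary is expected.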

Great work! Your code works just like it should.

With regard to your comment about the ordering of the cells in the Jupyter Notebook making it impossible to run properly: keep in mind that in Jupyter Notebooks we can select and run cells one at a time in whatever order we like, regardless of the order in which they appear in the notebook. If we first run the cell that initializes the strings of text, then run each of the function definitions and the TextSample class definition, then run the cell that instantiates the TextSample objects for each note, and finally run the calls to find_text_similarity at the end, it will work fine (although this can be a little tedious). Sorry for the confusion on that!

If you are interested in an added challenge, think about how we might modify the get_average_sentence_length function to account for periods used after initials. For example, if the period after the T in Gregg T. Fishy had not been manually removed, how could our function detect that and avoid splitting the sentence there? Again, nice job with this!

r0b50n commented 6 years ago

Hello, thank you for your review! I changed get_average_sentence_length() to:

def get_average_sentence_length(text):
    # Protect single initials such as "T." so their period is not treated as a
    # sentence end: swap the period for '^' (one character, so text length is unchanged).
    words = text.split(" ")
    for i, word in enumerate(words):
        if word.istitle() and len(word) == 2 and word[1] == '.':
            words[i] = word[0] + '^'
    text = " ".join(words)

    # Back to the whole text: treat '?' and '!' as sentence terminators too.
    text = text.replace('?', '.').replace('!', '.')

    # Count the words in each sentence, skipping empty pieces left by .split().
    sentence_lengths = []
    for sentence in text.strip().split("."):
        sentence_words = sentence.split()
        if sentence_words:
            sentence_lengths.append(len(sentence_words))

    # Guard against text with no sentences to avoid dividing by zero.
    if not sentence_lengths:
        return 0
    return sum(sentence_lengths) / len(sentence_lengths)

Do you see any downsides to this implementation? I tried to find a character that wouldn't appear in any text and chose '^', because I think it's rare enough in everyday writing. I wanted to swap in a single character so that, if anybody needs the average length by number of characters, the replacement wouldn't interfere.

karl-project-review commented 6 years ago

That works pretty well, nice work! The only potential issues I see are in handling (1) sentences ending with the word I or (2) double initials without a space in between (e.g. P.T. Barnum). I think the double initials issue can be handled in a similar manner to how you handled single initials, but dealing with sentences that end in I (or something like 'I followed the instructions to a T.' or 'I got an A.') gets more complicated.

We could make special cases where we check whether the preceding word suggests that the letter is being used as a word rather than as an initial (e.g. check for 'than' or 'and' before an 'I', 'a' before a 'T', etc.), but coming up with anything close to an exhaustive list is no easy feat, and this method would still run into trouble with sentences like 'I got an A.' vs 'Do we know an A. Smith?' Realistically, I'm not sure how feasible it is to make this work right 100% of the time, which is part of what makes thinking about it a useful exercise. Again, well done on this!
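
As a rough sketch of the double-initials idea, the snippet below masks the periods in runs of initials such as P.T. before any sentence splitting happens. The names INITIALS_PATTERN and mask_initials are hypothetical and not part of the submitted code, and the rule "one or more capital-letter-plus-period pairs followed by whitespace and another capital letter" is only an assumed heuristic.

```python
import re

# Hypothetical heuristic: one or more "X." pairs in a row (e.g. "T." or "P.T.")
# that are followed by whitespace and another capital letter.
INITIALS_PATTERN = re.compile(r"\b(?:[A-Z]\.)+(?=\s+[A-Z])")


def mask_initials(text, placeholder="^"):
    # Swap each period inside a matched run of initials for the placeholder,
    # one character for one character, so the text length stays the same.
    return INITIALS_PATTERN.sub(
        lambda match: match.group(0).replace(".", placeholder), text
    )


masked = mask_initials("P.T. Barnum met Gregg T. Fishy.")
```

This still cannot tell 'I got an A. Then I left.' apart from 'Do we know an A. Smith?', since both put a capital letter after the period, so it shares the limitation described above.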

r0b50n / pwp-capstones

Murder Mystery Summary & Rubric Score #1
