talkpython / 100daysofcode-with-python-course

Course materials and handouts for #100DaysOfCode in Python course
https://training.talkpython.fm/courses/explore_100days_in_python/100-days-of-code-in-python
MIT License

Small detail on capturing all capitalized words #24

Closed akonsta closed 6 years ago

akonsta commented 6 years ago

In your code you use the regex r'[A-Z][a-z0-9]+' as the pattern, but that would not count capitalized one-letter words (e.g., I or A) or capitalized words with apostrophes (e.g., Don't, Isn't, O'Leary), and it would also miss words or abbreviations that have more than one capital (e.g., USA, STOP and McKnight). There are certainly much more complicated ways to write the pattern, but I would suggest the pattern r"[A-Z][A-Za-z0-9']*" (double-quoted, since the character class contains an apostrophe). It is not a big deal, but I thought I would mention it. This might capture unintended strings (e.g., lists with letter counters like '(A)', '(B)', etc.; variables that show up in equations like X + Y; non-word strings like UK postal codes such as EC1 and W8), but I am of the school of thought that I would rather have more data than less.
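A quick way to see the difference is to run both patterns over the same text. This is only a sketch: the sample sentences and the names ORIGINAL, SUGGESTED, sample and noisy are made up for illustration and are not taken from the course notebook.

```python
import re

# Pattern currently used in the course code (as quoted in this issue).
ORIGINAL = re.compile(r"[A-Z][a-z0-9]+")
# Pattern suggested in this issue; double quotes because the class holds an apostrophe.
SUGGESTED = re.compile(r"[A-Z][A-Za-z0-9']*")

# Hypothetical sample text covering the cases mentioned above.
sample = "I told O'Leary that McKnight left the USA. Don't STOP!"

original_hits = ORIGINAL.findall(sample)
suggested_hits = SUGGESTED.findall(sample)

print(original_hits)   # ['Leary', 'Mc', 'Knight', 'Don']
print(suggested_hits)  # ['I', "O'Leary", 'McKnight', 'USA', "Don't", 'STOP']

# The trade-off: the looser pattern also picks up list counters,
# equation variables and postal-code fragments.
noisy = "(A) X + Y, postcode EC1 or W8"
noisy_hits = SUGGESTED.findall(noisy)
print(noisy_hits)      # ['A', 'X', 'Y', 'EC1', 'W8']
```

Note that the original pattern does not just skip some words, it also splits McKnight into two matches and truncates Don't and O'Leary, so its counts can be off in both directions.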

bbelderbos commented 6 years ago

Hey Andrew, yes, that is more accurate. I will adopt that pattern in my notebook. Thanks!