Open felix4webscience opened 5 years ago
@felix4webscience document is actually a user-defined variable, so you have to create it yourself before the example runs.
This might help you understand it better:
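import re
import requests
from bs4 import BeautifulSoup

def fix_unicode(text):
    # assumption: mirrors the book's helper, replacing curly apostrophes with plain ones
    return text.replace(u"\u2019", "'")

def get_document():
    url = "http://radar.oreilly.com/2010/06/what-is-data-science.html"
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html5lib')
    content = soup.find("div", "article-body")  # find article-body div
    regex = r"[\w']+|[\.]"  # matches a word or a period
    document = []
    for paragraph in content("p"):  # every <p> inside the article body
        words = re.findall(regex, fix_unicode(paragraph.text))
        document.extend(words)
    return document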
Use it like this:
document = get_document()
Then run your code on the document variable.
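If you would rather not scrape the page, any list of word strings should work as document. Here is a minimal sketch with a made-up list (the sample words are my own assumption, not from the book), running the plain-dict count on it:

```python
# Hypothetical stand-in for get_document(): any list of word strings will do.
document = ["data", "science", "is", "data", "driven", "."]

word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1   # seen before: bump the count
    else:
        word_counts[word] = 1    # first occurrence: start at 1

print(word_counts["data"])  # 2
```

The loop only needs something it can iterate over, so swapping the list for the result of get_document() should not require any other change.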
Hope this helps.
Hi,
I ran into a problem in Chapter 2 (German version) with the examples for "defaultdict" and "Counter".
What seems to be left out is how the variable "document" is defined.
Code:

```
from collections import defaultdict

word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
```
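For reference, a rough sketch of how the defaultdict and Counter variants mentioned above could look once document exists. The sample list here is a hypothetical placeholder of my own; in practice it would be whatever list of words you built.

```python
from collections import Counter, defaultdict

# Hypothetical sample; replace with your own list of words.
document = ["the", "cat", "saw", "the", "dog", "."]

# defaultdict(int) supplies 0 for a missing key, so no membership check is needed.
word_counts = defaultdict(int)
for word in document:
    word_counts[word] += 1

# Counter builds the same tally directly from the iterable.
counter = Counter(document)

print(word_counts["the"])      # 2
print(counter.most_common(1))  # [('the', 2)]
```

Both give the same counts as the plain-dict loop; Counter is usually the shortest option when all you need is a tally.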