Closed ProfFL028 closed 6 years ago
Hi @ProfFL028, thanks for reading :) Can you tell me what regex pattern you are using?
Hi, I mean the code on page 26:
`link_crawler('http://example.webscraping.com', '/(index|view)/')`
It only downloads the main URL and does not download any URL matching `/(index|view)/`. So I dug into the code and found that `re.match` only matches a pattern at the beginning of the string, while `re.search` would fix the bug.
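For anyone following along, here is a minimal sketch of the difference between the two calls. The sample link is a hypothetical relative path chosen for illustration, not something taken from the book's code:

```python
import re

# Pattern from the page 26 example.
link_regex = '/(index|view)/'

# Hypothetical relative link, assumed for illustration only.
link = '/places/default/view/Afghanistan-1'

# re.match only tries to match at position 0 of the string,
# and the string starts with '/places/', so there is no match.
matched_at_start = re.match(link_regex, link)

# re.search scans the whole string and finds '/view/' further in.
found_anywhere = re.search(link_regex, link)

print(matched_at_start)        # None
print(found_anywhere.group())  # /view/
```

If the crawler filters links with `re.match`, a link shaped like this one is skipped, which would explain why only the start URL gets downloaded.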
I actually had a chance to look at this (sorry for the delay). The best fix is to change the regex to `(/places/default/index|/places/default/view)`. Hope that helps!
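A quick way to check that this pattern works with an unchanged `re.match` call is sketched below. The list of links is made up for illustration and assumes the crawler sees relative paths of this shape:

```python
import re

# Regex suggested above; it spells out the full path prefix,
# so it can match from the very start of a relative link.
link_regex = '(/places/default/index|/places/default/view)'

# Hypothetical relative links, assumed for illustration only.
links = [
    '/places/default/index/1',
    '/places/default/view/Afghanistan-1',
    '/places/default/user/login',
]

# Keep only the links that re.match accepts, mirroring an
# "if re.match(link_regex, link)" filter in a crawler loop.
followed = [link for link in links if re.match(link_regex, link)]

print(followed)
```

With these sample links, the index and view pages pass the filter and the login page does not.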
Thanks for the book. When I run the code in the "Link Crawler" section, it downloads only "http://example.webscraping.com". After I dug into the code and changed re.match(...) to re.search(...) in the "if" statement, the code worked.