kjam / wswp

Code for the second edition Web Scraping with Python book by Packt Publications

Link_Crawler.py #13

Open ucrengineer opened 5 years ago

ucrengineer commented 5 years ago

Whenever I try to use link_crawler.py, nothing happens. The only output is 'Downloading: http://example.webscraping.com'.

Environment: Windows 10, Python 3.7.3, 64-bit (AMD64), on win32

hovey-xu commented 4 years ago

Same problem here. Did you fix it?

ucrengineer commented 4 years ago

I just moved on, man. I didn't know how to fix it.

mcrouse911 commented 4 years ago

Me neither.

hovey-xu commented 4 years ago

I guess something is wrong with the regex (index|view), so I tried changing it to (places/default/view|places/default/index). With that, the script downloaded a few URLs and then shut down.
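A quick way to check this guess offline is to run both patterns against a few sample hrefs. This is only a sketch: the hrefs below are hypothetical (only the places/default prefix comes from this thread), and the thread does not show whether link_crawler.py applies the pattern with re.match or re.search, so both are tried.

```python
import re

# Hypothetical hrefs; only the "places/default" prefix is taken from this thread.
sample_links = [
    "/places/default/index/1",
    "/places/default/view/Afghanistan-1",
    "/places/default/user/login",
]

# The two patterns quoted in this thread.
patterns = {
    "old": "(index|view)",
    "new": "(places/default/view|places/default/index)",
}


def hits(pattern, links, matcher):
    """Return the links for which matcher(pattern, link) finds a match."""
    return [link for link in links if matcher(pattern, link)]


# re.match only matches at the start of the string; re.search scans all of it.
results = {
    (name, matcher.__name__): hits(pattern, sample_links, matcher)
    for name, pattern in patterns.items()
    for matcher in (re.match, re.search)
}

for key, found in results.items():
    print(key, found)
```

With these sample hrefs, re.match finds nothing for either pattern because every href starts with a slash, while re.search accepts the index and view links for both. Which case applies to the real crawler depends on how it applies the pattern and on what the site's hrefs actually look like, so it is worth printing the extracted links before changing the regex.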

ucrengineer commented 4 years ago

hmm..interesting..

kjam commented 4 years ago

Hi there,

It seems the URL structure has changed since the initial publication. @hovey-xu do you want to send a PR with this change?

Best, katharine