hedii / php-crawler

A PHP crawler that finds emails on the internets
MIT License
134 stars, 65 forks

How to stay on the same domain? #2

Closed allado closed 9 years ago

allado commented 9 years ago

How can I make the crawler stay on a single domain?

hedii commented 9 years ago

It is not possible for the moment. It will be added next. Pull requests are welcome.

jackmcdowell commented 9 years ago

Hey Allado, take a look at this other project I was working on, which stays on the domain, to see if it gives you any ideas. It is much less elegant than Hedii's project, but it might give you some ideas for incorporating a "stay on domain" check-box or something like that: domainEmailCrawler

hedii commented 9 years ago

Hi @allado and @jackmcdowell. Check the branch https://github.com/hedii/php-crawler/tree/feature-domain-specific. I think it works well; I will be happy to hear if it does :)
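For readers wondering what a "stay on domain" check involves, here is a minimal sketch of the general idea. It is not the code from the feature-domain-specific branch; the helpers `isSameDomain` and `filterSameDomain` are hypothetical names, and the sketch assumes that "same domain" means the same host name, ignoring case and a leading "www.". It requires PHP 7.4+ for the arrow functions.

```php
<?php
// Hypothetical helper (not from the php-crawler repo): true when $url points
// to the same host as $startUrl, ignoring case and a leading "www.".
function isSameDomain(string $url, string $startUrl): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    $startHost = parse_url($startUrl, PHP_URL_HOST);

    // parse_url returns null when there is no host (e.g. "/casas/item1.html")
    // and false when the URL is seriously malformed.
    if (!is_string($host) || !is_string($startHost)) {
        return false;
    }

    // Host names are case-insensitive; treat "www.example.com" as "example.com".
    $normalize = fn(string $h): string => preg_replace('/^www\./', '', strtolower($h));

    return $normalize($host) === $normalize($startHost);
}

// Hypothetical helper: keep only the URLs that stay on the start URL's domain.
function filterSameDomain(array $urls, string $startUrl): array
{
    return array_values(array_filter(
        $urls,
        fn(string $u): bool => isSameDomain($u, $startUrl)
    ));
}
```

With this definition, a subdomain such as blog.example.com counts as a different domain; a crawler that should follow subdomains would need a suffix comparison instead of strict equality.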

allado commented 9 years ago

Hi Jack... you are the man!

Thanks!!!

(Sent by email in reply to hedii's comment of 2015-09-13 16:50 GMT-03:00.)

jackmcdowell commented 9 years ago

Thanks @allado, but it was @hedii you should be thanking!

hedii commented 9 years ago

I'm closing this issue. Tell me if you have any problems with the domain-specific branch.

allado commented 8 years ago

Hi,

How can I modify the `$pattern` regex in Crawler.php so that it also finds URLs that don't start with the domain, such as:

/comercios/item1.html /comercios/item1.html /casas/item1.html /casas/item2.html /casas/item3.html ..etc

Could you modify it to find URLs of the form /xxxxx/xxxx?

In the source code of the webpage:

Alquiler de carpas (Spanish for "tent rental")

Regards!

(Sent by email in reply to hedii's comment of 2015-10-10 12:13 GMT-03:00.)

hedii commented 8 years ago

Hi @allado, I am working on a new version that will be able to do that; it is not possible with the current version. Watch this repo to be notified when it is released.
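Until then, here is a rough sketch of the general technique behind allado's request. It assumes nothing about how Crawler.php or its `$pattern` is actually written; `extractHrefs` and `resolveUrl` are hypothetical helpers. The idea is to capture every `href` value, including root-relative paths such as /casas/item1.html, and then turn each one into an absolute URL using the page it was found on.

```php
<?php
// Hypothetical helper: capture every href value from <a> tags, whether absolute
// ("http://...") or relative ("/casas/item1.html", "item3.html").
// A regex is a quick approximation; a real HTML parser is more robust.
function extractHrefs(string $html): array
{
    $pattern = '/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

// Hypothetical helper: resolve $href against the URL of the page it came from.
// Simplified: does not normalize "../" or "./" segments.
function resolveUrl(string $href, string $pageUrl): string
{
    // Already has a scheme (http:, https:, mailto:, ...): leave it unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }

    $parts = parse_url($pageUrl);
    $scheme = $parts['scheme'];

    // Protocol-relative URL such as "//cdn.example.com/x".
    if (substr($href, 0, 2) === '//') {
        return $scheme . ':' . $href;
    }

    $base = $scheme . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $base .= ':' . $parts['port'];
    }

    // Root-relative path such as "/casas/item1.html".
    if (substr($href, 0, 1) === '/') {
        return $base . $href;
    }

    // Path relative to the current page's directory, e.g. "item3.html".
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $base . $dir . $href;
}
```

Combined with a same-host check, the resolved URLs can be added to the crawl queue like any absolute link found on the page.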