ruipgil / scraperjs

A complete and versatile web scraper.
MIT License
3.7k stars, 188 forks

Is it possible to run scraperjs through php cron? #68

Open goranefbl opened 7 years ago

goranefbl commented 7 years ago

I wrote a script that works great on Node, but I am looking into moving it to a host whose backend is PHP. Every day, cron would run the scraper script and collect some data.

Can you point me to where I could find more information on how to do this, and whether it is even possible?

Thanks for the great script.

ruipgil commented 7 years ago

You'll need (at least) Node to run scraperjs. If you wish to use the dynamic scraper, you'll also need access to PhantomJS. Cheers


rafasashi commented 6 years ago

Hello ruipgil,

I am also interested in running the dynamic scraper from PHP in production (for different URLs and different users).

I could use exec() to start it from the command line, but from there I have two problems:

1 - send the data in real time to a URL containing jQuery (http://exqmple.com/my_widget.js)
2 - handle the end of the process once the user leaves my_widget

You mentioned "have access to phantomjs" in your previous answer. What do you mean by that?

Can I do [1] and [2] with scraperjs alone, or do I need an additional script, for example one using sockets?

Thanks in advance!

rafasashi commented 6 years ago

Hello again,

I have tried several things, including detaching the process with the screen command, but now I am considering using a socket.io chat and running the dynamic scraper inside it. I think this will let me handle the end of the process easily.

When the chat session is over, the dynamic scraper should be stopped.

For the moment this is only on paper. Do you think it is a better approach?

How can I stop the dynamic scraper?
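One way to approach the question above, independent of scraperjs itself, is to run the scraper in its own process and terminate that process when the session ends. The sketch below uses only Node's built-in `child_process` module. The helper names `startWorker` and `stopWorker` are hypothetical, and the thread does not say whether scraperjs offers its own way to stop a running dynamic scraper, so treat this as an assumption-laden outline and not the library's method.

```javascript
const { spawn } = require('child_process');

// Start a worker as a separate Node process. `args` would normally be the path
// of the script that runs the scraper (hypothetical, not defined here).
function startWorker(args) {
  return spawn(process.execPath, args, { stdio: 'ignore' });
}

// Ask the worker to terminate and resolve once it has exited.
// Resolves with the signal that ended the process, or null if it exited on its own.
function stopWorker(child) {
  return new Promise(function (resolve) {
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve(child.signalCode); // already finished, nothing to kill
      return;
    }
    child.once('exit', function (code, signal) {
      resolve(signal);
    });
    child.kill('SIGTERM');
  });
}
```

With socket.io, `stopWorker(child)` could be called from the server-side `disconnect` handler of the socket that started the worker, so each session cleans up its own process. Whether terminating the Node worker also stops any PhantomJS process it launched would need to be checked separately.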