PHP version? - Githubissues

JakeQZ commented 3 years ago

Without going too much into the background, Facebook have disabled the 'Like' button plugin (for 3rd party websites) in Europe except for users who are logged in and have consented to the relevant cookies.

After two years, Facebook has failed to come up with an alternative (such as a simple link showing the number of followers, as Twitter has).

Small businesses need to use social media to keep pace. A simple 'Like on Facebook' button showing the number of followers/likes is all they need. But Facebook has taken that away out of mindless self-interest (probably disgruntlement at the court rulings, and perhaps whilst continuing to collect data illegally). They do however provide a 'brand asset pack' which, in conjunction with your scraper, could be used to recreate the same, with the bonus of not leaking information to Facebook.

However, you've used Python, which is not so convenient to incorporate into a web application, particularly portably as a library. Would it be easy to port the Python code to PHP?

shaikhsajid1111 commented 3 years ago

Alright, understood. To create this scraper, I used browser automation rather than sending simple HTTP requests and reading the data from the response, because the page's content is injected via JavaScript. I built it with the Selenium framework, which supports PHP as well; here's a library that does that.

Just a side note: as you're trying to incorporate this kind of scraper, I assume that:

You're aware that browser automation will take longer than usual if you're thinking of integrating it with a web application. Let's say you need data for 50 posts: it won't be as fast as getting a response from an API. It will create a new web browser instance > navigate to the page > scroll > collect data > once it has 50 posts, close the entire process and return the data. This whole process is time-consuming.
It will take more resources, as an entire web browser will be running on your system every time you make a request for data. For example, if you receive 10,000 requests per minute, then you can assume that about 10,000 web browser instances will be started on your system, and each will close only after it has completed its job.
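
The flow in the first point (open a browser, navigate, scroll, collect, close) can be sketched roughly as below. This is only an illustration under stated assumptions, not the scraper's actual code: `collect_posts`, `extract_posts`, `max_scrolls` and `pause` are hypothetical names, and `driver` is assumed to be a Selenium WebDriver, of which only `get`, `execute_script` and `quit` are used.

```python
import time


def collect_posts(driver, url, wanted, extract_posts, max_scrolls=20, pause=1.0):
    """Load `url`, scroll until `wanted` posts are collected, then close the browser.

    `driver` is assumed to be a Selenium WebDriver. `extract_posts` is a
    hypothetical callable returning the posts currently present in the page.
    """
    posts = []
    try:
        driver.get(url)  # navigate to the page
        for _ in range(max_scrolls):
            # Collect whatever is rendered so far, skipping duplicates.
            for post in extract_posts(driver):
                if post not in posts:
                    posts.append(post)
            if len(posts) >= wanted:
                break
            # Scroll to the bottom so the page's JavaScript loads more posts.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause)  # give the page time to render new content
        return posts[:wanted]
    finally:
        driver.quit()  # always shut down the whole browser process
```

With real Selenium, `driver` would be something like `webdriver.Firefox()`. Every call pays for starting the browser, plus each scroll and pause, which is why this is so much slower than a plain API request.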

If you just want "likes", then you could take the logic from this project called snscrape.

JakeQZ commented 3 years ago

Thanks for the swift and detailed response.

I hadn't realized this used a headless browser. Given it does, I don't see any point in a PHP version. The rationale for that was for something that could be self-contained within the PHP environment, and thus run on any server supporting PHP. But if it has to call out anyway (via the shell) to other components which are only likely to be installed on a server (e.g. Linux) which already has Python set up, then I don't see the point.

Of course, a PHP wrapper to simulate the 'Like' button as it should have been implemented would be possible and perhaps desirable, to avoid duplication of effort incorporating this into web pages.

> You're aware that browser automation will take longer than usual if you're thinking of integrating it with a web application.

The results could be cached and updated periodically, rather than on every request.
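
A minimal sketch of that idea, assuming a time-based cache kept in memory: `CachedValue`, `fetch`, `ttl` and `clock` are hypothetical names, and `fetch` stands in for whatever slow call actually runs the scraper. Note that this variant refreshes lazily, on the first request after the entry expires; a scheduled job that rewrites a stored value would be another way to do the periodic update.

```python
import time


class CachedValue:
    """Cache the result of a slow call and refresh it at most once per `ttl` seconds."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch      # slow function, e.g. one that runs the scraper
        self._ttl = ttl          # lifetime of a cached result, in seconds
        self._clock = clock      # injectable so the cache can be tested without waiting
        self._value = None
        self._expires_at = None  # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: pay the cost of the slow call once.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

A page handler would then call `get()` on every request, and the browser would only be started when the cached count is older than `ttl`.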

I'll close this issue as being out of scope.

shaikhsajid1111 / facebook_page_scraper

PHP version? #13