shaikhsajid1111 / facebook_page_scraper

Scrapes facebook's pages front end with no limitations & provides a feature to turn data into structured JSON or CSV
https://pypi.org/project/facebook-page-scraper/
MIT License
235 stars 65 forks

PHP version? #13

Closed JakeQZ closed 3 years ago

JakeQZ commented 3 years ago

Without going too much into the background, Facebook have disabled the 'Like' button plugin (for 3rd party websites) in Europe except for users who are logged in and have consented to the relevant cookies.

After two years, Facebook has failed to come up with an alternative (such as a simple link showing the number of followers, as Twitter has).

Small businesses need to use social media to keep pace. A simple 'Like on Facebook' button showing the number of followers/likes is all they need. But Facebook has taken that away out of mindless self-interest (probably disgruntlement at the court rulings, and perhaps whilst continuing to collect data illegally). They do however provide a 'brand asset pack' which, in conjunction with your scraper, could be used to recreate the same, with the bonus of not leaking information to Facebook.

However, you've used Python, which is not so convenient to incorporate into a web application, particularly portably as a library. Would it be easy to port the Python code to PHP?

shaikhsajid1111 commented 3 years ago

Alright, understood. To build this scraper, I used browser automation rather than sending plain HTTP requests and reading the data from the response, because the page content is injected via JavaScript. It is built with the Selenium framework, which supports PHP as well; here's a library that does that.
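To illustrate the difference from a plain HTTP request, here is a minimal sketch of fetching a page through a headless browser with Selenium's Python bindings. It is not the project's actual code: `render_page` and `driver_factory` are hypothetical names, and the default path assumes Chrome and a matching driver are installed on the machine.

```python
def render_page(url, driver_factory=None):
    """Return the page HTML after the browser has executed its JavaScript.

    driver_factory is a hypothetical hook: any zero-argument callable that
    returns a Selenium-style driver. It defaults to headless Chrome.
    """
    if driver_factory is None:
        # Imported lazily so the function can be exercised without Selenium.
        from selenium import webdriver

        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")  # run Chrome without a window

        def driver_factory():
            return webdriver.Chrome(options=options)

    driver = driver_factory()
    try:
        driver.get(url)  # loads the page and runs its scripts
        # Real pages usually need an explicit wait here until the
        # script-injected elements have appeared.
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser down
```

Starting and stopping a browser on every call is what makes this approach slow compared with a single HTTP request.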

Just a side note, as you're trying to incorporate this kind of scraper, I just assume that:

If you just want "likes", you could take the logic from a project called snscrape.

JakeQZ commented 3 years ago

Thanks for the swift and detailed response.

I hadn't realized this used a headless browser. Given it does, I don't see any point in a PHP version. The rationale for that was for something that could be self-contained within the PHP environment, and thus run on any server supporting PHP. But if it has to call out anyway (via the shell) to other components which are only likely to be installed on a server (e.g. Linux) which already has Python set up, then I don't see the point.

Of course, a PHP wrapper to simulate the 'Like' button as it should have been implemented would be possible and perhaps desirable, to avoid duplication of effort incorporating this into web pages.

> You're aware that browser automation will take longer than usual if you're thinking of integrating it with a web application.

The results could be cached and updated periodically, rather than on every request.
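A minimal sketch of that caching idea, assuming the slow scrape is wrapped in some callable that returns the count. The names `CachedCount`, `fetch` and `ttl_seconds` are illustrative only and not part of the scraper.

```python
import time


class CachedCount:
    """Serve a cached value and refresh it only once it is older than the TTL."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # slow call, e.g. the browser-based scrape
        self._ttl = ttl_seconds      # how long a fetched value stays fresh
        self._clock = clock          # injectable so the logic can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            # Only this branch pays the cost of the slow fetch.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

A web page would call `get()` on each request; only the first request after the TTL expires waits for the scrape. A scheduled job that refreshes the value in the background would avoid even that delay.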

I'll close this issue as being out of scope.