Open gdn0101 opened 3 years ago
What other tools?
For instance, the dumpitblue+ extension from the Chrome Web Store. In some cases I found a difference of -10 out of about 800 total. In others, -55 out of about 83 total.
The difference is probably that dumpitblue+ uses the desktop version of Facebook (facebook.com), whereas this scraper uses the mobile version (m.facebook.com).
Just took a closer look. In your code you use data-sigil='undoable-action' and the h3 class as identifiers for the elements. When inspecting the friends list I found that the number of friends returned by the mobile variant is correct, but the identifiers only find a subset of the results. The undoable-action identifier omits the elements that don't have the "add friend" button, and the h3 identifier doesn't find all the elements. Maybe it would be a good idea to try the "_5pxa" div class and funnel down with the "_5pxc" div class?
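To make the subset problem concrete, here is a minimal sketch using only the standard library. The markup is invented for illustration (real m.facebook.com pages differ), and it assumes, as described above, that entries without an "add friend" button lack the data-sigil attribute while still carrying the "_5pxa" class.

```python
from html.parser import HTMLParser

# Toy markup, invented for illustration only. The second entry has no
# "add friend" button and therefore (by assumption) no data-sigil attribute.
SAMPLE = """
<div class="timeline">
  <div>
    <div class="_5pxa" data-sigil="undoable-action"><h3><a href="/alice">Alice</a></h3></div>
    <div class="_5pxa"><h3><a href="/bob">Bob</a></h3></div>
    <div class="_5pxa" data-sigil="undoable-action"><h3><a href="/carol">Carol</a></h3></div>
  </div>
</div>
"""


class DivCounter(HTMLParser):
    """Count <div> start tags matched by each of the two identifiers."""

    def __init__(self):
        super().__init__()
        self.by_sigil = 0  # divs with data-sigil="undoable-action"
        self.by_class = 0  # divs whose class list contains "_5pxa"

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attributes = dict(attrs)
        if attributes.get("data-sigil") == "undoable-action":
            self.by_sigil += 1
        if "_5pxa" in (attributes.get("class") or "").split():
            self.by_class += 1


counter = DivCounter()
counter.feed(SAMPLE)
print(f"by data-sigil: {counter.by_sigil}, by class: {counter.by_class}")
```

On this toy input the data-sigil identifier matches fewer entries than the class identifier, which is the kind of undercount reported here.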
FWIW, I have fixed this problem in the following way, but my Python is terrible and maybe this is a hack, so I'm not proposing it as a merge request. Rather than looking for undoable-action, I look for div[class="timeline"] and descend 2 levels of divs from there, then filter/vet what's found.
diff --git a/facebook_scraper/facebook_scraper.py b/facebook_scraper/facebook_scraper.py
index 6f07834..d1d1573 100644
--- a/facebook_scraper/facebook_scraper.py
+++ b/facebook_scraper/facebook_scraper.py
@@ -154,18 +154,30 @@ class FacebookScraper:
         while friend_url:
             logger.debug(f"Requesting page from: {friend_url}")
             response = self.get(friend_url)
-            elems = response.html.find('div[data-sigil="undoable-action"]')
+            elems = response.html.find('div[class="timeline"] > div > div')
             logger.debug(f"Found {len(elems)} friends")
             for elem in elems:
                 name = elem.find("h3>a", first=True)
-                tagline = elem.find("div.notice.ellipsis", first=True).text
+                if not name:
+                    continue
+                # Tagline
+                tagline = elem.find("span.fcg", first=True)
+                if tagline:
+                    tagline = tagline.text
+                else:
+                    tagline = ""
+                # Profile Picture
                 profile_picture = elem.find("i.profpic", first=True).attrs.get("style")
                 match = re.search(r"url\('(.+)'\)", profile_picture)
                 if match:
                     profile_picture = utils.decode_css_url(match.groups()[0])
-                user_id = json.loads(
-                    elem.find("a.touchable[data-store]", first=True).attrs["data-store"]
-                ).get("id")
+                # User ID if present, not present if no "add friend"
+                user_id= elem.find("a.touchable[data-store]", first=True)
+                if user_id:
+                    user_id = json.loads(user_id.attrs["data-store"]).get("id")
+                else:
+                    user_id = ""
+
                 friend = {
                     "id": user_id,
                     "link": name.attrs.get("href"),
As a side note, adding a random/long-ish sleep before getting more friends seems like it helps with the temp ban / throttling. This does slow it down a lot though. But not as much as a temp ban does.
             more = re.search(r'm_more_friends",href:"([^"]+)"', response.text)
             if more:
+                time.sleep(randrange(100)/10)
                 friend_url = utils.urljoin(FB_MOBILE_BASE_URL, more.group(1))
This looks good to me, and I would have accepted it as a pull request. I would have preferred a pull request to a git patch, but I've committed it and attributed you in https://github.com/kevinzg/facebook-scraper/commit/7aaf33d0d5adfd3ab8c356963e7f9c4b7433fc25, plus one minor tweak in https://github.com/kevinzg/facebook-scraper/commit/8fb79bed8bd5b2696d7d618f66d03030f9d713c2.
As for adding sleeps, the length of the sleep should be configurable by the user. For larger friend extraction jobs, users can iterate through the get_friends generator and sleep to their preference, like so: https://github.com/kevinzg/facebook-scraper/issues/382#issuecomment-874369929
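For anyone who wants the delay on the caller's side, here is a rough sketch of that idea. It is not necessarily what the linked comment does: the `throttled` helper and its parameters are hypothetical names made up for this example, and it works on any iterable, not just get_friends.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    every: int = 20,
    min_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items unchanged, pausing a random time after every `every` items.

    Hypothetical helper: `sleep` is injectable so the pause can be replaced
    in tests or by a custom back-off function.
    """
    for count, item in enumerate(items, start=1):
        yield item
        if count % every == 0:
            # The pause happens when the caller asks for the next item.
            sleep(random.uniform(min_delay, max_delay))


# Intended use (assumption: get_friends is imported from facebook_scraper
# and is called with the account to scrape):
#
#     for friend in throttled(get_friends(account), every=20):
#         print(friend)

# Small offline demonstration with a plain range and short delays.
for value in throttled(range(5), every=2, min_delay=0.01, max_delay=0.02):
    print(value)
```

Because the generator only fetches the next page when it runs out of items, pausing on the consumer side also spaces out the page requests, and each user can pick delays that suit their job size.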
Ok, thanks! Having just wandered into this project, I really didn't know if it was fit for a pull request or just a rough hack. Oh, and the h1 thing is interesting. I may have stumbled across that as well but wasn't sure.
Comparing the number of friends for a given profile with other tools, it seems that the script outputs only a fraction of the results.