kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.29k stars, 616 forks

Add author_id and use a cookies file (better than using login & password) #115

Closed widedM closed 3 years ago

widedM commented 3 years ago

In fact it's not an issue, but I don't know where else to write it. I added new features to your library: I extracted the author_id and I used a cookies file to log in (it's better than using login & password). My changes were made on an earlier version (it was in March, but I forgot the version number). Now you have released a better version, but I don't have the time to modify it again, so can you please do it? I will send my contribution. I really need it for my graduation project. Thanks

kevinzg commented 3 years ago

Using a cookie file sounds good. If you upload your code to GitHub, I can take a look.

widedM commented 3 years ago

Hi, I put the code in a txt file. Here it is, I hope it helps!

# ...
def _get_posts(path, pages=10, timeout=5, sleep=0, credentials=None, cookies_file=''):

    if cookies_file:
        # The file holds a JSON list of cookie objects with 'name' and 'value' keys
        # (needs `import json` at module level).
        with open(cookies_file) as f:
            data = json.load(f)
        # Join the cookies into a single "name=value;" header string.
        _cookie = ''.join('{}={};'.format(c['name'], c['value']) for c in data)
    else:
        _cookie = 'locale=en_US;'
    _headers = {'User-Agent': _user_agent, 'Accept-Language': 'en-US,en;q=0.5', 'cookie': _cookie}

# ...

def _id_user(article):
    try:
        data_ft = json.loads(article.attrs['data-ft'])
        return data_ft['content_owner_id_new']
    except (KeyError, ValueError):
        return None

kevinzg commented 3 years ago

I edited your comment to show only the relevant code.

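For the cookie file I think it'd be better to support the standard (?) cookie file format (https://curl.haxx.se/docs/http-cookies.html) with the Python http.cookiejar module (https://docs.python.org/3.8/library/http.cookiejar.html#module-http.cookiejar).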

And I added the user id extraction to the current version in master.
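For readers following along, here is a minimal sketch of what reading a Netscape-style cookies.txt file with http.cookiejar could look like. It is not the library's implementation: the helper names `load_cookie_header` and `demo`, the sample cookie names and values, and the choice to ignore expiry dates are all assumptions made for illustration.

```python
import http.cookiejar
import os
import tempfile

# Hypothetical sample in Netscape cookies.txt format; the values are placeholders.
# Fields: domain, subdomain flag, path, secure flag, expiry (epoch), name, value.
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".facebook.com\tTRUE\t/\tTRUE\t4102444800\tc_user\t12345\n"
    ".facebook.com\tTRUE\t/\tTRUE\t4102444800\txs\tabcdef\n"
)


def load_cookie_header(cookies_file):
    """Read a Netscape-format cookie file and build a Cookie header value."""
    jar = http.cookiejar.MozillaCookieJar(cookies_file)
    # Keep session and expired cookies too, so the file is used as exported.
    jar.load(ignore_discard=True, ignore_expires=True)
    return "; ".join("{}={}".format(c.name, c.value) for c in jar)


def demo():
    """Write the sample to a temporary file and return the resulting header."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(SAMPLE)
        path = f.name
    try:
        return load_cookie_header(path)
    finally:
        os.remove(path)
```

The jar could also be handed to an HTTP client directly instead of being flattened into a header string; the string form is used here only because it matches the `cookie` header built in the snippet posted earlier in the thread.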

widedM commented 3 years ago

That's great. So now can I get the user id and use a cookies file to extract private groups using your library? Should I upgrade?

whatneuron commented 3 years ago

That would be awesome if you guys can do that

kevinzg commented 3 years ago

Cookies support has been added. See this comment https://github.com/kevinzg/facebook-scraper/issues/28#issuecomment-793066983