medialab / minet

A webmining CLI tool & library for python.
GNU General Public License v3.0

GH actions + Minet Scrap Twitter fail. #382

Closed stefw closed 2 years ago

stefw commented 2 years ago

Hi,

I have this GH action that generates a CSV of scraped tweets (written by @taniki):

name: scrape bfm

on:
  workflow_dispatch:
  schedule:
    - cron:  '0 9 * * *'

jobs:
  scrape_bfm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.x'
      - name: install minet
        run: |
          python -m pip install --upgrade pip
          pip install minet==0.56.2
      - name: scrape @BFMTV tweets
        shell: bash
        run: |
          minet tw scrape tweets "from:@BFMTV since:2021-09-01" > bfmtv-tweets.csv
      - name: commit
        uses: ./.github/actions/commit
        with:
          message: lol @bfmtv

Sometimes it runs without a problem. Sometimes GH returns this error log:

Run minet tw scrape tweets "from:@CNEWS since:2021-09-01" > cnews-tweets.csv
Collecting tweets: 0 tweets [00:00, ? tweets/s]                            
Collecting tweets: 0 tweets [00:00, ? tweets/s]                   
Searching for "from:@CNEWS since:2021-09-01"

Collecting tweets: 0 tweets [00:00, ? tweets/s]
Collecting tweets: 0 tweets [00:00, ? tweets/s, queries=1, tokens=1]
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.1/x64/bin/minet", line 8, in <module>
    sys.exit(main())
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/site-packages/minet/cli/__main__.py", line 218, in main
    fn(cli_args)
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/site-packages/minet/cli/twitter/__init__.py", line 31, in twitter_action
    twitter_scrape_action(cli_args)
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/site-packages/minet/cli/twitter/scrape.py", line 69, in twitter_scrape_action
    for tweet, meta in iterator:
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/site-packages/minet/twitter/api_scraper.py", line 370, in search
    new_cursor, tweets = retryer(self.request_search, query, cursor, refs=refs)
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/site-packages/tenacity/__init__.py", line 349, in iter
    return fut.result()
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/site-packages/tenacity/__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/site-packages/minet/twitter/api_scraper.py", line 72, in wrapped
    self.acquire_guest_token()
  File "/opt/hostedtoolcache/Python/3.10.1/x64/lib/python3.10/site-packages/minet/twitter/api_scraper.py", line 261, in acquire_guest_token
    raise TwitterGuestTokenError
minet.twitter.exceptions.TwitterGuestTokenError

Collecting tweets: 0 tweets [00:00, ? tweets/s, queries=1, tokens=1]
Error: Process completed with exit code 1.

I don't understand. Has anyone had the same problem? Does Twitter sometimes ban GH?

Thanks for Minet, great tool!
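
Because the TwitterGuestTokenError above shows up only on some runs, one possible stopgap in the workflow's bash step is to retry the scrape a few times before failing the job. The following is a minimal sketch, not part of minet: `retry` and `scrape_bfm` are hypothetical helper names, and the attempt count and delay are arbitrary assumptions. It does not address the underlying cause.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not provided by minet or GitHub Actions):
# run a command up to $1 times, sleeping $2 seconds between attempts.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # Keep running the command until it exits with status 0.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "command failed after ${attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Possible usage inside the "scrape @BFMTV tweets" step (values are assumptions):
#
#   scrape_bfm() {
#     minet tw scrape tweets "from:@BFMTV since:2021-09-01" > bfmtv-tweets.csv
#   }
#   retry 3 30 scrape_bfm
```

Keeping the redirection inside the wrapped function means each attempt truncates the CSV, so output from a failed attempt is not mixed into the final file.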

Yomguithereal commented 2 years ago

Hello @stefw, @paulgirard has contributed a fix for this issue in PR #385. Would you be willing to test it on your end and tell us whether it also fixes your issue, so we can merge safely?

stefw commented 2 years ago

Hello @Yomguithereal, sorry, but I don't know how to test the version with the fix. I'm not very experienced with pull requests, fixes, etc. I use pip to set up Minet in GitHub Actions. Really sorry.

Yomguithereal commented 2 years ago

You should be able to do so like this:

pip install git+https://github.com/paulgirard/minet.git@acquire_guest_token_by_api

stefw commented 2 years ago

@Yomguithereal I ran a test with 4 different GitHub Actions and IT WORKS. Thank you so much. I had been sad since it broke :) If you want to see the result: https://observablehq.com/@stefw/presidentielle-repartition-des-tweets-politiques-par-medias?collection=@stefw/twitter

cc @taniki 👍

Yomguithereal commented 2 years ago

I am happy to know it works :). @paulgirard is the real MVP here. He will run the code in a heavy production context tomorrow and if everything goes fine, we'll merge the code and release a new version of minet with the bugfix just after.

Yomguithereal commented 2 years ago

v0.56.4 is now live with the fix
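
For a workflow like the one at the top of this thread, picking up the fix should amount to changing the pinned version in the install step from 0.56.2 to 0.56.4. As an optional guard, a sketch like the following could fail the job early if an older minet ends up installed. `version_ge` is a hypothetical helper, and it assumes GNU `sort -V` is available, as it is on typical Ubuntu images.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when version $1 is greater than or equal to
# version $2. Relies on GNU sort's -V (version sort) option.
version_ge() {
  # After a version sort, the first line is the smaller of the two versions;
  # $1 >= $2 exactly when that smaller one is $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible usage in the install step (assumes minet was installed with pip):
#
#   pip install minet==0.56.4
#   installed="$(pip show minet | awk '/^Version:/ {print $2}')"
#   version_ge "$installed" "0.56.4" || { echo "minet too old: $installed" >&2; exit 1; }
```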