kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

pyqt5 deployment issues #420

Open samsebree opened 2 years ago

samsebree commented 2 years ago

I recently updated to the latest version of facebook_scraper.

To my dismay, through a series of dependencies, it pulled down the entirety of the PyQt5 library. This is a 200 MB library. It is so large that it makes it impossible to deploy my project to AWS Lambda, because Lambda has a 250 MB limit on the unzipped deployment package.

Can we please make a modification to only pull down the portion of PyQt5 required by this library, in an effort to reduce the size of the dependency tree? I would imagine there is no need to pull in the entirety of the PyQt5 library.

neon-ninja commented 2 years ago

This package doesn't depend on PyQt5. Here's the dependency tree, produced by pipdeptree in a fresh virtualenv:

facebook-scraper==0.2.45
  - dateparser [required: >=1.0.0,<2.0.0, installed: 1.0.0]
    - python-dateutil [required: Any, installed: 2.8.2]
      - six [required: >=1.5, installed: 1.16.0]
    - pytz [required: Any, installed: 2021.1]
    - regex [required: !=2019.02.19, installed: 2021.8.3]
    - tzlocal [required: Any, installed: 2.1]
      - pytz [required: Any, installed: 2021.1]
  - demjson [required: >=2.2.4,<3.0.0, installed: 2.2.4]
  - requests-html [required: >=0.10.0,<0.11.0, installed: 0.10.0]
    - bs4 [required: Any, installed: 0.0.1]
      - beautifulsoup4 [required: Any, installed: 4.9.3]
        - soupsieve [required: >1.2, installed: 2.2.1]
    - fake-useragent [required: Any, installed: 0.1.11]
    - parse [required: Any, installed: 1.19.0]
    - pyppeteer [required: >=0.0.14, installed: 0.2.5]
      - appdirs [required: >=1.4.3,<2.0.0, installed: 1.4.4]
      - pyee [required: >=8.1.0,<9.0.0, installed: 8.1.0]
      - tqdm [required: >=4.42.1,<5.0.0, installed: 4.62.0]
      - urllib3 [required: >=1.25.8,<2.0.0, installed: 1.26.6]
      - websockets [required: >=8.1,<9.0, installed: 8.1]
    - pyquery [required: Any, installed: 1.4.3]
      - cssselect [required: >0.7.9, installed: 1.1.0]
      - lxml [required: >=2.1, installed: 4.6.3]
    - requests [required: Any, installed: 2.26.0]
      - certifi [required: >=2017.4.17, installed: 2021.5.30]
      - charset-normalizer [required: ~=2.0.0, installed: 2.0.4]
      - idna [required: >=2.5,<4, installed: 3.2]
      - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.6]
    - w3lib [required: Any, installed: 1.22.0]
      - six [required: >=1.4.1, installed: 1.16.0]

To find out how you ended up with PyQt5, you could run: pipdeptree --reverse --packages PyQt5
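If pipdeptree isn't available in the environment you deploy from, a rough equivalent of that reverse lookup can be sketched with the standard library's importlib.metadata. This is only an illustration, not how pipdeptree works internally: the helper names (normalize, requirement_name, reverse_dependencies) are made up for this example, it only looks at direct requirements, and it also counts requirements that are declared for optional extras.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a project name so 'PyQt5', 'pyqt5' and 'Py_Qt5'-style variants compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the bare project name from a requirement string such as 'six>=1.5'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def reverse_dependencies(target: str) -> list[str]:
    """Return installed distributions that directly declare `target` as a requirement.

    Note: requirements that only apply to optional extras are included too.
    """
    wanted = normalize(target)
    dependents = set()
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            if requirement_name(requirement) == wanted:
                name = dist.metadata.get("Name")
                if name:  # skip distributions with broken metadata
                    dependents.add(name)
    return sorted(dependents)


if __name__ == "__main__":
    # An empty list means nothing installed in this environment asks for PyQt5.
    print(reverse_dependencies("PyQt5"))
```

Run it with the same interpreter (virtualenv) that your deployment is built from, otherwise it inspects a different set of installed packages.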

samsebree commented 2 years ago

Thank you for the quick response.

Funnily enough, pipdeptree does not say that PyQt5 is required by anything in my project. However, it is getting pulled into my .serverless requirements folder when I try to deploy.

This is probably some kind of bug/misconfiguration with my serverless-python-requirements. I've opened an issue over there: https://github.com/UnitedIncome/serverless-python-requirements/issues/626

Super weird that it only seems to happen when I change the required package version of facebook-scraper in my requirements.txt file.
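For anyone debugging something similar, here is a rough sketch for checking which packages take up the space in the packaged requirements folder and how the total compares with the 250 MB limit mentioned above. The folder path .serverless/requirements is an assumption about where the plugin puts things (adjust it to your project), treating 250 MB as 250 * 1024 * 1024 bytes is also an assumption, and the helper names are made up for this example.

```python
import os
from pathlib import Path

# Assumption: the 250 MB unzipped limit interpreted as 250 * 1024 * 1024 bytes.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def tree_size(path: Path) -> int:
    """Total size in bytes of a file, or of all regular files under a directory."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():  # don't count link targets
                total += file_path.stat().st_size
    return total


def largest_entries(folder: Path, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` largest top-level entries in `folder` as (name, bytes)."""
    sizes = [(entry.name, tree_size(entry)) for entry in folder.iterdir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    folder = Path(".serverless/requirements")  # assumed location, adjust as needed
    if folder.is_dir():
        for name, size in largest_entries(folder):
            print(f"{size / 1024 / 1024:8.1f} MB  {name}")
        total = tree_size(folder)
        status = "over" if total > LAMBDA_UNZIPPED_LIMIT_BYTES else "under"
        print(f"total: {total / 1024 / 1024:.1f} MB ({status} the limit)")
```

If a PyQt5 directory shows up near the top of that list even though nothing in the dependency tree asks for it, that points at the packaging step rather than at facebook-scraper.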