scrapfly / scrapfly-scrapers

Web scrapers for popular targets powered by Scrapfly.io
https://scrapfly.io

Twitter: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) #8

Closed kacperduras closed 7 months ago

kacperduras commented 8 months ago

Scraper: Twitter

Environment
Python version: 3.12.0
Scrapfly SDK version: 0.8.10
Operating System: macOS Sonoma 14.01

Describe the bug

  1. scrape_profile(): the wait_for_selector is wrong and the timeout is too short; users should be filled. I fixed it like this:
async def _scrape_twitter_app(url: str, **scrape_config) -> ScrapeApiResponse:
    return await SCRAPFLY.async_scrape(ScrapeConfig(url, auto_scroll=True, **scrape_config, **BASE_CONFIG))

(...)

await _scrape_twitter_app(url, timeout=90000, retry=False, wait_for_selector="[data-testid='primaryColumn']")
  2. scrape_profile(): tweets should be filled, without any error

  3. scrape_topic(): doesn't work, login bypass not implemented (redirect to auth page)

Received Output

scrape_profile():

Traceback (most recent call last):
  File "/Users/kacperduras/PycharmProjects/pythonProject1/main.py", line 21, in <module>
    asyncio.run(run())
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 664, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/kacperduras/PycharmProjects/pythonProject1/main.py", line 16, in run
    profile = await twitter.scrape_profile(url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kacperduras/PycharmProjects/pythonProject1/twitter.py", line 94, in scrape_profile
    data = json.loads(xhr["response"]["body"])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
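
The error above is what `json.loads` raises when it is given an empty string or a non-JSON body such as an HTML page. A minimal sketch of a defensive parse follows, assuming the captured XHR call is a dict shaped like the `xhr["response"]["body"]` access in the traceback. The helper name `parse_xhr_json` is hypothetical and is not part of the scraper or the Scrapfly SDK.

```python
import json
from typing import Any, Optional


def parse_xhr_json(xhr: dict) -> Optional[Any]:
    """Decode the JSON body of a captured XHR call.

    Hypothetical helper: returns None instead of raising when the
    response is missing, the body is empty, or the body is not JSON.
    """
    # Tolerate a missing or null "response" entry.
    body = (xhr.get("response") or {}).get("body")
    if not body:
        # An empty body is what triggers "Expecting value: line 1 column 1 (char 0)".
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # The body held something other than JSON, for example an HTML page.
        return None
```

With a helper like this, the caller can skip calls that return `None` and report that no usable data was captured, rather than failing with a raw decoder error.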

Expected Output

N/A

Screenshots

N/A

Additional context

N/A

Granitosaurus commented 8 months ago

Hey, thanks for the report. We haven't had the chance to catch the Twitter scraper up with the changes that restructured the website (it's now X.com). We'll let you know once the update is up.

kacperduras commented 7 months ago

Hi @Granitosaurus, any update?

mazen-r commented 7 months ago

Hey @kacperduras, we have updated our Twitter scraper.

kacperduras commented 7 months ago

Hey @mazen-r - thank you, it partially works.

FYI: in some scenarios you can see a redirect from x.com to twitter.com, so you have to increase the timeout to the maximum period available in Scrapfly. Apart from that issue, everything is fine.
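
One way to notice the redirect described above is to compare the host of the requested URL with the host of the URL the scrape finally landed on. This is only a sketch: the function names and the `TWITTER_HOSTS` set are assumptions for illustration, and how the final URL is obtained depends on your own scraping code.

```python
from urllib.parse import urlsplit

# Hosts treated as the same site; this set is an assumption for illustration.
TWITTER_HOSTS = {"x.com", "twitter.com"}


def _bare_host(url: str) -> str:
    """Return the lowercased hostname of a URL without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host.removeprefix("www.")


def was_redirected_between_domains(requested_url: str, final_url: str) -> bool:
    """True when the request moved between x.com and twitter.com (hypothetical helper)."""
    requested, final = _bare_host(requested_url), _bare_host(final_url)
    # Both hosts must be known Twitter/X hosts, and they must differ.
    return requested != final and {requested, final} <= TWITTER_HOSTS
```

A caller could use the result to decide whether to retry the same page with a longer timeout.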