networkdynamics / pytok

A web scraper for TikTok using Playwright

unable to solve captcha automatically anymore #13

Open michaelcyshield opened 1 month ago

michaelcyshield commented 1 month ago

It seems that something has changed in TikTok's HTML: the code is no longer able to drag the slider and solve the captcha.

I get this error when using this basic code:

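    import asyncio
    from pytok.tiktok import PyTok

    async def scrape_tiktok_user_image_url(username):
        users = [username]
        # manual_captcha_solves=False asks PyTok to solve the captcha automatically
        async with PyTok(manual_captcha_solves=False, log_captcha_solves=True) as api:
            for username in users:
                user = api.user(username=username)
                user_data = await user.info()
                return user_data["avatarLarger"]

    print(asyncio.run(scrape_tiktok_user_image_url("xxx")))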

Error:

    Failed to get user info with error: Locator.bounding_box: Timeout 30000ms exceeded.
    Call log:
    waiting for locator("div.secsdk-captcha-drag-icon").first
    , trying requests
    Traceback (most recent call last):
      File "/home/michael/github_repos/gender_classifier/data_preprocessing/scrape_web_pics.py", line 580, in <module>
        print(asyncio.run(scrape_tiktok_user_image_url("xxx")))
      File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
        return loop.run_until_complete(main)
      File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      in scrape_tiktok_user_image_url
        user_data = await user.info()
      user.py", line 137, in info_full
        tag_contents = extract_tag_contents(html_body)
      File "/home/michael/github_repos/scrape_web_pics/.venv/lib/python3.10/site-packages/pytok/helpers.py", line 37, in extract_tag_contents
        raise NotAvailableException("Could not find the tag contents")
    pytok.exceptions.NotAvailableException: Could not find the tag contents

    Process finished with exit code 1

The issue is resolved when I set `manual_captcha_solves=True` and solve the captchas by hand, so this isolates the problem to automatic captcha solving.
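For anyone hitting the same error, here is a minimal sketch of that workaround. It assumes the `from pytok.tiktok import PyTok` import path and reuses only the calls shown in the snippet above. `fetch_avatar_url` is a hypothetical helper name, not part of PyTok.

```python
import asyncio


async def fetch_avatar_url(username, manual_captcha_solves=True):
    """Return the large avatar URL for a TikTok user (hypothetical helper)."""
    # Assumed import path; imported lazily so this module loads without PyTok.
    from pytok.tiktok import PyTok

    # With manual_captcha_solves=True, PyTok waits for a person to solve the
    # captcha in the browser window instead of dragging the slider itself.
    async with PyTok(manual_captcha_solves=manual_captcha_solves) as api:
        user = api.user(username=username)
        user_data = await user.info()
        return user_data["avatarLarger"]


# Example call (opens a browser and needs a human to solve the captcha):
# print(asyncio.run(fetch_avatar_url("xxx")))
```

Passing `manual_captcha_solves=False` restores the automatic path, which is the one that currently times out waiting for `div.secsdk-captcha-drag-icon`.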

bendavidsteel commented 1 month ago

Yep! I'm aware, and I'm working on a fix when I have time, but no guarantees for when I'll get it working. It seems that even 'realistic' mouse movement solutions don't pass the captcha, so I need to experiment with other methods. Feel free to help if you have the time!