thefakequake / pypartpicker

A Python package that can be used to fetch information from PCPartPicker on products and parts lists.
MIT License
19 stars · 9 forks

Fetching list fails because of Cloudflare check #10

Closed benthetechguy closed 2 years ago

benthetechguy commented 2 years ago

I'm developing a Discord bot and used this library to post an embed with the details of a PCPP list whenever the link for one is posted by a user. Here's my code:

```python
import discord
from discord.ext import commands
from pypartpicker import Scraper, get_list_links

@commands.Cog.listener()
async def on_message(self, message):
    if len(get_list_links(message.content)) >= 1:
        pcpp = Scraper()
        link = get_list_links(message.content)[0]
        list = pcpp.fetch_list(link)

        description = ""
        for part in list.parts:
            description = description + f"**{part.type}:** {part.name} **({part.price})**\n"
        description = description + f"\n**Estimated Wattage:** {list.wattage}\n**Price:** {list.total}"

        embed = discord.Embed(title="PCPartPicker List", url=link, description=description, color=0x00a0a0)
        await message.channel.send(embed=embed)
```

It works perfectly for me (screenshot: pcpp.png). The only problem is that the bot is hosted by my friend @Philipp-spec in Germany, and to view PCPartPicker lists he has to pass a Cloudflare check first. As a result, the bot throws this error whenever it tries to scrape a page:

```
Ignoring exception in on_message
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/discord/client.py", line 343, in _run_event
    await coro(*args, **kwargs)
  File "/home/tsc/TSCBot-py/cogs/listeners.py", line 10, in on_message
    list = pcpp.fetch_list(link)
  File "/usr/lib/python3.9/site-packages/pypartpicker/scraper.py", line 106, in fetch_list
    soup = self.__make_soup(list_url)
  File "/usr/lib/python3.9/site-packages/pypartpicker/scraper.py", line 84, in __make_soup
    if "Verification" in soup.find(class_="pageTitle").get_text():
AttributeError: 'NoneType' object has no attribute 'get_text'
```

Is there any way to fix this? The whole point of the Cloudflare check is to make sure you're not a bot… The best way to reproduce it is with a VPN, though sometimes it doesn't present the check.
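For what it's worth, the `AttributeError` itself happens because `soup.find(class_="pageTitle")` returns `None` on the Cloudflare challenge page (which has no `pageTitle` element), and `.get_text()` is then called on `None`. A minimal sketch of a guard that would surface the real problem instead; `FakeTag`/`FakeSoup`/`is_verification_page` are illustrative stand-ins, not part of pypartpicker:

```python
# Stand-ins for bs4 objects, just to make the sketch self-contained.
class FakeTag:
    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text

class FakeSoup:
    def __init__(self, tag=None):
        self._tag = tag

    def find(self, class_=None):
        return self._tag

def is_verification_page(soup):
    title = soup.find(class_="pageTitle")
    if title is None:
        # No pageTitle element at all: most likely a Cloudflare interstitial,
        # so raise a clear error instead of an AttributeError on None.
        raise ValueError("Blocked by Cloudflare check (no pageTitle element)")
    return "Verification" in title.get_text()

print(is_verification_page(FakeTag and FakeSoup(FakeTag("Verification Required"))))  # True
```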

thefakequake commented 2 years ago

I got around this by using the same request headers as your browser does. Try fetching the page in your browser after completing any CAPTCHAs, then use the "Network" panel to see the request headers. You can set headers in the Scraper constructor with the headers kwarg.
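A rough sketch of what that looks like; the header values below are placeholders that you'd replace with ones copied from your own browser session after passing the Cloudflare check:

```python
# Headers copied from the browser's Network panel (placeholder values).
# The "cookie" value is what satisfies the Cloudflare check; keep it secret.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ...",
    "cookie": "<value copied from your browser>",
}

# Passed to pypartpicker's Scraper via its headers kwarg:
# pcpp = Scraper(headers=headers)
```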

If you need more help let me know.

benthetechguy commented 2 years ago

These are the correct headers to copy, right? (screenshot: headers) How do I put them into the Scraper? As a dict of the options? Would `pcpp = Scraper(headers={"sec-ch-ua-mobile": "?0", "sec-ch-ua-platform": "Linux", "Upgrade-Insecure-Requests": "1", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"})` be right? (I'm skipping the sec-ch-ua header because I don't even know where to start with entering that.)

thefakequake commented 2 years ago

The one you want is "cookie". Don't send it here; it's confidential.

benthetechguy commented 2 years ago

So pcpp = Scraper(headers=x), x being whatever the value for cookie is?

thefakequake commented 2 years ago

```python
pcpp = Scraper(headers={
    "cookie": "the value of the header"
})
```

benthetechguy commented 2 years ago

Perfect, thanks. Does this cookie ever expire? Does it only work for one list or does it allow access to all of PCPP?

thefakequake commented 2 years ago

I believe it expires every year? If you look at the response headers, there's a Max-Age field in the set-cookie header (screenshot). 31449600 seconds = basically a year.

thefakequake commented 2 years ago

So you might need to update it every now and then but should be fine.
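Quick sanity check on that Max-Age figure, just dividing the reported seconds into days and weeks:

```python
max_age = 31449600        # seconds, from the set-cookie Max-Age field
days = max_age // 86400   # 86400 seconds per day
weeks = days // 7
print(days, weeks)        # 364 days, i.e. exactly 52 weeks
```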

benthetechguy commented 2 years ago

Okay, perfect. I'll get him to test it, and if it works I'll close the issue. Thanks for the quick and helpful response!

thefakequake commented 2 years ago

No problem!

thefakequake commented 2 years ago

All good?

benthetechguy commented 2 years ago

I'm sorry, I forgot all about this issue. It worked.

thefakequake commented 2 years ago

nice