Problem when Running the Webscraper

LonsterMonster commented 3 years ago

First off, I love how it works.

I haven't used web-scraping programs before and I'm fairly new to Python. The issue I am having is that when I run scraper.py it opens the windows correctly in Firefox but doesn't get the games to save them.

Edit 1: Console_name console generated an exception: 'NoneType' object has no attribute 'group'

Here console_name stands for the name of the console. I didn't paste the entire log because it shows the same error for every game console.

not sure if it is the program or pricecharting.com
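
For anyone hitting the same message: in Python, re.search returns None when the pattern is not found, and calling .group on None raises exactly this AttributeError. Below is a minimal sketch of guarding against that case. The helper name extract_title and the sample HTML strings are made up for illustration and are not taken from the repo.

```python
import re

# Same pattern the scraper uses for the title link
PATTERN = re.compile(r'>(.*?)</a>')

def extract_title(cell_html):
    """Return the text between '>' and '</a>', or None if the pattern is absent."""
    match = PATTERN.search(cell_html)
    if match is None:
        # No link in this cell: report it instead of raising AttributeError
        return None
    return match.group(1)

# Hypothetical sample inputs, not real pricecharting markup
print(extract_title('<a href="/game/ssx-tricky">SSX Tricky</a>'))
print(extract_title('<td class="title"></td>'))
```

If every row on every console returns None, that would point to a changed page layout rather than to a bug in one row.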

Update 1: in scraper.py, I replaced the line browser.get('https://www.pricecharting.com/console' + console) with browser.get('https://www.pricecharting.com/' + console). It now says that each console has been scraped, but it still doesn't save the values to the CSV.
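
One possible reason for a run that reports success but leaves the CSV empty is that the row selector matched nothing, so there was nothing to write. A rough sketch of making that case visible is below. It assumes the rows are plain tuples and uses a hypothetical write_games helper and made-up column names; the repo's own saving code may look different.

```python
import csv
import io

def write_games(rows, out):
    """Write rows to the file-like object `out` and return how many were written."""
    writer = csv.writer(out)
    # Column names are an assumption for this sketch
    writer.writerow(['title', 'console', 'loose', 'complete', 'new'])
    writer.writerows(rows)
    return len(rows)

# io.StringIO stands in for a real file opened with open(path, 'w', newline='')
buffer = io.StringIO()
count = write_games([], buffer)
if count == 0:
    print('no rows scraped; only the header was written')
```

Checking the count right after scraping separates "nothing was found on the page" from "the file was not written".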

Update 2: I changed the line back from browser.get('https://www.pricecharting.com/' + console) to browser.get('https://www.pricecharting.com/console' + console), and in title = re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]'))).group(1) I took out the group(1), as that was what raised the error. It then saved all the games, but without their names.

Update 3: in the def scrapeVals(console,browser): section, find the line for EachPart in soup.select('tr[id*="product-"]'): and add after it

        try:
            title = re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]'))).group()
        except AttributeError:
            title = re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]')))

to replace the title = re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]'))).group(1)

With this it shows the name of the game, but with leftover characters in it (like > and </a>) that I cannot filter out. If I could filter them, it would be working correctly.
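
A note on why the extra characters appear: on a match object, .group() with no argument is the same as .group(0) and returns the whole matched text, including the literal > and </a> that are part of the pattern. .group(1) returns only what the first pair of parentheses captured. The snippet below illustrates the difference on a made-up sample string.

```python
import re

# Hypothetical sample; the real page markup may differ
html = '<a href="/game/ssx-tricky">SSX Tricky</a>'
match = re.search(r'>(.*?)</a>', html)

whole = match.group()    # entire match, delimiters included
inner = match.group(1)   # only the first parenthesised group

print(whole)
print(inner)
```

Note also that in the except branch above, title ends up as None, because re.search found nothing and there is no match object to call .group on.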

markfoster314 commented 3 years ago

Hey! This is sick, it's been so long since I've had any interaction on this I honestly didn't think anyone would be using this repo would just go dead.

They most likely updated the website format (it has been over a year since I worked on this script and I'm pretty sure the last iteration was pretty hard-coded to their format at the time). We can definitely update the scraper to be more generic and fault tolerant.

I'm out of town this upcoming week but once I get back, if you're still interested, I can for sure work on updating it. Also, if you wanna help out I can probably assign you some tasks then. Or just feel free to work on it in the meantime.

markfoster314 commented 3 years ago

*and it would just go dead

markfoster314 commented 3 years ago

I'm curious, do you have a project in mind that would use this?

Me and a few classmates were planning on using it in conjunction with eBay's APIs to make a resale bot, but then got bogged down with schoolwork and internships. If you're interested, I think it would be really cool to revisit that idea.

LonsterMonster commented 3 years ago

Just noticed you commented on this. I was actually going to use it not for every video game, but only for the ones I sell in my store. I am updating my original comment with updates. It does work; it is just saving to the CSV that it is having trouble with.

LonsterMonster commented 3 years ago

I am willing to help with the development of this code because I really like it, and it is also getting me more into Python and web scraping.

LonsterMonster commented 3 years ago

That last update pretty much fixed it, but some HTML stays in the titles, like in the output below:

at Deadly Alliance</a>
>Defcon 5</a>
>International Superstar Soccer 64</a>
>Who Framed Roger Rabbit</a>
>The Voice: I Want You</a>
>White Xbox 360 Wireless Controller</a>
>SSX Tricky</a>

I have messed with filter(), replace() and the lambda way of filtering out stuff like that without success, but that could just be because I am fairly new to Python.
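
If the titles have already been collected with the residue in them, one way to clean them afterwards is a small helper built on re.sub and html.unescape from the standard library. This is only a sketch: clean_title is a hypothetical name, and the second sample string is invented to show entity handling.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. </a>
TAG = re.compile(r'<[^>]+>')

def clean_title(raw):
    """Drop HTML tags and a leading '>' from a scraped title, then decode entities."""
    without_tags = TAG.sub('', raw).lstrip('>')
    return html.unescape(without_tags).strip()

print(clean_title('>Defcon 5</a>'))
print(clean_title('>Ratchet &amp; Clank</a>'))
```

Capturing only the link text in the first place would make this cleanup step unnecessary, so treat it as a fallback.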

LonsterMonster commented 3 years ago

Oh yeah, I love trying other people's code, because I remember how it was when other people would use code that I had written in Node.js way back when.

LonsterMonster commented 3 years ago

ok yeah i can help with that idea of the resale bot

LonsterMonster commented 3 years ago

OK, I was working with it and got this. It outputs game names without the special characters, but some still show as None. It is a real start, though. Below is the changed code:

        soup = BeautifulSoup(browser.page_source, 'html.parser')
        for EachPart in soup.select('tr[id*="product-"]'):
            try:
                # group(1) is only the link text between '>' and '</a>'
                title = str(re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]'))).group(1))
            except AttributeError:
                # re.search found no match here, so this stores the string 'None'
                title = str(re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]'))))
            if title:
                print(title)
            # Raw strings keep the regex escapes (\d, \.) intact
            loosePrice = re.findall(r"\d+\.\d+", str(EachPart.select('td[class="price numeric used_price"]')))
            loosePrice = loosePrice[0] if len(loosePrice) > 0 else "N/A"
            completePrice = re.findall(r"\d+\.\d+", str(EachPart.select('td[class="price numeric cib_price"]')))
            completePrice = completePrice[0] if len(completePrice) > 0 else "N/A"
            newPrice = re.findall(r"\d+\.\d+", str(EachPart.select('td[class="price numeric new_price"]')))
            newPrice = newPrice[0] if len(newPrice) > 0 else "N/A"
            newGame = VideoGame(title, console, loosePrice, completePrice, newPrice)
            games.append(newGame)
        return games
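
On the titles that show as None: when re.search finds nothing, wrapping its result in str() produces the text 'None', which is a non-empty string and therefore passes an if title: check. A small sketch of keeping a real None and skipping those rows is below. The helper title_or_none and the sample cells are hypothetical.

```python
import re

TITLE = re.compile(r'>(.*?)</a>')

no_link = '<td class="title"></td>'     # hypothetical cell with no link in it
fallback = str(TITLE.search(no_link))   # what a str(...) fallback ends up storing
print(repr(fallback), bool(fallback))   # the string 'None' is truthy

def title_or_none(cell_html):
    """Return the link text, or a real None when the pattern is absent."""
    match = TITLE.search(cell_html)
    return match.group(1) if match else None

cells = ['<a href="/game/ssx-tricky">SSX Tricky</a>', no_link]
# Keep only the rows where a title was actually found
titles = [t for t in map(title_or_none, cells) if t is not None]
print(titles)
```

Whether rows without a title should be skipped or kept with a placeholder is a design choice; skipping is just the simpler option for this sketch.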

Currently I am working on making it return prices for only the games I put in, so you can get the values of specific games.
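
One simple way to do that kind of filtering, once all rows have been scraped, is to compare titles case-insensitively against a list of wanted names. The sketch below assumes each game is a plain dict with a 'title' key; select_games, the dict layout and the sample prices are made up for illustration and do not reflect the repo's VideoGame class.

```python
def select_games(games, wanted_titles):
    """Return the games whose title matches one of wanted_titles, ignoring case."""
    wanted = {t.strip().lower() for t in wanted_titles}
    return [g for g in games if g['title'].strip().lower() in wanted]

# Made-up sample data, not real prices
games = [
    {'title': 'SSX Tricky', 'console': 'playstation-2', 'loose': '8.50'},
    {'title': 'Defcon 5', 'console': 'playstation', 'loose': '12.00'},
]

for game in select_games(games, ['ssx tricky']):
    print(game['title'], game['loose'])
```

Exact matching is strict about spelling, so names typed by hand would need to match the site's titles.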

LonsterMonster commented 3 years ago

I have a version of your code here: https://github.com/LonsterMonster/Pricecharting-Scraper/blob/master/scraper.py. I have it going after the details of a video game based on what pricecharting shows. I got it to get the name, console and prices, but cannot get the other attributes. You can try my code and maybe help me with what is wrong when you get a chance.

markfoster314 commented 3 years ago

Just got back, sounds good! I'll pull it down and take a look

LonsterMonster commented 3 years ago

have u seen a fix for it yet?

markfoster314 commented 3 years ago

Hey! Sorry, I haven't had time yet this week. I'll be able to work on it this weekend though, and should have a fix out by Sunday

LonsterMonster commented 3 years ago

have u been able to work on what is wrong?

markfoster314 commented 3 years ago

Hey, sorry I’m too busy at the moment to work on the repo. If you wanna take a crack at it you’re more than welcome

LonsterMonster commented 3 years ago

ok

markfoster314 / Pricecharting-Scraper
