Open LonsterMonster opened 3 years ago
Hey! This is sick, it's been so long since I've had any interaction on this. I honestly didn't think anyone would be using this repo and figured it would just go dead.
They most likely updated the website format (it has been over a year since I worked on this script and I'm pretty sure the last iteration was pretty hard-coded to their format at the time). We can definitely update the scraper to be more generic and fault tolerant.
I'm out of town this upcoming week but once I get back, if you're still interested, I can for sure work on updating it. Also, if you wanna help out I can probably assign you some tasks then. Or just feel free to work on it in the meantime.
I'm curious, do you have a project in mind that would use this?
Me and a few classmates were planning on using it in conjunction with eBay's APIs to make a resale bot, but then got bogged down with schoolwork and internships. If you're interested, I think it would be really cool to revisit that idea.
Just noticed you commented on this. I was actually going to use it for only the games I sell in my store, instead of every video game. I am updating my original comment with my progress. It does work; it is just having trouble saving to the CSV.
I am willing to help with the production of this code because I really like it, and it is getting me more into Python and web scraping too.
That last update pretty much fixed it, but some HTML stays in the titles, like the output below:
at Deadly Alliance</a>
>Defcon 5</a>
>International Superstar Soccer 64</a>
>Who Framed Roger Rabbit</a>>The Voice: I Want You</a>
>White Xbox 360 Wireless Controller</a>
>SSX Tricky</a>
I have messed with filter(), replace(), and the lambda way of filtering out stuff like that, but it could just be that I am fairly new to Python.
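One way to clean up output like the lines above is to split on the closing tag and strip whatever markup is left. This is only a sketch: clean_titles is a hypothetical helper, not part of the repo, and it assumes the leftovers always look like the samples posted here (a stray > before the name and </a> after it).

```python
import re


def clean_titles(raw):
    """Split a scraped string on closing </a> tags and strip leftover markup.

    Hypothetical helper: assumes leftovers look like '>Some Title</a>'.
    """
    titles = []
    for piece in raw.split("</a>"):
        # Remove any complete tags, then stray leading '>' and whitespace.
        text = re.sub(r"<[^>]*>", "", piece).lstrip(">").strip()
        if text:
            titles.append(text)
    return titles


if __name__ == "__main__":
    for line in [">Defcon 5</a>",
                 ">Who Framed Roger Rabbit</a>>The Voice: I Want You</a>"]:
        print(clean_titles(line))
```

Because it returns a list, a line that holds two names run together, like the Who Framed Roger Rabbit one above, comes back as two separate titles.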
Oh yeah, I love trying other people's code, just because I remember how it was when other people would use code that I had written in Node.js way back when.
OK, yeah, I can help with that idea of the resale bot.
OK, I was working with it and got this. It outputs game names without special characters, but some still show as None. It is a real start, though. The changed code is below:
# Inside scrapeVals(console, browser)
soup = BeautifulSoup(browser.page_source, 'html.parser')
for EachPart in soup.select('tr[id*="product-"]'):
    try:
        title = str(re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]'))).group(1))
    except AttributeError:
        # No match: re.search returned None, so this stores the string "None"
        title = str(re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]'))))
    if title:
        print(title)
    # Take the first decimal number in each price cell, or "N/A" if there is none
    loosePrice = re.findall(r"\d+\.\d+", str(EachPart.select('td[class="price numeric used_price"]')))
    loosePrice = loosePrice[0] if len(loosePrice) > 0 else "N/A"
    completePrice = re.findall(r"\d+\.\d+", str(EachPart.select('td[class="price numeric cib_price"]')))
    completePrice = completePrice[0] if len(completePrice) > 0 else "N/A"
    newPrice = re.findall(r"\d+\.\d+", str(EachPart.select('td[class="price numeric new_price"]')))
    newPrice = newPrice[0] if len(newPrice) > 0 else "N/A"
    newGame = VideoGame(title, console, loosePrice, completePrice, newPrice)
    games.append(newGame)
return games
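The three price lookups in that code repeat the same pattern, so they could be folded into one small helper. A minimal sketch, assuming each cell holds at most one price; first_price is a hypothetical name, and the comma handling for prices over 999 is an addition that the posted code does not have.

```python
import re

# One or more digits (optionally with thousands commas), a dot, then decimals.
PRICE_RE = re.compile(r"\d[\d,]*\.\d+")


def first_price(cell_html, default="N/A"):
    """Return the first price found in cell_html as a string, or default.

    Hypothetical helper; cell_html is the str() of a selected price cell.
    """
    match = PRICE_RE.search(cell_html)
    if match is None:
        return default
    # Drop thousands separators so "1,234.56" becomes "1234.56".
    return match.group(0).replace(",", "")
```

With this, each pair of price lines shrinks to a single call, for example loosePrice = first_price(str(EachPart.select('td[class="price numeric used_price"]'))).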
Currently I am working on having it give prices for just the specific games I put in, so you can get the values of certain games.
I have a version of your code here: https://github.com/LonsterMonster/Pricecharting-Scraper/blob/master/scraper.py. I have it going for the details of a video game based on what pricecharting shows. I got it to get the name, console, and prices, but I cannot get the other attributes. You can try my code and maybe help me with what is wrong when you get a chance.
Just got back, sounds good! I'll pull it down and take a look
Have you seen a fix for it yet?
Hey! Sorry, I haven't had time yet this week. I'll be able to work on it this weekend though, and should have a fix out by Sunday
Have you been able to work on what is wrong?
Hey, sorry I’m too busy at the moment to work on the repo. If you wanna take a crack at it you’re more than welcome
ok
First off, I love how it works.
I haven't used web scraping programs before and I'm kind of new to Python. The issue I am having is that when I run scraper.py, it opens the windows correctly in Firefox but doesn't get the games to save them.
Edit 1: Console_name console generated an exception: 'NoneType' object has no attribute 'group'
Here console_name is the name of the console. I just didn't put the entire log because it says the same error for every game console.
Not sure if it is the program or pricecharting.com.
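That error is what Python raises when re.search finds no match and returns None, and .group(1) is then called on the result. A hedged sketch of guarding against it; extract_title is a hypothetical helper, and the sample markup in the usage lines is an assumption about what the page looks like, not something taken from pricecharting.com.

```python
import re

# Text between a '>' and the closing </a>, with no tag characters inside it.
TITLE_RE = re.compile(r">([^<>]+)</a>")


def extract_title(cell_html):
    """Return the link text inside cell_html, or None when nothing matches.

    Hypothetical helper; checks the match before calling .group(1).
    """
    match = TITLE_RE.search(cell_html)
    if match is None:
        # The page layout changed or the cell is empty: report it, don't crash.
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    # Assumed markup, for illustration only.
    print(extract_title('<td class="title"><a href="/game/example">Defcon 5</a></td>'))
    print(extract_title('<td class="title"></td>'))
```

Returning None lets the caller decide whether to skip the row or log it, instead of storing the text "None" as a game name.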
Update 1: in scraper.py, at the line browser.get('https://www.pricecharting.com/console' + console), I replaced it with browser.get('https://www.pricecharting.com/' + console). It now says that each console has been scraped and completed, but it still doesn't save the values to the CSV.
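When a run reports success but the CSV stays empty, it can help to have the save step report how many rows it wrote, so an empty games list is obvious. A rough sketch using only the standard library; the VideoGame fields mirror the VideoGame(title, console, loosePrice, completePrice, newPrice) call posted in this thread, but the real class in the repo may differ, and save_games is a hypothetical function.

```python
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class VideoGame:
    # Assumed fields, based on the constructor call shown in this thread.
    title: str
    console: str
    loosePrice: str
    completePrice: str
    newPrice: str


def save_games(games, path):
    """Write games to a CSV file at path and return the number of rows written."""
    column_names = [field.name for field in fields(VideoGame)]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=column_names)
        writer.writeheader()
        for game in games:
            writer.writerow(asdict(game))
    return len(games)
```

If save_games returns 0, the problem is in the scraping loop rather than in the file writing.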
Update 2: at the line browser.get('https://www.pricecharting.com/' + console), I changed it back to browser.get('https://www.pricecharting.com/console' + console). At title = re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]'))).group(1), I took out the group(1), as that was causing the error. Then it saved all the games, but without the game names.
Update 3: in the def scrapeVals(console, browser): section, find the line
for EachPart in soup.select('tr[id*="product-"]'):
and add the new code after it, to replace the line
title = re.search(r'>(.*?)</a>', str(EachPart.select('td[class="title"]'))).group(1)
I have it showing the name of the game, but it gets special characters in it and I cannot filter them out. If I could filter them, it would be working correctly.