tducret / amazon-scraper-python

Non-official client to get some info about products sold on Amazon
MIT License
871 stars, 159 forks

[WIP] Adding the price - issue#5 #7

Closed: Bolzano-Weierstrass closed this 6 years ago

Bolzano-Weierstrass commented 6 years ago

This answers ticket #5.

This commit adds the price to retrieved features alongside the book title, average rating, number of ratings and URL.

It shouldn't introduce any new bugs; if it does, don't hesitate to contact me.

coveralls commented 6 years ago

Pull Request Test Coverage Report for Build 59


Files with coverage reduction:
amazonscraper/client.py: 1 new missed line, 91.8% coverage
Total: 1
Totals Coverage Status
Change from base Build 43: 2.3%
Covered Lines: 148
Relevant Lines: 158

💛 - Coveralls
tducret commented 6 years ago

Very good job, Thomas @Bolzano-Weierstrass. I like your integration 👍

Unfortunately, I get different kinds of prices :( when I test it with amazon2csv.py -k "python" -m 10 and compare against the web page https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dstripbooks&field-keywords=python

For example:
"Learning Python, 5th Edition": $21.21 <= but this is the rental price
"A Smarter Way to Learn Python: Learn it faster. Remember it longer.": $7.84 <= but this is the Kindle edition price

Also, when I test it with an amazon.fr search URL, I only get N/A for prices. Example: amazon2csv.py -m 20 -u "https://www.amazon.fr/s/ref=nb_sb_noss_2?__mk_fr_FR=%C3%85M%C3%85%C5%BD%C3%95%C3%91&url=search-alias%3Daps&field-keywords=python&rh=i%3Aaps%2Ck%3Apython"

"Apprendre Ă  programmer avec Python 3",4.2,36,N/A,https://www.amazon.fr/Apprendre-%C3%A0-programmer-avec-Python/dp/2212134347

I knew this was going to be difficult with all these kinds of prices...

Bolzano-Weierstrass commented 6 years ago

Hi @tducret,

Thanks for the feedback,

I noticed it... but since I did not know what people using this software are after (buying, renting, Kindle, ...), I thought one price would give an indication. On second thought, giving a price range (min - max) of all available prices might be more useful. What do you think?
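A minimal sketch of the min/max idea, in Python like the rest of amazonscraper. The helpers `parse_dollar_price` and `price_range` are hypothetical (not part of the project), and this version only understands "$xx.xx" strings, treating anything else (such as "N/A") as missing:

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical helper regex: a dollar amount such as "$27.16" or "$1,299.00".
_DOLLAR_RE = re.compile(r"\$\s*([\d,]+(?:\.\d+)?)")


def parse_dollar_price(text: str) -> Optional[float]:
    """Return the numeric value of a "$xx.xx" string, or None (e.g. for "N/A")."""
    match = _DOLLAR_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


def price_range(prices: Iterable[str]) -> Tuple[Optional[float], Optional[float]]:
    """Return (min, max) over all parseable prices, or (None, None) if none parse."""
    values = [v for v in (parse_dollar_price(p) for p in prices) if v is not None]
    if not values:
        return (None, None)
    return (min(values), max(values))
```

For instance, `price_range(["$21.87", "$31.24", "$15.58", "$34.10"])` gives `(15.58, 34.1)`, while a product with only "N/A" prices gives `(None, None)`.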

Regarding the problem with using the search URL directly: I've just noticed that Amazon writes "EUR" instead of "€" for euro prices, while it writes "$" rather than "USD" for dollar prices. I assumed it only used single-character currency symbols, so this can be fixed.
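One way such a fix could look is to accept both currency symbols and three-letter codes, on either side of the amount. This is a heuristic sketch only: the accepted markers, the formats "EUR 20,50" and "21.16$", and the decimal-separator rule are assumptions, and `parse_price` is a hypothetical helper, not the project's code:

```python
import re
from typing import Optional, Tuple

# Assumption: a price is "<marker> <amount>" (e.g. "$27.16", "EUR 20,50")
# or "<amount> <marker>" (e.g. "20,50€", "21.16$").
_MARKER = r"(\$|€|£|USD|EUR|GBP)"
_PRICE_RE = re.compile(_MARKER + r"\s*(\d[\d.,]*)|(\d[\d.,]*)\s*" + _MARKER)

# Map symbols and ISO codes onto one canonical currency code.
_CURRENCY = {"$": "USD", "USD": "USD", "€": "EUR", "EUR": "EUR", "£": "GBP", "GBP": "GBP"}


def _to_float(amount: str) -> float:
    """Convert "27.16", "20,50", "1,299.00" or "1.299,00" to a float."""
    amount = amount.rstrip(".,")
    if "," in amount and "." in amount:
        # Whichever separator appears last is the decimal separator.
        if amount.rfind(",") > amount.rfind("."):
            amount = amount.replace(".", "").replace(",", ".")
        else:
            amount = amount.replace(",", "")
    elif "," in amount:
        whole, _, frac = amount.rpartition(",")
        # Two digits after the last comma: decimal comma ("20,50");
        # otherwise treat commas as thousands separators ("1,299").
        if len(frac) == 2:
            amount = whole.replace(",", "") + "." + frac
        else:
            amount = amount.replace(",", "")
    return float(amount)


def parse_price(text: str) -> Optional[Tuple[str, float]]:
    """Return (currency_code, amount), or None if no price is found (e.g. "N/A")."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    marker = match.group(1) or match.group(4)
    amount = match.group(2) or match.group(3)
    return _CURRENCY[marker], _to_float(amount)
```

Normalizing both "€" and "EUR" to one code means later steps (min/max, CSV output) do not need to care which form a given Amazon site uses.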

Thanks

tducret commented 6 years ago

"Min price" and "Max price" columns seem like a pretty good idea, yes. A more ambitious idea would be to extract every price along with its category ("Kindle Edition", "Paperback", and even new, used, ...). Or perhaps scrape only one kind of price (Paperback for books, for instance). What do you think?

Bolzano-Weierstrass commented 6 years ago

The issue is that Amazon translates everything, even the HTML/CSS tags, so it is very difficult to know what we are scraping: one time it is 'paperback' and the next time it is 'broché'...

tducret commented 6 years ago

What if we stored all prices in a dict, with the category indicated (without translation at first)? Like:

{
  "paperback":"21.16$",
  "kindle edition":"9.99$"
}

or

{
  "broché":"20.50€",
  "format kindle":"9.99€"
}

It would let us get the min/max, and even translate the different categories in the future. You could then run amazon2csv --filter="paperback" to get only the paperback prices.
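A small sketch of how such a dict could support both ideas. `CATEGORY_ALIASES`, `normalize`, `filter_prices` and `min_max` are hypothetical names, the translation table only holds the two French labels from the example above, and `filter_prices` merely imitates the proposed `--filter` option (which does not exist yet):

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical translation table: localized Amazon label -> canonical category.
# Unknown labels are kept as-is ("without translation at first").
CATEGORY_ALIASES = {
    "broché": "paperback",
    "format kindle": "kindle edition",
}


def _amount(price: str) -> float:
    """Extract the number from "21.16$" or "20,50€" (assumes no thousands separator)."""
    match = re.search(r"\d+(?:[.,]\d+)?", price)
    if match is None:
        raise ValueError(f"no amount in {price!r}")
    return float(match.group().replace(",", "."))


def normalize(prices: Dict[str, str]) -> Dict[str, str]:
    """Lower-case every category and rename the known localized ones."""
    return {CATEGORY_ALIASES.get(k.lower(), k.lower()): v for k, v in prices.items()}


def filter_prices(prices: Dict[str, str], category: str) -> Dict[str, str]:
    """Keep only the requested category, like the proposed --filter option."""
    return {k: v for k, v in normalize(prices).items() if k == category.lower()}


def min_max(prices: Dict[str, str]) -> Tuple[Optional[float], Optional[float]]:
    """Return (min, max) of all prices in the dict, or (None, None) if it is empty."""
    values = [_amount(v) for v in prices.values()]
    return (min(values), max(values)) if values else (None, None)
```

With `fr = {"broché": "20.50€", "format kindle": "9.99€"}`, `filter_prices(fr, "paperback")` returns `{"paperback": "20.50€"}` and `min_max(fr)` returns `(9.99, 20.5)`. Keeping the raw label as the key means nothing is lost when a translation is missing.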

Bolzano-Weierstrass commented 6 years ago

I am not convinced by what I've done. I managed to run it locally, but I can only retrieve one price and its label from the product HTML. That is not enough to be interesting...

tducret commented 6 years ago

Why do you say so, @Bolzano-Weierstrass? When I run amazon2csv.py -k "python" -m 10, I get:

Product title,Rating,Number of customer reviews,Product URL,Paperback-to rent,Paperback-to buy,Kindle Edition-to rent,Kindle Edition-to buy
"Python Crash Course: A Hands-On, Project-Based Introduction to Programming",4.5,318,"https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036",N/A,$27.16,N/A,$24.28
"Learning Python, 5th Edition",4,300,"https://www.amazon.com/Learning-Python-5th-Mark-Lutz/dp/1449355730",$21.87,$31.24,$15.58,$34.10
"A Smarter Way to Learn Python: Learn it faster. Remember it longer.",4.8,218,"https://www.amazon.com/Smarter-Way-Learn-Python-Remember-ebook/dp/B077Z55G3B",N/A,$17.96,N/A,$7.75
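The output above has one column per "format-offer" pair, with "N/A" for missing prices. A minimal sketch of producing that layout with the standard `csv` module, assuming each product is a dict keyed by column name; the column list is copied from the header above, and `to_csv` is a hypothetical helper, not the project's code:

```python
import csv
import io
from typing import Dict, List

# Columns copied from the output above; price columns follow "<format>-<offer>".
BASE_COLUMNS = ["Product title", "Rating", "Number of customer reviews", "Product URL"]
PRICE_COLUMNS = [
    "Paperback-to rent", "Paperback-to buy",
    "Kindle Edition-to rent", "Kindle Edition-to buy",
]


def to_csv(products: List[Dict[str, str]]) -> str:
    """Serialize products to CSV; any price column a product lacks becomes "N/A"."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=BASE_COLUMNS + PRICE_COLUMNS,
        restval="N/A",  # value written for keys missing from a row dict
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(products)
    return buffer.getvalue()
```

`DictWriter` uses `QUOTE_MINIMAL` by default, so only fields containing commas or quotes (such as most titles) are quoted; the URLs would come out unquoted, unlike the output above.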
Bolzano-Weierstrass commented 6 years ago

Oh, nice :) When I run the same command as you do, I get:

Product title,Rating,Number of customer reviews,Product URL,Paperback-to rent,Paperback-to buy,Kindle Edition-to rent,Kindle Edition-to buy
"Python Crash Course: A Hands-On, Project-Based Introduction to Programming",4.5,318,https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036,N/A,$27.16,N/A,N/A
"Learning Python, 5th Edition",4,300,https://www.amazon.com/Learning-Python-5th-Mark-Lutz/dp/1449355730,$21.87,$31.24,N/A,N/A
"A Smarter Way to Learn Python: Learn it faster. Remember it longer.",4.8,218,https://www.amazon.com/Smarter-Way-Learn-Python-Remember-ebook/dp/B077Z55G3B,N/A,N/A,N/A,$7.75

I tested multiple commands and never got more than 2 non-N/A prices (which is not your case), so I was not convinced. Moreover, it makes the code quite a lot more complex. Your call :)

tducret commented 6 years ago

Weird... That may be Amazon's anti-scraping protections :S I have to review your code in detail, but it's true that it seems complicated.