Closed Bolzano-Weierstrass closed 6 years ago
Files with Coverage Reduction | New Missed Lines | % |
---|---|---|
amazonscraper/client.py | 1 | 91.8% |

Totals | |
---|---|
Change from base Build 43: | 2.3% |
Covered Lines: | 148 |
Relevant Lines: | 158 |
Very good job Thomas @Bolzano-Weierstrass. I like your integration 👍
Unfortunately, I get different kinds of prices :(
When I test it with amazon2csv.py -k "python" -m 10 and compare with the web page https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dstripbooks&field-keywords=python
For example:
Learning Python, 5th Edition: $21.21 <= but this is the price for renting
A Smarter Way to Learn Python: Learn it faster. Remember it longer.: $7.84 <= but this is the price for the Kindle edition
Also, when I test it with an amazon.fr search URL, I only get N/A for prices.
Example :
amazon2csv.py -m 20 -u "https://www.amazon.fr/s/ref=nb_sb_noss_2?__mk_fr_FR=%C3%85M%C3%85%C5%BD%C3%95%C3%91&url=search-alias%3Daps&field-keywords=python&rh=i%3Aaps%2Ck%3Apython"
"Apprendre à programmer avec Python 3",4.2,36,N/A,https://www.amazon.fr/Apprendre-%C3%A0-programmer-avec-Python/dp/2212134347
I knew this was going to be difficult with all these kinds of prices...
hi @tducret ,
Thanks for the feedback,
I noticed it... but since I did not know what the goal of people using this software was (to buy, to rent, Kindle, ...), I thought one price would give an indication. On second thought, giving a price range (min - max) of all available prices might be more interesting. What do you think?
Regarding the problem with using the search URL directly: I've just noticed that Amazon uses EUR instead of € while it uses $ and not USD. I had assumed it only used single-character currency symbols, so it can be fixed.
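A minimal sketch of how a price string could be parsed when the currency is either a one-character symbol or a three-letter code. The helper name `parse_price`, the regular expression, and the exact input shapes (for example `EUR 20,50`) are assumptions for illustration, not the code in amazonscraper; thousands separators are not handled.

```python
import re
from decimal import Decimal

# Hypothetical pattern: an optional currency (symbol or three-letter code)
# before or after the amount, with "." or "," as the decimal separator.
_PRICE_RE = re.compile(
    r"^\s*(?P<pre>[A-Z]{3}|[$€£])?\s*"
    r"(?P<amount>\d+(?:[.,]\d{1,2})?)\s*"
    r"(?P<post>[A-Z]{3}|[$€£])?\s*$"
)


def parse_price(text):
    """Return (currency, amount) or None when the text is not a price."""
    match = _PRICE_RE.match(text)
    if match is None:
        return None
    currency = match.group("pre") or match.group("post")
    # Normalise a decimal comma to a decimal point before converting.
    amount = Decimal(match.group("amount").replace(",", "."))
    return currency, amount
```

With this shape, `parse_price("N/A")` returns `None`, so a missing price can still be reported as N/A by the caller.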
Thanks
"Min price", "Max price" seems like a pretty good idea, yes. A more ambitious idea would be to extract every price with the right category, "Kindle Edition", "Paperback" (and even new, used...). Or perhaps scrape only one kind of price (Paperback for books, for instance). What do you think?
The issue is that Amazon translates everything, even the HTML/CSS tags, so it is very difficult to know what we are scraping: one time it is 'paperback' and the next time it is 'broché'...
What if we got all prices in a dict with the category indicated (without translation at first)? Like :
{
"paperback":"21.16$",
"kindle edition":"9.99$"
}
or
{
"broché":"20.50€",
"format kindle":"9.99€"
}
It would make it possible to get the min/max, and even to translate the different categories in the future.
You could then run amazon2csv --filter="paperback" to get only the paperback prices.
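As a rough sketch of that idea, assuming prices are stored as strings with a trailing one-character currency symbol as in the dicts above. The function names `amount`, `price_range` and `filter_prices` are hypothetical, and the `--filter` option itself is only a proposal here:

```python
from decimal import Decimal


def amount(price):
    """Drop the trailing one-character currency symbol, e.g. "21.16$"."""
    return Decimal(price[:-1])


def price_range(prices):
    """Return (cheapest, most expensive) price strings, or None if empty."""
    if not prices:
        return None
    ordered = sorted(prices.values(), key=amount)
    return ordered[0], ordered[-1]


def filter_prices(prices, category):
    """Keep only the entries whose label matches the category, ignoring case."""
    wanted = category.lower()
    return {label: p for label, p in prices.items() if label.lower() == wanted}
```

Because the labels are kept untranslated, filtering on "paperback" would not match 'broché'; a mapping between translated labels would have to be added separately.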
I am not convinced by what I've done. I managed to run it locally, and I can retrieve only one price and its label given the product HTML. That is not enough to be interesting...
Why do you say so @Bolzano-Weierstrass ?
When I run amazon2csv.py -k "python" -m 10, I get:
Product title,Rating,Number of customer reviews,Product URL,Paperback-to rent,Paperback-to buy,Kindle Edition-to rent,Kindle Edition-to buy
"Python Crash Course: A Hands-On, Project-Based Introduction to Programming",4.5,318,"https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036",N/A,$27.16,N/A,$24.28
"Learning Python, 5th Edition",4,300,"https://www.amazon.com/Learning-Python-5th-Mark-Lutz/dp/1449355730",$21.87,$31.24,$15.58,$34.10
"A Smarter Way to Learn Python: Learn it faster. Remember it longer.",4.8,218,"https://www.amazon.com/Smarter-Way-Learn-Python-Remember-ebook/dp/B077Z55G3B",N/A,$17.96,N/A,$7.75
Oh, nice :) When I run the same command as you do, I get :
Product title,Rating,Number of customer reviews,Product URL,Paperback-to rent,Paperback-to buy,Kindle Edition-to rent,Kindle Edition-to buy
"Python Crash Course: A Hands-On, Project-Based Introduction to Programming",4.5,318,https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036,N/A,$27.16,N/A,N/A
"Learning Python, 5th Edition",4,300,https://www.amazon.com/Learning-Python-5th-Mark-Lutz/dp/1449355730,$21.87,$31.24,N/A,N/A
"A Smarter Way to Learn Python: Learn it faster. Remember it longer.",4.8,218,https://www.amazon.com/Smarter-Way-Learn-Python-Remember-ebook/dp/B077Z55G3B,N/A,N/A,N/A,$7.75
And I tested multiple commands; I never got more than 2 non-N/A prices (while that is not your case), so I was not convinced. Moreover, it complicates the code quite a lot. Your call :)
Weird... That may be Amazon's anti-scraping protection :S I have to review your code in detail, but it's true that it seems complicated.
Answering ticket #5.
This commit adds the price to the retrieved features, alongside the book title, average rating, number of ratings and URL.
It shouldn't create any additional bugs; if it does, don't hesitate to contact me.