Closed sughodke closed 6 years ago
It would be handy to set the Amazon base URL from the command line (or an environment variable).
Right now the scraper only looks up amazon.co.jp; the hardcoded domain would need to be refactored out of these files:
core_extract_comments.py
  9:# https://www.amazon.co.jp/product-reviews/B00Z16VF3E/ref=cm_cr_arp_d_paging_btm_1?ie=UTF8&reviewerType=all_reviews&showViewpoints=1&sortBy=helpful&pageNumber=1
  12: return 'https://www.amazon.co.jp/product-reviews/{}/ref=' \
  20: url = 'http://www.amazon.co.jp/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=' + \
core_generate_product_ids.py
  25: main_category_page = get_soup('https://www.amazon.co.jp/gp/site-directory/ref=nav_shopall_btn')
core_utils.py
  64: if 'amazon.co.jp' not in url:
  65: url = 'https://www.amazon.co.jp' + url
Workaround:
Running the following command in the project directory will recursively replace amazon.co.jp with amazon.com in every file:
find . -type f -exec sed -i 's/amazon\.co\.jp/amazon.com/g' {} +
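As an alternative to editing the files in place, here is a minimal sketch of how the base URL could be made configurable. It is not the code from the repository: the `--base-url` flag, the `AMAZON_BASE_URL` variable, and the helper names `resolve_base_url` and `absolute_url` are all hypothetical, chosen only for illustration. The idea is to resolve the base URL once (flag first, then environment, then the current amazon.co.jp default) and pass it to the places that currently hardcode the domain.

```python
import argparse
import os
from urllib.parse import urljoin

# Current hardcoded value, kept as the fallback so existing behavior is unchanged.
DEFAULT_BASE_URL = 'https://www.amazon.co.jp'


def resolve_base_url(argv=None, environ=None):
    """Return the base URL from --base-url, else AMAZON_BASE_URL, else the default.

    Both the flag name and the variable name are assumptions for this sketch.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('--base-url', default=None)
    # parse_known_args ignores any other options the scripts may define.
    args, _unknown = parser.parse_known_args(argv)
    base_url = args.base_url or environ.get('AMAZON_BASE_URL') or DEFAULT_BASE_URL
    return base_url.rstrip('/')


def absolute_url(url, base_url):
    """Turn a site-relative link into an absolute one; leave absolute URLs alone."""
    return urljoin(base_url + '/', url)
```

With helpers like these, the check in core_utils.py could become a call such as `absolute_url(url, base_url)`, and the scripts could be run with something like `AMAZON_BASE_URL=https://www.amazon.com` set in the environment.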
@sughodke again happy to review any pull request :)
Fixed in https://github.com/philipperemy/amazon-reviews-scraper/commit/1b79bf92cd847cef86c978e2063a311bb6f02bd8