headrun / SWIFT


Walmart Crawler #87

Closed SreenivasDega closed 4 years ago

SreenivasDega commented 4 years ago

Write a crawler to get data from the Walmart site.

jaffrinkirthiga96 commented 4 years ago

I am able to get the data through the API and generate the data as meta and insight queries: https://www.walmart.com/search/api/preso?cat_id=5438_133197_4237948_3187021&prg=desktop However, the API response does not include the product size details, and there is no MRP/selling price. The price is exposed only as min_price and max_price.
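For reference, the search URL above can be assembled from a category ID rather than hard-coded. This is only a minimal sketch: the function name `build_search_url` is hypothetical, and it assumes `cat_id` and `prg` are the only query parameters needed, as in the URL quoted in this comment.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL quoted in the comment above.
SEARCH_API = "https://www.walmart.com/search/api/preso"


def build_search_url(cat_id: str, prg: str = "desktop") -> str:
    """Build the search API URL for one category (hypothetical helper)."""
    # urlencode escapes values safely; dict order is preserved.
    query = urlencode({"cat_id": cat_id, "prg": prg})
    return f"{SEARCH_API}?{query}"
```

Calling `build_search_url("5438_133197_4237948_3187021")` reproduces the URL quoted above.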

jaffrinkirthiga96 commented 4 years ago

I observed that the price and availability change constantly based on the colour and size of the product. Also, a single product has 39 colours with 6 different sizes. Hence, as suggested by charan, I will ignore the colours for now, take a default colour as input, and proceed with the crawler. The price, size, and SKU ID details are available in the HTML source page.

jaffrinkirthiga96 commented 4 years ago

I am able to get the SKU ID, price, and rating details from the source for the default product only. The size-specific product ID is triggered via JavaScript on every click, so I am not able to keep track of the size details. @malla794, please suggest how to proceed.

jaffrinkirthiga96 commented 4 years ago

As charan suggested, I have taken the minimum price as the default price for all product sizes, along with its corresponding availability status. I have completed crawling of the Mens-Tshirt and Mens-Shirt categories and have implemented crawling across pages through pagination (data testing is pending).
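The min-price default described above can be sketched as follows. This is an illustration only, not the committed crawler code: the function name `expand_sizes` and the record keys (`sku`, `min_price`, `availability`) are hypothetical stand-ins for whatever the crawler actually extracts.

```python
from typing import Dict, List


def expand_sizes(product: Dict, sizes: List[str]) -> List[Dict]:
    """Return one record per size, each priced at the product's min_price.

    The keys used here are assumed names, not the real API fields.
    """
    return [
        {
            "sku": product["sku"],
            "size": size,
            # Every size gets the same default price and availability.
            "price": product["min_price"],
            "availability": product["availability"],
        }
        for size in sizes
    ]
```

The trade-off is that sizes which really sell at max_price will be under-reported, which seems acceptable while the per-size product IDs cannot be tracked.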

Created the tables and tried loading the crawled data into the local DB, but I am getting an error: ERROR 1261 (01000) at line 1: Row 1 doesn't contain data for all columns

Once I debug it, I will load the data into the DB.
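MySQL raises ERROR 1261 when a row in the load file has fewer fields than the target table has columns. One way to locate such rows before loading is sketched below. It assumes the crawled data is a tab-delimited text file; the function name `find_short_rows` and the delimiter are assumptions, not details from the crawler.

```python
import csv
import io
from typing import List, Tuple


def find_short_rows(
    text: str, expected_columns: int, delimiter: str = "\t"
) -> List[Tuple[int, int]]:
    """Return (row_number, field_count) for rows with too few fields.

    Row numbers are 1-based, matching the "Row N" in the MySQL message.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        (number, len(row))
        for number, row in enumerate(reader, start=1)
        if len(row) < expected_columns
    ]
```

Running it on the contents of the load file with the table's column count shows which rows would trigger the error.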

jaffrinkirthiga96 commented 4 years ago

I have fixed the size issue, loaded the data into the local DB, and committed the crawler.

jaffrinkirthiga96 commented 4 years ago

For some products the price details are not available in the API, and hence the MRP is 0. I have changed the crawler to get the price details from the HTML source page when the price is unavailable in the API.
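A rough sketch of that fallback, for illustration only. The regular expression is a guess at how a price might appear in embedded JSON on the page, and the function names `price_from_html` and `resolve_price` are hypothetical; the real crawler may locate the price differently.

```python
import re
from typing import Optional

# Assumed pattern: a numeric "price" field in JSON embedded in the page.
PRICE_RE = re.compile(r'"price"\s*:\s*([0-9]+(?:\.[0-9]+)?)')


def price_from_html(html: str) -> Optional[float]:
    """Extract the first price-like value from the page source, if any."""
    match = PRICE_RE.search(html)
    return float(match.group(1)) if match else None


def resolve_price(api_price: Optional[float], html: str) -> Optional[float]:
    """Prefer the API price; fall back to the HTML when it is missing or 0."""
    if api_price:  # None and 0 both count as "unavailable"
        return api_price
    return price_from_html(html)
```

Returning None when neither source has a price keeps a missing value distinguishable from a genuine price of 0.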