Open SreenivasDega opened 4 years ago
Checked the site and crawled some data.
Need to explore how to crawl the next-page data from the API.
After running into difficulties getting the tcin key values, checked with mohana and worked on getting the tcin values from the API. Got the tcin values; now need to work on passing them dynamically.
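A rough sketch of one way to pass the tcin values dynamically, building the request URL from a list instead of hardcoding it. The base URL, the `tcins` query parameter name, and the comma-separated format are assumptions for illustration, not the confirmed API contract.

```python
from urllib.parse import urlencode


def build_product_url(base_url, tcins, extra_params=None):
    """Build a request URL with the given tcin values in the query string.

    Assumption: the API accepts a comma-separated list under a
    hypothetical "tcins" parameter.
    """
    params = dict(extra_params or {})
    params["tcins"] = ",".join(str(t) for t in tcins)
    # urlencode percent-escapes the commas and any other special characters.
    return f"{base_url}?{urlencode(params)}"


# Placeholder host, not the real endpoint.
url = build_product_url("https://example.com/products", [123, 456])
```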
Fixed the next-page crawl issue; the crawler can now get data from all available pages. Checked the data against the CSV and am modifying the script to support the table schema and store the data in the DB.
Created the table schema as product info. Need to modify the code to match the schema and test it.
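For reference, a minimal sketch of the kind of offset-based pagination loop this can use. `fetch_page` is a hypothetical callable standing in for the actual API request, and the page size of 24 is an assumption.

```python
from typing import Callable, Dict, Iterator, List


def crawl_all_pages(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 24,
    max_pages: int = 1000,
) -> Iterator[Dict]:
    """Yield items from every page until the source runs out.

    fetch_page(offset, count) is a hypothetical function that returns
    the list of items for one page (an empty list when there are none).
    """
    for page in range(max_pages):
        items = fetch_page(page * page_size, page_size)
        if not items:
            break  # no more data
        yield from items
        if len(items) < page_size:
            break  # short page means this was the last one


# Usage with an in-memory stand-in for the API:
data = [{"id": i} for i in range(50)]
products = list(crawl_all_pages(lambda off, n: data[off:off + n]))
```

The `max_pages` cap is only a safety guard so a misbehaving endpoint cannot loop forever.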
Sample data with available columns.
After creating the tables, tried to populate them. The output folder with the data was created, but the rows were not inserted into the table because the generated values are not sufficient for the insert. Trying to find a way to get the SKU IDs from the product info.
Got the JSON embedded in the HTML, which has the SKU data. Trying to extract the data from this JSON.
Could not get the following fields: size, category, sub_category, ratings_count, reviews_count, mrp, selling_price, discount_percentage (not available). The price and size details come back empty in the source page, and the data in the product APIs is not reliable.
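A small sketch of pulling values out of a nested JSON payload without depending on its exact layout. The key name `"sku"` and the sample payload are assumptions for illustration; the real JSON structure may differ.

```python
import json
from typing import Any, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Recursively yield every value stored under `key` in nested JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Made-up payload, only to show the call:
payload = json.loads('{"product": {"children": [{"sku": "A1"}, {"sku": "B2"}]}}')
skus = list(find_values(payload, "sku"))
```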
Sample data.
Attaching the pylint score of committed file.
After a discussion with charan, started collecting the categories available on target.com so the data can be crawled category by category.
Completed the pagination for categories to get the product URLs, and got output for the columns below.
Working on getting the size, availability status, and review and rating fields, as this data is not available in the page source. Checking the APIs to find the relevant data.
Found an API that returns the size and availability status. Facing the issue below when fetching the review and rating counts.
After completing the crawler, got this data; it still needs to be loaded into the tables. The remaining categories also have to be added to the crawl.
After adding the subcategories, verified the data in the table and got these outputs.
Trying to find the issue with the hd_id column.
Fixed the issue; attaching the sample data here. Checking the code with pylint and will commit once that is done.
Attaching the pylint score here.
Skipped the records with a price of 0.
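A minimal sketch of this filter, assuming each record is a dict and the price is stored under `selling_price` (the field name is taken from the column list; the record shape is an assumption). Records with a missing or non-numeric price are kept here, since only a price of 0 is being excluded.

```python
def has_zero_price(record, price_field="selling_price"):
    """Return True only when the record's price parses to exactly 0."""
    try:
        return float(record[price_field]) == 0.0
    except (KeyError, TypeError, ValueError):
        # Assumption: missing or unparseable prices are not treated as 0.
        return False


def drop_zero_price(records, price_field="selling_price"):
    """Return the records whose price is not 0, preserving order."""
    return [r for r in records if not has_zero_price(r, price_field)]


rows = [{"selling_price": "0"}, {"selling_price": 19.99}, {"selling_price": 0}]
kept = drop_zero_price(rows)
```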
Build a crawler for target.com, starting with men's apparel.