headrun / SWIFT


Target Crawler #89

Open SreenivasDega opened 4 years ago

SreenivasDega commented 4 years ago

Build a crawler for target.com, starting with men's apparel.

farooqbabajan1540 commented 4 years ago

Checked the site and crawled some sample data. (image attached)

farooqbabajan1540 commented 4 years ago

Need to explore how to crawl the next page of data through the API.

farooqbabajan1540 commented 4 years ago

After running into difficulties getting the tcin key values, checked with mohana and worked on fetching the tcins values from the API. Got the tcins values. Next, need to pass these values dynamically.
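A minimal sketch of what passing the tcins dynamically could look like, assuming the API accepts them as a comma-separated query parameter. The endpoint URL, the `tcins` parameter name, and the batch size are placeholders for illustration, not the actual values used by the crawler.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; the real API URL is not given in this thread.
BASE_URL = "https://example.com/api/products"


def chunk(items, size):
    """Yield successive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_urls(tcins, batch_size=2):
    """Build one URL per batch, passing the TCINs as a comma-separated value."""
    return [
        # safe="," keeps the commas readable instead of encoding them as %2C
        BASE_URL + "?" + urlencode({"tcins": ",".join(batch)}, safe=",")
        for batch in chunk(tcins, batch_size)
    ]


urls = build_urls(["111", "222", "333"])
```

Batching keeps each request small; the right batch size depends on what the API allows.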

farooqbabajan1540 commented 4 years ago

Resolved the next-page crawl issue and am now able to get data from all available pages. Checked the data in a CSV and am modifying the script to match the table schema and store the data in the DB.
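The page-by-page crawl can be sketched roughly as below, assuming the API pages by an offset and a page size. `fetch_page` is a hypothetical callable standing in for the real request, and the stopping rule (a page shorter than the page size) is an assumption about how the API signals the last page.

```python
def crawl_all_pages(fetch_page, page_size=24):
    """Collect items from every page, advancing the offset until a short page arrives."""
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items.extend(page)
        # A page with fewer items than requested is treated as the last one.
        if len(page) < page_size:
            break
        offset += page_size
    return items


# Stand-in for the real API call, so the sketch runs without network access.
FAKE_CATALOG = [f"product-{i}" for i in range(50)]


def fake_fetch(offset, count):
    """Return one slice of the fake catalog, like a paged API response."""
    return FAKE_CATALOG[offset:offset + count]


all_items = crawl_all_pages(fake_fetch)
```

Passing the fetch function in as an argument keeps the paging logic testable without hitting the site.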

farooqbabajan1540 commented 4 years ago

Created the table schema as product info. Need to modify the code to match the schema and then test it.

farooqbabajan1540 commented 4 years ago

Sample data with the available columns. (image attached)

farooqbabajan1540 commented 4 years ago

After creating the tables, tried to populate them and was able to create the output folder with data. However, the data was not inserted into the table because the generated values are not sufficient for an insert. Trying to figure out how to get the SKU ids from the product info.
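One way to see which records are blocked is to check the required fields before inserting. This is only a sketch: it uses an in-memory SQLite database, and the table layout and column names (`sku_id`, `title`, `selling_price`) are assumptions, not the actual product info schema.

```python
import sqlite3

# Hypothetical required columns; the real schema is not shown in this thread.
REQUIRED = ("sku_id", "title", "selling_price")


def missing_fields(record):
    """Return the required field names that are absent or empty in `record`."""
    return [name for name in REQUIRED if record.get(name) in (None, "")]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_info ("
    "sku_id TEXT PRIMARY KEY, title TEXT NOT NULL, selling_price REAL NOT NULL)"
)

records = [
    {"sku_id": "A1", "title": "Shirt", "selling_price": 19.99},
    {"title": "Jeans", "selling_price": 29.99},  # no sku_id, so it cannot be inserted
]

skipped = []
for record in records:
    gaps = missing_fields(record)
    if gaps:
        # Keep the record and the reason so the gaps can be reviewed later.
        skipped.append((record, gaps))
        continue
    conn.execute(
        "INSERT INTO product_info (sku_id, title, selling_price) VALUES (?, ?, ?)",
        (record["sku_id"], record["title"], record["selling_price"]),
    )
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM product_info").fetchone()[0]
```

Collecting the skipped records with their missing fields shows which values still have to come from another source.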

farooqbabajan1540 commented 4 years ago

Found JSON embedded in the HTML that contains the SKU data. Trying to extract the data from this JSON. (image attached)
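A rough sketch of pulling such JSON out of a page, assuming it sits inside a `<script type="application/json">` tag. The sample HTML, the tag attributes, and the `product` / `children` / `sku` keys are made up for illustration; the real page structure is not shown in this thread.

```python
import json
import re

# Made-up page fragment; the real markup and key names will differ.
SAMPLE_HTML = """
<html><body>
<script type="application/json" id="product-data">
{"product": {"title": "Shirt", "children": [{"sku": "A1"}, {"sku": "A2"}]}}
</script>
</body></html>
"""

# Capture the body of the first JSON script tag, across line breaks.
SCRIPT_RE = re.compile(
    r'<script[^>]*type="application/json"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_skus(html):
    """Return the SKU ids found in the embedded JSON, or an empty list if there is none."""
    match = SCRIPT_RE.search(html)
    if match is None:
        return []
    data = json.loads(match.group(1))
    return [child["sku"] for child in data["product"]["children"]]


skus = extract_skus(SAMPLE_HTML)
```

A regular expression is enough for a single well-known tag; an HTML parser would be the safer choice if the markup varies.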

farooqbabajan1540 commented 4 years ago

Not able to get the following fields from the source page or the product API: size, category, sub_category, ratings_count, reviews_count, mrp, selling_price, discount_percentage (not available). The price and size details come back empty in the source page, and the data in the APIs is not reliable.

farooqbabajan1540 commented 4 years ago

Sample data. (image attached)

farooqbabajan1540 commented 4 years ago

Attaching the pylint score of the committed file. (Screenshot from 2020-06-08 18-42-36)

farooqbabajan1540 commented 4 years ago

After a discussion with charan, started collecting the categories available on target.com so the data can be crawled category by category.

farooqbabajan1540 commented 4 years ago

Completed the pagination part for categories to get the product URLs, and got output for the columns below. (image attached)

Working on getting the size, availability status, and reviews/ratings fields, as this data is not available in the page source. Checking the APIs to find the relevant data.

farooqbabajan1540 commented 4 years ago

Found an API that returns the size and availability status. Facing the issue below when getting the reviews and ratings count. (image attached)

farooqbabajan1540 commented 4 years ago

After completing the crawler, got this data, which still needs to be loaded into the tables. (image attached) The remaining categories also have to be added to the crawl.

farooqbabajan1540 commented 4 years ago

After adding the sub-categories, verified the data in the table and got these outputs. (two images attached)

farooqbabajan1540 commented 4 years ago

Trying to find the issue with the hd_id column.

farooqbabajan1540 commented 4 years ago

Fixed the issue and am attaching the sample data here. Checking the code with pylint; will commit once that is done. (image attached)

farooqbabajan1540 commented 4 years ago

Attaching the pylint score here. (image attached)

farooqbabajan1540 commented 4 years ago

Excluded the records that have a price of 0.
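For reference, the filter can be sketched as below. The `selling_price` key is taken from the field list earlier in this thread; treating a missing price the same as 0 is an assumption in this sketch, not something confirmed by the crawler code.

```python
def drop_zero_price(records, price_key="selling_price"):
    """Keep only the records whose price is greater than zero."""
    # A missing or empty price is treated as 0 here (an assumption of this sketch).
    return [r for r in records if float(r.get(price_key) or 0) > 0]


sample = [
    {"title": "Shirt", "selling_price": 19.99},
    {"title": "Hat", "selling_price": 0},
    {"title": "Belt", "selling_price": "0.00"},  # prices may arrive as strings
    {"title": "Socks", "selling_price": None},
]

kept = drop_zero_price(sample)
```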