Open FishermanZzhang opened 6 years ago
Something like this. You can collect item URLs from the search page of each subcategory (e.g. https://www.aliexpress.com/category/200003482/dresses.html). You can use the proxy and cookies arguments to help avoid a ban.
```python
import bs4
import json
import tqdm
import time
import requests  # needed for requests.Session() below
from random import shuffle


def get_item_info(product, sess, proxy, cookies):
    ...


# Reuse one session so the User-Agent header is sent with every request.
sess = requests.Session()
sess.headers.update({'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.59 Safari/537.36'})
product = {
    'url': 'https://www.aliexpress.com/item/Bamskarosa-Hot-Sale-Womens-Summer-Lace-Dress-2017-Vintage-O-Neck-Slim-Sexy-Pin-up-Rockabilly/32807441215.html?spm=2114.search0103.3.9.1d0ad88aABD7Ci&ws_ab_test=searchweb0_0,searchweb201602_4_10320_10152_10321_10151_10065_10344_10068_10342_10547_10343_10322_10340_10548_10341_10193_10194_10084_10083_10304_10615_10307_10302_10180_10313_10059_10314_10184_10534_100031_10319_10604_10103_10186_10142,searchweb201603_25,ppcSwitch_4&algo_expid=a4cac221-2dda-4b75-aeea-3210457f31a5-1&algo_pvid=a4cac221-2dda-4b75-aeea-3210457f31a5&priceBeautifyAB=3',
    'id': '32702639988',
    'cat': 'Women\'s Clothing & Accessories',
    'subcat': 'Dresses'
}
# Pass None for proxy and cookies to make a direct request.
get_item_info(product, sess, None, None)
```
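The URL collection step mentioned above could look roughly like the sketch below. It is not the original code. It assumes that item links in the category page HTML follow the same `/item/<slug>/<id>.html` shape as the example product URL, and `collect_products`, `ItemLinkParser`, and `ITEM_PATH` are hypothetical names. It uses only the standard library and works on an HTML string, so fetching the category page (for example with `sess.get`) is left out. The real page markup may differ or be rendered by JavaScript.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumed URL shape, based on the example product URL: /item/<slug>/<digits>.html
ITEM_PATH = re.compile(r"^/item/(?:[^/]+/)?(\d+)\.html$")


class ItemLinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_products(html, base_url, cat, subcat):
    """Return one product dict per unique item link found in `html`."""
    parser = ItemLinkParser()
    parser.feed(html)
    products, seen = [], set()
    for href in parser.hrefs:
        # Resolve relative and protocol-relative links against the page URL.
        url = urljoin(base_url, href)
        match = ITEM_PATH.match(urlsplit(url).path)
        if match is None or match.group(1) in seen:
            continue  # not an item link, or an item already collected
        seen.add(match.group(1))
        products.append(
            {"url": url, "id": match.group(1), "cat": cat, "subcat": subcat}
        )
    return products


# Made-up HTML fragment standing in for a downloaded category page.
sample_html = """
<a href="//www.aliexpress.com/item/Some-Dress/32807441215.html?spm=abc">Dress</a>
<a href="/item/Some-Dress/32807441215.html">Dress again</a>
<a href="/category/200003482/dresses.html">Dresses</a>
"""
products = collect_products(
    sample_html,
    "https://www.aliexpress.com/category/200003482/dresses.html",
    "Women's Clothing & Accessories",
    "Dresses",
)
print(products)
```

Each dict in `products` has the same keys as the `product` example above, so the list can be looped over and each entry passed to `get_item_info(product, sess, proxy, cookies)`.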
How do I call the function get_item_info? E.g., how do I get the product list? Could you share this?