jae-jae / QueryList

:spider: The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
https://querylist.cc
2.65k stars 440 forks

Unable to parse product information page for Amazon Japan #89

Closed wper closed 3 years ago

wper commented 4 years ago

Amazon has many regional sites. QueryList parses the product information correctly on about 99% of them; the exception is the Japanese site (amazon.co.jp).

Here is my code:

use QL\QueryList; // QueryList 4 namespace (assumes Composer autoloading is already set up)

$url1 = 'https://www.amazon.co.jp/DENON-%E3%82%A2%E3%83%8A%E3%83%AD%E3%82%B0%E3%83%AC%E3%82%B3%E3%83%BC%E3%83%89%E3%83%97%E3%83%AC%E3%83%BC%E3%83%A4%E3%83%BC-USB%E9%8C%B2%E9%9F%B3%E6%A9%9F%E8%83%BD-%E3%83%95%E3%83%AB%E3%82%AA%E3%83%BC%E3%83%88-DP-200USB-K/dp/B001IZ6UDC?pf_rd_p=58c92908-f2b1-4cbf-98d2-8911a92c77a0&pd_rd_wg=5RmJU&pf_rd_r=TY6QHXE3JMJ6P42J2J00&ref_=pd_gw_cr_simh&pd_rd_w=MBoK7&pd_rd_r=2cad8a77-bc64-413a-9ba1-e089c722fff6';
$url2 = 'https://www.amazon.com/Apple-Retina-Display-ME279LL-Refurbished/dp/B00TA9FCUU/ref=lp_18332383011_1_2?srs=18332383011&ie=UTF8&qid=1572160773&sr=8-2';
$header = [
    'User-Agent' => 'xxx',
    'Accept' => 'xxxx',
    'Content-Type' => 'application/x-www-form-urlencoded; charset=UTF-8'
];
$ql1 = QueryList::get($url1, null, ['headers' => $header]);
$ql2 = QueryList::get($url2, null, ['headers' => $header]);
// Both responses contain HTML.
var_dump($ql1->getHtml());
var_dump($ql2->getHtml());
// Empty for amazon.co.jp.
var_dump($ql1->find('#productTitle')->text());
// Not empty for amazon.com.
var_dump($ql2->find('#productTitle')->text());

I also tested it at http://www.querylist.cc/querylist-test/, and the result is the same: amazon.com returns data, but amazon.co.jp returns none.

Is this a bug? Please help me, thanks!