plantvsbirds / xq-getUnitData

unit scraper

Issue with capturing data with hooking ajax request #1

Closed plantvsbirds closed 9 years ago

plantvsbirds commented 9 years ago

Sometimes I get nothing from window.results, which means the AJAX requests had all finished before I hooked up the AJAX callbacks.

When I intentionally delay the hooking process with a timeout, I get nothing. By lowering the timeout, I can sometimes get results.
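To make the race concrete, here is a minimal sketch of the kind of hook described above. It is not the code from this repo: installXhrHook, FakeXHR and request are hypothetical names, and FakeXHR is only a synchronous stand-in so the example runs in plain Node. A request that completes before the hook is installed never reaches window.results; one that starts afterwards does.

```javascript
// Sketch only: installXhrHook, FakeXHR and request are hypothetical names,
// not code from this repo.

// Wrap XMLHttpRequest on a window-like object so that every response that
// finishes loading is appended to win.results.
function installXhrHook(win) {
  win.results = win.results || [];
  var proto = win.XMLHttpRequest.prototype;
  var origOpen = proto.open;
  var origSend = proto.send;

  proto.open = function (method, url) {
    this._hookedUrl = url; // remember the URL for the load listener
    return origOpen.apply(this, arguments);
  };

  proto.send = function () {
    var xhr = this;
    xhr.addEventListener('load', function () {
      win.results.push({ url: xhr._hookedUrl, body: xhr.responseText });
    });
    return origSend.apply(this, arguments);
  };
}

// Minimal synchronous stand-in for XMLHttpRequest (assumption: only the
// parts the hook touches are modelled).
function FakeXHR() { this.listeners = []; this.responseText = ''; }
FakeXHR.prototype.open = function (method, url) { this.url = url; };
FakeXHR.prototype.addEventListener = function (type, fn) {
  if (type === 'load') { this.listeners.push(fn); }
};
FakeXHR.prototype.send = function () {
  this.responseText = '{"from":"' + this.url + '"}';
  this.listeners.forEach(function (fn) { fn(); });
};

function request(win, url) {
  var xhr = new win.XMLHttpRequest();
  xhr.open('GET', url);
  xhr.send();
}

var fakeWindow = { XMLHttpRequest: FakeXHR };
request(fakeWindow, '/before-hook.json'); // finishes before the hook exists: lost
installXhrHook(fakeWindow);
request(fakeWindow, '/after-hook.json');  // starts after the hook exists: captured
console.log(fakeWindow.results.length);   // prints 1
```

In PhantomJS, one place to try installing such a hook before the page's own scripts run is the page.onInitialized callback (via page.evaluate). That is an assumption about a possible fix, not something verified against this scraper.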


The issue is explained by the following log:


^Croot@hl0-ubuntu0-mini:~/2code/opb-trading# node ./xq-getUnitData/index
opened xq?  success
"page loadedtrue"
"http://xueqiu.com/share/get_content.json?type=3&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
(....URLS)
34201
(LENGTH OF CAPTURED DATA)
1
"New check: 1"
34201
1

Above is the success scenario, in which the JSON requests are made after "page loadedtrue", the point where we hook.

Failing scenario:

node ./xq-getUnitData/index
"http://xueqiu.com/share/get_content.json?type=4&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
"http://xueqiu.com/share/get_content.json?type=4&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
"http://xueqiu.com/share/get_content.json?type=5&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
"http://xueqiu.com/share/get_content.json?type=5&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
"http://xueqiu.com/stock/quotep.json?stockid=1000000"
"http://xueqiu.com/stock/quotep.json?stockid=1000000"
"http://xueqiu.com/share/get_content.json?type=3&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
"http://xueqiu.com/share/get_content.json?type=3&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
"http://xueqiu.com/statuses/facet/source_count.json?symbol=ZH087953&source=all%2Ccube"
"http://xueqiu.com/statuses/facet/source_count.json?symbol=ZH087953&source=all%2Ccube"
"http://xueqiu.com/cubes/rb/get_running.json?symbol=ZH087953"
"http://xueqiu.com/cubes/rb/get_running.json?symbol=ZH087953"
"http://xueqiu.com/cubes/consultants/show_by_uid.json?uid=6141838440"
"http://xueqiu.com/cubes/consultants/show_by_uid.json?uid=6141838440"
"http://xueqiu.com/cubes/nav_daily/all.json?cube_symbol=ZH087953&since=1414114353000&until=1445650353000"
"http://xueqiu.com/cubes/nav_daily/all.json?cube_symbol=ZH087953&since=1414114353000&until=1445650353000"
"http://xueqiu.com/cubes/nav_daily/all.json?cube_symbol=ZH087953&since=1437874353000&until=1445650353000"
"http://xueqiu.com/cubes/nav_daily/all.json?cube_symbol=ZH087953&since=1437874353000&until=1445650353000"
"http://xueqiu.com/cubes/data/rank_percent.json?cube_id=87854&market=cn&dimension=annual&_=1445650351842"
"http://xueqiu.com/cubes/data/rank_percent.json?cube_id=87854&market=cn&dimension=annual&_=1445650351842"
"http://xueqiu.com/statuses/search.json?symbol=ZH087953&page=1&count=20&comment=0"
"http://xueqiu.com/statuses/search.json?symbol=ZH087953&page=1&count=20&comment=0"
opened xq?  success
"page loadedtrue"
2
0
"New check: 0"
plantvsbirds commented 9 years ago

But there is still request output, which means that resources are still coming in and xq isn't blocking anything. So I think I could 1) run the hooking earlier somehow, 2) get the JSON content by another method, or 3) switch to Selenium and leave this alone.

Update: of course, hacking into phantom and overriding onResourceReceived remains a valid option.

Still, I hate parsing HTML, although it's the most straightforward and stable approach. Look at that shit

plantvsbirds commented 9 years ago

List of data URLs of interest:


"http://xueqiu.com/share/get_content.json?type=5&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
"http://xueqiu.com/stock/quotep.json?stockid=1000000"
"http://xueqiu.com/share/get_content.json?type=3&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
"http://xueqiu.com/statuses/facet/source_count.json?symbol=ZH087953&source=all%2Ccube"
"http://xueqiu.com/statuses/search.json?symbol=ZH087953&page=1&count=20&comment=0"
"http://xueqiu.com/share/get_content.json?type=4&cube_name=%24%E5%8F%AF%E6%8C%81%E7%BB%AD%E5%8F%91%E5%B1%95(ZH087953)%24&cube_yield=401.13%25"
"http://xueqiu.com/cubes/data/rank_percent.json?cube_id=87854&market=cn&dimension=annual&_=1445650312706"
"http://xueqiu.com/cubes/nav_daily/all.json?cube_symbol=ZH087953&since=1414114314000&until=1445650314000"
"http://xueqiu.com/cubes/nav_daily/all.json?cube_symbol=ZH087953&since=1437874314000&until=1445650314000"
plantvsbirds commented 9 years ago

We should close this now, since we are going to refactor to use the AJAX replaying method.
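For reference, a rough sketch of what replaying could look like, assuming it means requesting the logged JSON endpoints directly instead of intercepting them in the page. uniqueDataUrls, replayAll and stubFetch are hypothetical names. The fetcher is passed in so the sketch runs without network access, and stripping the "_" parameter assumes it is only a cache-busting timestamp. Real requests may also need cookies or headers from a page session.

```javascript
// Sketch only: uniqueDataUrls, replayAll and stubFetch are hypothetical names.

// Drop duplicate log lines and the "_" query parameter
// (assumed here to be a cache-busting timestamp).
function uniqueDataUrls(loggedUrls) {
  var seen = {};
  var out = [];
  loggedUrls.forEach(function (raw) {
    var url = raw.replace(/([?&])_=\d+(&|$)/, function (match, lead, tail) {
      return tail === '&' ? lead : '';
    });
    if (!seen[url]) {
      seen[url] = true;
      out.push(url);
    }
  });
  return out;
}

// Request every URL with the supplied fetcher (a function returning a
// Promise of response text) and parse each response as JSON.
function replayAll(urls, fetchText) {
  return Promise.all(urls.map(function (url) {
    return fetchText(url).then(function (text) {
      return { url: url, data: JSON.parse(text) };
    });
  }));
}

// Two URLs taken from the logs in this issue; the first is logged twice.
var logged = [
  'http://xueqiu.com/stock/quotep.json?stockid=1000000',
  'http://xueqiu.com/stock/quotep.json?stockid=1000000',
  'http://xueqiu.com/cubes/data/rank_percent.json?cube_id=87854&market=cn&dimension=annual&_=1445650312706'
];
var urls = uniqueDataUrls(logged);

// Stub fetcher returning a placeholder body, not real xueqiu data.
function stubFetch(url) {
  return Promise.resolve('{"stub":true}');
}

var replayed = replayAll(urls, stubFetch).then(function (results) {
  console.log(results.length); // prints 2
  return results;
});
```

Injecting the fetcher keeps the replay logic independent of whichever HTTP client ends up being used.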