ma6254 / FictionDown

Novel downloading|novel scraping|Qidian|Biquge|Markdown export|txt export|EPUB conversion|ad filtering|automatic proofreading
GNU General Public License v3.0
709 stars 140 forks

example failed #2

Closed daixiang0 closed 5 years ago

daixiang0 commented 5 years ago

I downloaded the release and installed PhantomJS, then ran the example:

$ ./FictionDown --url https://book.qidian.com/info/3249362 d 
2019/03/11 16:11:02 Init PhantomJS
2019/03/11 16:11:03 URL: "https://book.qidian.com/info/3249362"
2019/03/11 16:11:03 Close PhantomJS
2019/03/11 16:11:03 failed
$ phantomjs --version
1.9.8
ma6254 commented 5 years ago

Maybe try one of the following:

1. Disable PhantomJS

The --driver option defaults to "phantomjs". Setting it to any other value disables PhantomJS, and the Qidian URL is then requested with Go's http library instead.

 $ ./FictionDown --url xxx d --driver 1 
...
...
...

2. Build the latest commit

$ go build -v  github.com/ma6254/FictionDown/cmd/FictionDown
$ ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/11 17:32:37 URL: "https://book.qidian.com/info/3249362"
2019/03/11 17:32:37 Init PhantomJS
2019/03/11 17:33:00 Loading....
书名: "一世之尊"
作者: "爱潜水的乌贼"
封面: https://bookcover.yuewen.com/qdbimg/349573/3249362/180
简介:
    我这一生,不问前尘,不求来世,只轰轰烈烈,快意恩仇,败尽各族英杰,傲笑六道神魔!
    万年之后,大劫再启,如来金身,元始道体,孰强孰弱,如来神掌,截天七式,谁领风骚?
    轮回之中,孟奇自少林寺开始了自己“纵横一生,谁能相抗”的历程。
章节数:
    作品相关卷(免费) 17章
    第一卷 少年侠气卷(免费) 84章
    第二卷 平沙茫茫黄入天卷(免费) 12章
    第二卷 平沙茫茫黄入天卷(VIP) 54章
    第三卷 满堂花醉三千客卷(VIP) 355章
    第四卷 二十年纵横间卷(VIP) 403章
    第五卷 人有病,天知否?卷(VIP) 26章
    第六卷 东风夜放花千树卷(VIP) 240章
    第七卷 天意自古高难问卷(VIP) 188章
    第八卷 苍茫大地谁主沉浮卷(VIP) 65章
2019/03/11 17:33:01 Working...
2019/03/11 17:33:01 routine: 10
...
...
...

3. Upgrade PhantomJS to the latest version

$ phantomjs --version # the PhantomJS version on my macOS laptop
2.1.1

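Chinese version

Please excuse my poor English.

1. Disable PhantomJS

The d subcommand has a --driver option whose default value is "phantomjs", meaning pages are fetched through PhantomJS. Setting it to any other value disables PhantomJS; requests are then built with Go's official http library.

The reason this option exists:

Qidian applies a different anti-scraping strategy to each book's info page.

  1. Sometimes the volume information is loaded dynamically (this requires PhantomJS), and sometimes it is served statically.
  2. Sometimes the mobile page is returned, and sometimes the PC page.

In most cases both approaches (PhantomJS and plain HTTP) work. In a few cases only one of them does (PhantomJS works but HTTP does not, or sometimes the reverse).

The paragraphs above will be added to README.md; leaving them out was my oversight. The following two cases are less likely, so I will not go into them for now.

2. Build the latest commit

3. Upgrade PhantomJS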
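Option 1 says that for some books only one of the two drivers returns volume information. A minimal Go sketch of trying drivers in order until one succeeds is shown below. It is only an illustration of the idea, not FictionDown's actual code: the Fetcher type, ErrNoVolumes, and fetchWithFallback are hypothetical names, and the two fetchers in main are stubs rather than real PhantomJS or HTTP requests.

```go
package main

import (
	"errors"
	"fmt"
)

// Fetcher is a hypothetical stand-in for one way of loading a book page
// (for example PhantomJS or a plain HTTP request). It returns the volume
// names it found on the page.
type Fetcher func(url string) ([]string, error)

// ErrNoVolumes mirrors the "not match volumes" failure seen in the logs.
var ErrNoVolumes = errors.New("not match volumes")

// fetchWithFallback tries the named fetchers in order and returns the first
// non-empty result together with the name of the fetcher that produced it.
func fetchWithFallback(url string, order []string, fetchers map[string]Fetcher) (string, []string, error) {
	var lastErr error = ErrNoVolumes
	for _, name := range order {
		fetch, ok := fetchers[name]
		if !ok {
			continue // unknown driver name, skip it
		}
		vols, err := fetch(url)
		if err != nil {
			lastErr = err
			continue
		}
		if len(vols) == 0 {
			lastErr = ErrNoVolumes
			continue
		}
		return name, vols, nil
	}
	return "", nil, lastErr
}

func main() {
	// Stub fetchers: the first finds nothing, the second finds one volume.
	fetchers := map[string]Fetcher{
		"phantomjs": func(string) ([]string, error) { return nil, ErrNoVolumes },
		"http":      func(string) ([]string, error) { return []string{"Volume 1"}, nil },
	}
	name, vols, err := fetchWithFallback(
		"https://book.qidian.com/info/3249362",
		[]string{"phantomjs", "http"},
		fetchers,
	)
	fmt.Println(name, vols, err)
}
```

Keeping the order as an explicit slice lets the caller decide which driver to prefer, which matches the observation that neither driver is reliable for every book.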

daixiang0 commented 5 years ago

Upgrading PhantomJS and then using the release build does not work:

$ ./FictionDown --url https://book.qidian.com/info/3249362 d 
2019/03/11 19:22:11 Init PhantomJS
2019/03/11 19:22:12 URL: "https://book.qidian.com/info/3249362"
2019/03/11 19:22:14 Close PhantomJS
2019/03/11 19:22:14 not match volumes
$ phantomjs -v
2.1.1

Disabling the driver works.

daixiang0 commented 5 years ago

Building from source still does not work:

$ go build -v  github.com/ma6254/FictionDown/cmd/FictionDown
github.com/ma6254/FictionDown/cmd/FictionDown
$  ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/11 19:25:24 URL: "https://book.qidian.com/info/3249362"
2019/03/11 19:25:24 Init PhantomJS
2019/03/11 19:25:27 Close PhantomJS
2019/03/11 19:25:27 not match volumes
daixiang0 commented 5 years ago

With the content below, it cannot download:

bookurl: https://book.qidian.com/info/1004608738
bookname: 圣墟
author: 辰东
coverurl: https://bookcover.yuewen.com/qdbimg/349573/1004608738/180
description: |-
  在破败中崛起,在寂灭中复苏。
  沧海成尘,雷电枯竭,那一缕幽雾又一次临近大地,世间的枷锁被打开了,一个全新的世界就此揭开神秘的一角……
tmap:
- https://www.biqiuge.com/book/4772       <===
- https://www.biquge5200.cc/52_52542    <===
volumes: []

I only added the marked lines to the generated file.

ma6254 commented 5 years ago

If the volume information cannot be matched, then of course nothing can be downloaded. For now the only option is to try several times; eventually one attempt will get the volume information.

I will fix it.

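Chinese version

Try a few more times; eventually one attempt will get the volume information. Only then is the licensed content crawled, after which the pirate-site sources can be added and their content crawled.

I will add a retry mechanism in the next few commits.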

I tried it 9 times and finally got it.

maqinfen@mqf ~/m/s/g/m/F/release> ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/12 04:55:43 URL: "https://book.qidian.com/info/3249362"
2019/03/12 04:55:43 Init PhantomJS
2019/03/12 04:55:50 Close PhantomJS
2019/03/12 04:55:50 not match volumes
maqinfen@mqf ~/m/s/g/m/F/release> ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/12 04:55:52 URL: "https://book.qidian.com/info/3249362"
2019/03/12 04:55:52 Init PhantomJS
2019/03/12 04:55:59 Close PhantomJS
2019/03/12 04:55:59 not match volumes
maqinfen@mqf ~/m/s/g/m/F/release> ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/12 04:56:00 URL: "https://book.qidian.com/info/3249362"
2019/03/12 04:56:00 Init PhantomJS
2019/03/12 04:56:06 Close PhantomJS
2019/03/12 04:56:06 not match volumes
maqinfen@mqf ~/m/s/g/m/F/release> ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/12 04:56:08 URL: "https://book.qidian.com/info/3249362"
2019/03/12 04:56:08 Init PhantomJS
2019/03/12 04:56:15 Close PhantomJS
2019/03/12 04:56:15 not match volumes
maqinfen@mqf ~/m/s/g/m/F/release> ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/12 04:56:17 URL: "https://book.qidian.com/info/3249362"
2019/03/12 04:56:17 Init PhantomJS
2019/03/12 04:56:21 Close PhantomJS
2019/03/12 04:56:21 not match volumes
maqinfen@mqf ~/m/s/g/m/F/release> ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/12 04:56:23 URL: "https://book.qidian.com/info/3249362"
2019/03/12 04:56:23 Init PhantomJS
2019/03/12 04:56:29 Close PhantomJS
2019/03/12 04:56:29 not match volumes
maqinfen@mqf ~/m/s/g/m/F/release> ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/12 04:56:31 URL: "https://book.qidian.com/info/3249362"
2019/03/12 04:56:31 Init PhantomJS
2019/03/12 04:56:36 Close PhantomJS
2019/03/12 04:56:36 not match volumes
maqinfen@mqf ~/m/s/g/m/F/release> ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/12 04:56:38 URL: "https://book.qidian.com/info/3249362"
2019/03/12 04:56:38 Init PhantomJS
2019/03/12 04:56:44 Close PhantomJS
2019/03/12 04:56:44 not match volumes
maqinfen@mqf ~/m/s/g/m/F/release> ./FictionDown --url https://book.qidian.com/info/3249362 d
2019/03/12 04:56:46 URL: "https://book.qidian.com/info/3249362"
2019/03/12 04:56:46 Init PhantomJS
书名: "一世之尊"
作者: "爱潜水的乌贼"
封面: https://bookcover.yuewen.com/qdbimg/349573/3249362/180
简介:
    我这一生,不问前尘,不求来世,只轰轰烈烈,快意恩仇,败尽各族英杰,傲笑六道神魔!
    万年之后,大劫再启,如来金身,元始道体,孰强孰弱,如来神掌,截天七式,谁领风骚?
    轮回之中,孟奇自少林寺开始了自己“纵横一生,谁能相抗”的历程。
章节数:
    作品相关卷(免费) 17章
    第一卷 少年侠气卷(免费) 84章
    第二卷 平沙茫茫黄入天卷(免费) 12章
    第二卷 平沙茫茫黄入天卷(VIP) 54章
    第三卷 满堂花醉三千客卷(VIP) 355章
    第四卷 二十年纵横间卷(VIP) 403章
    第五卷 人有病,天知否?卷(VIP) 26章
    第六卷 东风夜放花千树卷(VIP) 240章
    第七卷 天意自古高难问卷(VIP) 188章
    第八卷 苍茫大地谁主沉浮卷(VIP) 65章
2019/03/12 04:56:56 Working...
2019/03/12 04:56:56 routine: 10
 231 / 1444 [====================>-----------------------------------------------------------------------------------------------------------]  16.00% 03m44s^C2019/03/12 04:57:39 进程信号: interrupt
2019/03/12 04:57:39 [爬取结束] 已缓存:113 样本:119 完成样本:0
2019/03/12 04:57:39 Close PhantomJS
maqinfen@mqf ~/m/s/g/m/F/release>
ma6254 commented 5 years ago

Added Chromedp support. If Chrome is installed, this opens a new Chrome window, which closes once loading is complete.

./FictionDown --url https://book.qidian.com/info/3249362 d --driver chromedp