jht3QAQ / PixivSpider

A Pixiv crawler written in Java
GNU General Public License v3.0
9 stars, 1 fork

bug #1

Open: SharkPika opened this issue 2 months ago

SharkPika commented 2 months ago

A really nice crawler, but using it again four years later it still has bugs. This line appears four times in a row:

unexpected end of stream on null, retrying fetch: https://pixiv.net/ajax/search/artworks/genshin?word=genshin&mode=r18&p=1&type=all&lang=zh&s_mode=s_type

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffcb9a1bf00, pid=31196, tid=25208
#
# JRE version: Java(TM) SE Runtime Environment (17.0.2+8) (build 17.0.2+8-LTS-86)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0.2+8-LTS-86, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, windows-amd64)
# Problematic frame:
# V  [jvm.dll+0xbf00]
#
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# D:\QQ\1127367472\FileRecv\Source\PixivSpider-master\build\libs\hs_err_pid31196.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
jht3QAQ commented 1 month ago

This repo hasn't been maintained in four years, lol. If I were to maintain it, I'd probably just rewrite it in Python (runs away).

SharkPika commented 1 month ago

Actually, I found your repo because I wanted to learn how to build a crawler in Java.

jht3QAQ commented 1 month ago

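Just switch to Python; the code would be at least half as long. As far as the crawling itself goes, it's really just building requests and downloading: build requests to get the image list, then iterate over that list and download each image. If you dig through my source code you can find some of the Pixiv APIs I found back then, such as these: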

https://www.pixiv.net/ajax/search/artworks/{kw}?word={kw}&mode=safe&p=1&type=all&lang=zh&s_mode=s_tag
https://www.pixiv.net/ajax/user/{uid}/profile/all?lang=zh
https://www.pixiv.net/ajax/user/{uid}/profile/illusts?ids[]={ids}&work_category=illustManga&is_first_page=1&lang=zh
https://www.pixiv.net/ajax/illust/{pid}/pages?lang=zh
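
A minimal Python sketch (following the suggestion to use Python) that fills in these templates. The function names are hypothetical, and the defaults for `mode` and `s_mode` are simply copied from the first template. Passing `safe="[]"` to `urlencode` keeps the `ids[]` brackets unescaped so the result matches the example links literally, and `quote(..., safe="")` percent-encodes non-ASCII keywords in the path.

```python
from urllib.parse import quote, urlencode

BASE = "https://www.pixiv.net/ajax"


def search_url(keyword: str, page: int = 1, mode: str = "safe", s_mode: str = "s_tag") -> str:
    """Search endpoint: the keyword appears both in the path and in `word`."""
    params = {"word": keyword, "mode": mode, "p": page,
              "type": "all", "lang": "zh", "s_mode": s_mode}
    return f"{BASE}/search/artworks/{quote(keyword, safe='')}?{urlencode(params)}"


def profile_all_url(uid: int) -> str:
    """All work ids of one artist (user id `uid`)."""
    return f"{BASE}/user/{uid}/profile/all?lang=zh"


def profile_illusts_url(uid: int, ids: list[int]) -> str:
    """Details for a batch of works: one `ids[]` pair per artwork id."""
    params = [("ids[]", i) for i in ids] + [
        ("work_category", "illustManga"), ("is_first_page", 1), ("lang", "zh")]
    return f"{BASE}/user/{uid}/profile/illusts?{urlencode(params, safe='[]')}"


def pages_url(pid: int) -> str:
    """Every page (image) of one artwork `pid`."""
    return f"{BASE}/illust/{pid}/pages?lang=zh"
```

For example, `profile_illusts_url(52542337, [118201101, 117777980])` produces the third example link.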

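If you want to build a Pixiv crawler, these should come in handy. All of these APIs return JSON; parse it and you'll find the data you need. Open the links below and you'll see what I mean: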

https://www.pixiv.net/ajax/search/artworks/genshin?word=genshin&mode=r18&p=1&type=all&lang=zh&s_mode=s_type
https://www.pixiv.net/ajax/user/52542337/profile/all?lang=zh
https://www.pixiv.net/ajax/user/52542337/profile/illusts?ids[]=118201101&ids[]=117777980&work_category=illustManga&is_first_page=1&lang=zh
https://www.pixiv.net/ajax/illust/118201101/pages?lang=zh
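
A sketch of pulling the useful fields out of those JSON responses. The field paths (`error`, `body`, `illustManga.data`, `illusts`, `urls.original`) reflect what these endpoints have returned in the past and are assumptions here; open the links and check the live JSON before relying on them. All function names are hypothetical.

```python
def body_of(data: dict):
    """Unwrap the assumed {"error": ..., "message": ..., "body": ...} envelope."""
    if data.get("error"):
        raise ValueError(data.get("message") or "Pixiv API returned an error")
    return data["body"]


def ids_from_search(data: dict) -> list[str]:
    """Artwork ids from /ajax/search/artworks/...; entries without an id (e.g. ad slots) are skipped."""
    return [item["id"] for item in body_of(data)["illustManga"]["data"] if "id" in item]


def ids_from_profile_all(data: dict) -> list[str]:
    """Artwork ids from /ajax/user/{uid}/profile/all, largest (newest) id first.

    Assumption: `illusts` maps id -> null, and an empty collection may arrive as [].
    """
    illusts = body_of(data).get("illusts") or {}
    return sorted(illusts, key=int, reverse=True)


def original_urls(data: dict) -> list[str]:
    """Full-size image URLs from /ajax/illust/{pid}/pages, one per page."""
    return [page["urls"]["original"] for page in body_of(data)]
```

Each function expects the already-decoded response, for example `original_urls(json.loads(response_text))`.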

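Just construct requests like these to get the JSON: fetch the image list by artist or keyword, then use that list to get each image's URL, and download them.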
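
The flow described above (keyword -> image list -> image URLs -> download), as a rough sketch using only the Python standard library. Assumptions, all unverified against the current site: the ajax responses wrap their payload in `body` with an `error` flag, search hits live under `illustManga.data`, each page object carries `urls.original`, and the image host wants a `Referer` from pixiv.net. `crawl`, `fetch_json`, `download` and `make_request` are hypothetical names. The fetch and save steps are parameters so the loop can be tried without network access; login cookies, retries and rate limiting are not handled.

```python
import json
import os
import shutil
import urllib.request
from urllib.parse import quote, urlencode, urlparse

API = "https://www.pixiv.net/ajax"
# Assumption: image downloads are refused without a pixiv.net Referer.
HEADERS = {"User-Agent": "Mozilla/5.0", "Referer": "https://www.pixiv.net/"}


def make_request(url: str) -> urllib.request.Request:
    """Wrap a URL in a Request carrying the shared headers."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_json(url: str):
    """GET an ajax endpoint and return its `body` (assumed response envelope)."""
    with urllib.request.urlopen(make_request(url), timeout=30) as resp:
        data = json.load(resp)
    if data.get("error"):
        raise RuntimeError(data.get("message") or f"API error for {url}")
    return data["body"]


def download(url: str, out_dir: str) -> str:
    """Stream one image into out_dir, named after the last segment of its URL path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, os.path.basename(urlparse(url).path))
    with urllib.request.urlopen(make_request(url), timeout=60) as resp, open(path, "wb") as f:
        shutil.copyfileobj(resp, f)
    return path


def crawl(keyword: str, out_dir: str, page: int = 1, fetch=fetch_json, save=download) -> list:
    """Search one result page for `keyword` and save every page of every hit."""
    query = urlencode({"word": keyword, "mode": "safe", "p": page,
                       "type": "all", "lang": "zh", "s_mode": "s_tag"})
    listing = fetch(f"{API}/search/artworks/{quote(keyword, safe='')}?{query}")
    # Step 1: the image list (entries without an "id" are skipped).
    ids = [item["id"] for item in listing["illustManga"]["data"] if "id" in item]
    saved = []
    for pid in ids:
        # Step 2: each page of an artwork has its own original-size URL.
        for p in fetch(f"{API}/illust/{pid}/pages?lang=zh"):
            saved.append(save(p["urls"]["original"], out_dir))
    return saved
```

With the defaults, `crawl("genshin", "downloads")` would hit the network; the injected `fetch`/`save` parameters are the seam for swapping in retries or a different HTTP client later.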