lightnovel-center / linovelib2epub

Crawl light novel from some websites and convert it to epub.
https://pypi.org/project/linovelib2epub/
GNU Affero General Public License v3.0

epubcheck-related problem #5

Closed comsoi closed 1 year ago

comsoi commented 1 year ago

Checking the generated file with the w3c/epubcheck tool reports: ERROR(OPF-014): *.epub/EPUB/0.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.

wdpm commented 1 year ago

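Could you share the EPUB file you tested? Please also provide as much environment and configuration information as you can, to make troubleshooting easier. @xxxfhy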

comsoi commented 1 year ago

打包epub.zip (the packaged EPUB, attached)

Python 3.8.16 (default, Jan 17 2023, 22:25:28) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32

Package             Version
------------------- -----------
beautifulsoup4      4.11.1
bs4                 0.0.1
certifi             2022.12.7
charset-normalizer  3.0.1
demjson3            3.0.6
EbookLib            0.18
fake-useragent      1.1.1
idna                3.4
importlib-resources 5.10.2
linovelib2epub      0.0.10
lxml                4.9.2
markdown-it-py      2.1.0
mdurl               0.1.2
Pillow              9.4.0
pip                 22.3.1
Pygments            2.14.0
requests            2.28.2
rich                13.2.0
setuptools          57.5.0
six                 1.16.0
soupsieve           2.3.2.post1
typing_extensions   4.4.0
urllib3             1.26.14
uuid                1.30
wheel               0.37.1
wincertstore        0.2
zipp                3.11.0

wdpm commented 1 year ago

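Analysis and reproduction

I was able to reproduce the problem.

[Screenshot: uTools_1674636145430]

I removed the script block from the 0.xhtml shown in the screenshot and ran epubcheck again: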

D:\Downloads\epubcheck-5.0.0>java -jar epubcheck.jar 1.epub
Validating using EPUB version 3.3 rules.
ERROR(OPF-014): 1.epub/EPUB/1.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/2.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/3.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/4.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/5.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/6.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/7.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/8.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/9.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/10.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/11.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/12.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/13.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/14.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/15.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
ERROR(OPF-014): 1.epub/EPUB/16.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
......

The output no longer complains about a "scripted" problem in 1.epub/EPUB/0.xhtml, so the script block is confirmed as the cause.

More specifically, it is this JS snippet:

  <div class="cgo">
    <script>zation();</script>
  </div>

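Under the EPUB 3.3 rules, a script cannot simply be dropped into a content document like this without the "scripted" property being declared for that document in the OPF file. On top of that, the function being called here is not defined anywhere.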
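To list every content document affected in one pass, rather than reading the epubcheck log file by file, a small standard-library scan can help. This is only a sketch: the function name `find_scripted_documents` is hypothetical (it is not part of linovelib2epub or epubcheck), and it assumes the `.xhtml` members are well-formed XML without undefined named entities; otherwise `ET.fromstring` raises `ParseError`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def find_scripted_documents(epub_file):
    """Return the sorted names of .xhtml members that contain a <script> element.

    `epub_file` may be a path or a binary file-like object.
    """
    hits = []
    with zipfile.ZipFile(epub_file) as zf:
        for name in zf.namelist():
            if not name.endswith(".xhtml"):
                continue
            root = ET.fromstring(zf.read(name))
            for element in root.iter():
                # Tags look like "{namespace}script"; compare the local name only.
                if isinstance(element.tag, str) and element.tag.rsplit("}", 1)[-1] == "script":
                    hits.append(name)
                    break
    return sorted(hits)


# Usage: build a tiny in-memory archive that mimics the layout in the log.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr(
        "EPUB/0.xhtml",
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<div class="cgo"><script>zation();</script></div></body></html>',
    )
    zf.writestr(
        "EPUB/1.xhtml",
        '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>ok</p></body></html>',
    )
demo.seek(0)
print(find_scripted_documents(demo))
```

The scan only detects `<script>` elements; it does not check the OPF manifest, so it complements epubcheck rather than replacing it.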

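Root cause

The crawler does not strip the unneeded JS code from the pages it fetches, so the generated EPUB fails validation against the EPUB specification. As a result, some EPUB readers report an error outright.

Fix strategy

Update the code so that irrelevant JS code is removed during crawling. Please wait for the next bug-fix release.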
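A minimal sketch of what such a stripping step could look like, using only the standard library. It assumes the chapter markup is already well-formed XHTML. The function name `strip_scripts` is hypothetical and not taken from linovelib2epub, and the project's actual fix may be implemented differently, for example inside its HTML parsing stage.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SCRIPT_TAGS = {"script", f"{{{XHTML_NS}}}script"}

# Serialize XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)


def strip_scripts(xhtml: str) -> str:
    """Return the document with every <script> element removed."""
    root = ET.fromstring(xhtml)
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag in SCRIPT_TAGS:
                # Keep any text that followed the script element.
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="unicode")


chapter = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>text</p>"
    '<div class="cgo"><script>zation();</script></div>'
    "</body></html>"
)
print(strip_scripts(chapter))
```

The now-empty `<div class="cgo">` wrapper is left in place here; removing it as well would be a separate, site-specific decision.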


comsoi commented 1 year ago

OK

wdpm commented 1 year ago

OK

Please wait until the bug is fixed before closing this issue; don't close an issue while the bug is still unfixed. Once an issue has been opened, either:

comsoi commented 1 year ago

My mistake (bows)

wdpm commented 1 year ago

@all-contributors please add @xxxfhy for bug.

allcontributors[bot] commented 1 year ago

@wdpm

I've put up a pull request to add @xxxfhy! :tada: