m-haisham / novelsave

This is a tool to download and convert novels from popular sites to e-books.
Apache License 2.0

Missing paragraph #31

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hi! There seems to be a paragraph missing from the EPUB I made from the https://chrysanthemumgarden.com/ site.

from the site

20210505_165611.jpg

from the epub I made

20210505_165553.jpg

m-haisham commented 3 years ago

Could you post the link to the chapter?

ghost commented 3 years ago

https://chrysanthemumgarden.com/novel-tl/pubg/pubg-28/

m-haisham commented 3 years ago

The bug was caused by the blacklisted pattern ^[\W\D]*(volume|chapter)[\W\D]+\d+[\W\D]*$ matching the paragraph.

>>> import re
>>> re.match(r'^[\W\D]*(volume|chapter)[\W\D]+\d+[\W\D]*$', 'He clicked to sort the listings by the highest sale volume, and all of them were cheap goods under 30 yuan. The store had countless poor reviews—after all, you get what you paid for—and these were all just bought for the purpose of video chatting with family members, and etc. No matter how poor the quality was, as long as the person could be seen, it was fine.')

<re.Match object; span=(0, 362), match='He clicked to sort the listings by the highest sa>

Saying it was unexpected would be an understatement. Good job catching it.
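For anyone wondering why the pattern matches ordinary prose: inside a character class, `[\W\D]` is the union of "non-word" and "non-digit", which works out to every character except a digit. Any paragraph that contains "volume" or "chapter" followed later by a single run of digits therefore matches. The sketch below reproduces this on a shortened version of the paragraph and compares it with a stricter pattern. `STRICT` is a hypothetical alternative used only for illustration, not necessarily the fix that was applied in novelsave.

```python
import re

# Pattern quoted in the thread. [\W\D] matches any character that is not a digit,
# so arbitrary text may surround the keyword and the number.
BLACKLIST = re.compile(r'^[\W\D]*(volume|chapter)[\W\D]+\d+[\W\D]*$')

# Hypothetical stricter pattern (an assumption, not the project's actual fix):
# only non-word characters and underscores may surround the keyword and number.
STRICT = re.compile(r'^[\W_]*(volume|chapter)[\W_]+\d+[\W_]*$', re.IGNORECASE)

heading = 'chapter 28'
prose = ('He clicked to sort the listings by the highest sale volume, '
         'and all of them were cheap goods under 30 yuan.')

for label, text in (('heading', heading), ('prose', prose)):
    # Prints whether each pattern treats the text as a heading to be dropped.
    print(label, bool(BLACKLIST.match(text)), bool(STRICT.match(text)))
```

With these inputs, both patterns match the heading, but only the original pattern matches the prose sentence. The stricter version would still miss headings with extra words, such as a chapter title after the number, so it is a trade-off rather than a drop-in replacement.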