kemayo / leech

Turn a story on certain websites into an ebook for convenient reading
MIT License

Couldn't extract story #83

Closed TheMetalCenter closed 1 year ago

TheMetalCenter commented 2 years ago

I'm getting an error on Practical Guide to Evil that I didn't use to get, where the final extraction fails. It still works on another website I tried (The Wandering Inn).

I was on a new install so I thought the issue was related to that, but I went back to an install on a different device that never had issues before and it occurred there as well.

  File "C:\Users\user\leech-master\leech.py", line 159, in <module>
    cli()
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\user\leech-master\leech.py", line 152, in download
    story = open_story(site, url, session, login, options)
  File "C:\Users\user\leech-master\leech.py", line 105, in open_story
    raise Exception("Couldn't extract story")
Exception: Couldn't extract story

kemayo commented 2 years ago

Hm, could you tell me the exact command you used to run it?

TheMetalCenter commented 2 years ago

Sure, "py leech.py practical1.json"

where my json was:

{
    "url": "https://practicalguidetoevil.wordpress.com/2015/03/25/prologue/",
    "title": "A Practical Guide To Evil: Book 1",
    "author": "erraticerrata",
    "content_selector": "#main .entry-wrapper",
    "content_title_selector": "h1.entry-title",
    "content_text_selector": ".entry-content",
    "filter_selector": ".sharedaddy, .wpcnt, style",
    "next_selector": "a[rel=\"next\"]:not([href*=\"prologue\"])",
    "cover_url": "https://gitlab.com/Mikescher2/A-Practical-Guide-To-Evil-Lyx/raw/master/APGTE_1/APGTE_front.png"
}

kemayo commented 2 years ago

Hm, okay. I tried this myself and it did work, so I'd speculate that the most likely issue is some sort of connection problem.

acestronautical commented 1 year ago

./leech.py examples/dungeonkeeperami.json
[sites] Handler: <class 'sites.arbitrary.Arbitrary'> (examples/dungeonkeeperami.json)
[__main__] Unable to locate leech.json. Continuing assuming it does not exist.
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-2
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-3
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-4
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-5
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-6
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-7
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-8
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-9
Traceback (most recent call last):
  File "/Users/acec/git/leech/./leech.py", line 181, in <module>
    cli()
  File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/acec/git/leech/./leech.py", line 168, in download
    story = open_story(site, url, session, login, options)
  File "/Users/acec/git/leech/./leech.py", line 114, in open_story
    raise Exception("Couldn't extract story")
Exception: Couldn't extract story

I believe this is because the website has changed, so the example content_selector is no longer valid. Beautiful Soup still runs, but the content selector matches nothing, which leaves a story with no content. That raises "Couldn't extract story", even though the check that actually fails is just if not story:. Maybe add a more informative error message, like "no text was found, please double check your configuration" or something.
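
To make the suggestion concrete, here is a minimal sketch of what a more informative failure could look like. It is not the project's actual code: EmptyStoryError and require_story are hypothetical names, and story simply stands in for whatever the site handler returned.

```python
class EmptyStoryError(Exception):
    """Raised when extraction ran but produced no content (hypothetical name)."""


def require_story(story, url, content_selector):
    """Return story unchanged, or raise an error that names the likely cause.

    `story` stands in for whatever the site handler returned; an empty or
    None value is treated as "the selector matched nothing".
    """
    if not story:
        raise EmptyStoryError(
            f"No text was found at {url} using content_selector "
            f"{content_selector!r}; please double check your configuration."
        )
    return story


# Example: an empty result now points at the selector instead of a bare
# "Couldn't extract story".
try:
    require_story([], "https://example.com/story", "#main .entry-wrapper")
except EmptyStoryError as exc:
    message = str(exc)
    print(message)
```

The point of the design is only that the message carries the URL and the selector, so a stale config is the first thing a reader suspects rather than a connection problem.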

TheMetalCenter commented 1 year ago

Ah, thank you. It looks like you are correct. The "entry-wrapper" element was removed from the website, so removing it from content_selector fixed it.

{
    "url": "https://practicalguidetoevil.wordpress.com/2015/03/25/prologue/",
    "title": "A Practical Guide To Evil: Book 1",
    "author": "erraticerrata",
    "content_selector": "#main",
    "content_title_selector": "h1.entry-title",
    "content_text_selector": ".entry-content",
    "filter_selector": ".sharedaddy, .wpcnt, style",
    "next_selector": "a[rel=\"next\"]:not([href*=\"prologue\"])",
    "cover_url": "https://gitlab.com/Mikescher2/A-Practical-Guide-To-Evil-Lyx/raw/master/APGTE_1/APGTE_front.png"
}
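
For anyone hitting the same thing, a rough way to check whether the id and class names in a selector still exist on a page is sketched below, using only the standard library. SelectorProbe and probe are hypothetical helpers, the sample markup is an assumption about what the page looks like after the change, and this only checks that names are present; it does not evaluate full CSS selectors.

```python
from html.parser import HTMLParser


class SelectorProbe(HTMLParser):
    """Record every id and class name seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "class":
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def probe(html):
    """Parse html and return the probe holding the ids and classes found."""
    parser = SelectorProbe()
    parser.feed(html)
    parser.close()
    return parser


# Assumed markup after the site change: no "entry-wrapper" element any more.
SAMPLE_HTML = """
<div id="main">
  <h1 class="entry-title">Prologue</h1>
  <div class="entry-content"><p>Story text.</p></div>
</div>
"""

found = probe(SAMPLE_HTML)
has_main = "main" in found.ids
has_wrapper = "entry-wrapper" in found.classes
print(has_main, has_wrapper)  # prints: True False
```

If a name from content_selector is missing from the page, that part of the selector is the first thing to drop or update.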

TheMetalCenter commented 1 year ago

And it looks like that is already fixed in the current examples, which is why it worked for kemayo. I was using an out-of-date JSON file.
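
A quick way to spot this kind of drift is to compare a local config against the current example key by key. The sketch below is an illustration only: diff_configs is a hypothetical helper, and the two configs are trimmed-down copies of the ones posted in this thread rather than files read from the repository.

```python
import json


def diff_configs(mine, current):
    """Return {key: (my_value, current_value)} for every key whose value differs."""
    keys = set(mine) | set(current)
    return {
        key: (mine.get(key), current.get(key))
        for key in sorted(keys)
        if mine.get(key) != current.get(key)
    }


# Trimmed-down copies of the old and fixed configs from this thread.
old_config = json.loads(
    '{"content_selector": "#main .entry-wrapper",'
    ' "content_text_selector": ".entry-content"}'
)
new_config = json.loads(
    '{"content_selector": "#main",'
    ' "content_text_selector": ".entry-content"}'
)

changes = diff_configs(old_config, new_config)
print(changes)  # prints: {'content_selector': ('#main .entry-wrapper', '#main')}
```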