Closed by TheMetalCenter 1 year ago
Hm, could you tell me the exact command you used to run it?
Sure, "py leech.py practical1.json"
where my json was:
{
  "url": "https://practicalguidetoevil.wordpress.com/2015/03/25/prologue/",
  "title": "A Practical Guide To Evil: Book 1",
  "author": "erraticerrata",
  "content_selector": "#main .entry-wrapper",
  "content_title_selector": "h1.entry-title",
  "content_text_selector": ".entry-content",
  "filter_selector": ".sharedaddy, .wpcnt, style",
  "next_selector": "a[rel=\"next\"]:not([href*=\"prologue\"])",
  "cover_url": "https://gitlab.com/Mikescher2/A-Practical-Guide-To-Evil-Lyx/raw/master/APGTE_1/APGTE_front.png"
}
Hm, okay. I tried this myself and it did work, so I'd speculate that the most likely issue is some sort of connection problem.
./leech.py examples/dungeonkeeperami.json
[sites] Handler: <class 'sites.arbitrary.Arbitrary'> (examples/dungeonkeeperami.json)
[__main__] Unable to locate leech.json. Continuing assuming it does not exist.
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-2
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-3
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-4
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-5
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-6
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-7
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-8
[sites.arbitrary] Extracting chapter @ https://forums.sufficientvelocity.com/threads/dungeon-keeper-ami-sailor-moon-dungeon-keeper-story-only-thread.30066/page-9
Traceback (most recent call last):
File "/Users/acec/git/leech/./leech.py", line 181, in <module>
cli()
File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/acec/Library/Caches/pypoetry/virtualenvs/leech-aE8Oo1ef-py3.10/lib/python3.10/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/Users/acec/git/leech/./leech.py", line 168, in download
story = open_story(site, url, session, login, options)
File "/Users/acec/git/leech/./leech.py", line 114, in open_story
raise Exception("Couldn't extract story")
Exception: Couldn't extract story
I believe this is because the website has changed, so the example content selector is no longer valid. Beautiful Soup still runs, but the content selector matches nothing, which leaves a story with no content. That raises "Couldn't extract story", even though the check that actually fails is just "if not story:". Maybe add a more informative error message, like "no text was found, please double check your configuration" or something.
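To make the suggestion concrete, here is a minimal sketch of what a more informative check could look like. It is not leech's actual code: the names EmptyStoryError and require_story are hypothetical, and the only things taken from the thread are the "if not story" style check and the proposed wording.

```python
class EmptyStoryError(Exception):
    """Hypothetical exception for this sketch; not part of leech."""


def require_story(story, url, content_selector=None):
    """Return story unchanged, or raise an error that hints at the likely cause."""
    if story:
        return story
    if content_selector:
        hint = f"content_selector {content_selector!r} may no longer match the page"
    else:
        hint = "the site handler returned nothing"
    raise EmptyStoryError(
        f"Couldn't extract story from {url}: no text was found ({hint}). "
        "Please double check your configuration."
    )
```

With something like this, an empty result would name the selector that probably went stale instead of only saying the story could not be extracted.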
Ah, thank you. It looks like you are correct. The "entry-wrapper" class was removed from the website, so removing it from the content selector fixed it.
{
  "url": "https://practicalguidetoevil.wordpress.com/2015/03/25/prologue/",
  "title": "A Practical Guide To Evil: Book 1",
  "author": "erraticerrata",
  "content_selector": "#main",
  "content_title_selector": "h1.entry-title",
  "content_text_selector": ".entry-content",
  "filter_selector": ".sharedaddy, .wpcnt, style",
  "next_selector": "a[rel=\"next\"]:not([href*=\"prologue\"])",
  "cover_url": "https://gitlab.com/Mikescher2/A-Practical-Guide-To-Evil-Lyx/raw/master/APGTE_1/APGTE_front.png"
}
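For anyone hitting the same thing with another config, a rough way to spot a stale selector is to check whether each #id and .class it mentions still appears anywhere in the page. The sketch below uses only the standard library and is a heuristic, not how leech or Beautiful Soup evaluate selectors: it ignores nesting, and it would misread dots or hashes inside attribute values. The function name missing_selector_tokens and the sample HTML are made up for illustration and are not the real page markup.

```python
import re
from html.parser import HTMLParser


class _IdClassCollector(HTMLParser):
    """Collects every id and class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "class":
                self.classes.update(value.split())


def missing_selector_tokens(html, selector):
    """Return the #id and .class tokens in selector that never occur in html."""
    collector = _IdClassCollector()
    collector.feed(html)
    tokens = re.findall(r"([#.])([\w-]+)", selector)
    return [
        prefix + name
        for prefix, name in tokens
        if name not in (collector.ids if prefix == "#" else collector.classes)
    ]


# Simplified, hypothetical page structure (assumption, not the real markup).
page = (
    '<div id="main"><h1 class="entry-title">Prologue</h1>'
    '<div class="entry-content">...</div></div>'
)
print(missing_selector_tokens(page, "#main .entry-wrapper"))  # ['.entry-wrapper']
print(missing_selector_tokens(page, "#main"))                 # []
```

An empty list does not prove the selector matches, since the pieces could exist without being nested the way the selector expects, but a non-empty list is a quick pointer to what changed on the site.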
And it looks like that is already fixed in the current examples, which is why it worked for kemayo. I was using an out of date json.
I'm getting an error on Practical Guide to Evil that I didn't use to get, where the final extraction fails. It still works on another website I tried (The Wandering Inn).
I was on a new install so I thought the issue was related to that, but I went back to an install on a different device that never had issues before and it occurred there as well.
File "C:\Users\user\leech-master\leech.py", line 159, in <module>
cli()
File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "C:\Users\user\leech-master\leech.py", line 152, in download
story = open_story(site, url, session, login, options)
File "C:\Users\user\leech-master\leech.py", line 105, in open_story
raise Exception("Couldn't extract story")
Exception: Couldn't extract story