voussoir / timesearch

The subreddit archiver

Offline reading #8

Closed liner601 closed 4 years ago

liner601 commented 4 years ago

After getting all the contents of a subreddit (timesearch.py get_submissions -r subredditname, timesearch.py get_comments -r subredditname, timesearch.py get_wiki -r subredditname), I try to render it:

Directory\timesearch>timesearch.py offline_reading -r Subredditofchoice

Directory\timesearch>
[main 2020-04-23T00:48:09.223Z] update#setState idle
(node:25576) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:25576) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:25576) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:25576) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:25576) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:25576) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:28272) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:28272) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
[main 2020-04-23T00:48:39.225Z] update#setState checking for updates
[main 2020-04-23T00:48:39.504Z] update#setState idle

Nothing happens past this point. I have to end it manually with Ctrl+C, and no files are output.

voussoir commented 4 years ago

Hmm, Timesearch doesn't use node or Electron so something else on your system seems to have gotten tangled in somehow.

When I run timesearch.py offline_reading -r subredditofchoice, it shows me

Building tree for t3_xxxxxx (4 comments)
Wrote .\subreddits\subredditofchoice\offline_reading\t3_xxxxxx.html

And then I just view the html files in my browser.

Did you get any of that? The way your issue text is formatted, it looks like the offline_reading command did nothing, and then the node/Electron stuff comes out of nowhere with no command on the above line.

liner601 commented 4 years ago

Weird, Electron isn't even in my PATH. I tried deleting everything Electron-related, but it just says it couldn't find Electron in the Microsoft VS files.

This might give some insight into what the command is asking for, but I doubt it:

[Error: ENOENT: no such file or directory, open 'C:\Users\GX501\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\code\electron-main\main.js'] { errno: -4058, code: 'ENOENT', syscall: 'open', path: 'C:\Users\GX501\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\code\electron-main\main.js', phase: 'loading', moduleId: 'vs/code/electron-main/main', neededBy: [ '===anonymous1===' ] }

Edit: I see what you mean about Electron coming out of nowhere. It runs the command, gives a blank output, and then Electron shows up. I tried removing VS from the PATH, but it made no difference.

voussoir commented 4 years ago

Are you running Timesearch from within VS? Do you get different results when you run it through a plain old command prompt?

I don't want to make you delete electron since VS does rely on it. It sounds like you may have broken your VS by doing that, lol.

As with the previous issue, try timesearch.py offline_reading --help to make sure timesearch is at least starting properly. It's very surprising to see a completely blank output.

liner601 commented 4 years ago

Oh no, don't worry, I restored everything I deleted, and I always run it through a normal cmd prompt.

Well, Houston, I think we have a problem.

timesearch>timesearch.py offline_reading --help

timesearch>
[main 2020-04-23T01:58:58.307Z] update#setState idle
(node:12040) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:12040) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:12040) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:12040) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:12040) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information
(node:12040) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see https://github.com/electron/electron/issues/18397 for more information

voussoir commented 4 years ago

Weird... See the thing about --help is that Timesearch displays the help text without ever dispatching into the offline_reading.py module. Therefore, I would expect that >timesearch.py get_submissions --help (and all the other tools) would show the exact same behavior. Does it?

In the previous issue, you were doing python timesearch.py, and I notice you have switched to writing timesearch.py. Does that change the results? Maybe VS has hijacked the file association for .py files, so >timesearch.py behaves differently than >python timesearch.py?
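To take the file association out of the picture entirely, the script can be handed to the interpreter explicitly, which is what typing python timesearch.py does. The snippet below is only a sketch of that idea in Python; the helper names interpreter_command and run_with_interpreter are made up for illustration and are not part of Timesearch.

```python
import subprocess
import sys


def interpreter_command(script, *args):
    """Build an argument list that runs `script` with the current Python
    interpreter, so the Windows .py file association is never consulted."""
    return [sys.executable, script, *args]


def run_with_interpreter(script, *args):
    """Run the script through the interpreter and capture its output.

    Hypothetical helper for illustration; returns a CompletedProcess.
    """
    return subprocess.run(
        interpreter_command(script, *args),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Equivalent to typing: python timesearch.py offline_reading --help
    print(interpreter_command("timesearch.py", "offline_reading", "--help"))
```

If the help text appears when the script is launched this way but not when typing timesearch.py alone, the file association is the likely culprit.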

liner601 commented 4 years ago

...I can't believe I didn't see that. Thank you again.

One last question, is there an option to include images and videos as an offline copy within the rendered website?

voussoir commented 4 years ago

Wow, so they did hijack your .py extension, that's annoying. That behavior where the command seems to do nothing but more output comes later is something I see with programs that spawn subprocesses / daemons, which makes sense for a GUI program like VS. You should be able to reset the file extension association by re-running the Python installer and choosing "Repair", if you'd like to do so.
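If you want to see which program is currently registered for .py before repairing anything, something like the following sketch may help. It reads the classic association from HKEY_CLASSES_ROOT with the standard winreg module. This is an assumption-laden check: a per-user "Open with" choice can be stored elsewhere and may not show up here, and the function name py_open_command is made up for illustration.

```python
import sys


def py_open_command():
    """Return the shell 'open' command registered for .py files on Windows.

    Returns None on other platforms or when no association is found.
    Only the HKEY_CLASSES_ROOT view is checked (an assumption; per-user
    overrides may live elsewhere).
    """
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    try:
        # Default value of HKCR\.py is the ProgID, e.g. "Python.File".
        prog_id = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, ".py")
        # Default value of HKCR\<ProgID>\shell\open\command is the launcher.
        return winreg.QueryValue(
            winreg.HKEY_CLASSES_ROOT, prog_id + r"\shell\open\command"
        )
    except OSError:
        return None


if __name__ == "__main__":
    print(py_open_command())
```

A result that points at Code.exe rather than a Python launcher would be consistent with the behavior described above.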

> One last question, is there an option to include images and videos as an offline copy within the rendered website?

Good question, but at the moment there is not. Downloading external content is outside the scope of Timesearch, which only focuses on the text that makes up submissions and comments. External content is a very big can of worms.

Tools like youtube-dl may assist you in downloading videos, but for photos and albums I don't know of any single tool with the support and reputation of youtube-dl unfortunately.

For link posts, the url is of course stored in the timesearch database. But if you're trying to get image links in selfposts and comments too that's going to require parsing to get them out. Well, the reddit API does return html versions of the content so you can get the <a> tags out of that, but Timesearch only keeps the markdown version.
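As a rough idea of what that parsing could look like, here is a minimal sketch that pulls http(s) URLs out of a markdown string with a regular expression and then keeps the ones that look like direct image links. It is not something Timesearch does; the function names and the extension list are assumptions, and URLs that contain parentheses will be cut short by this simple pattern.

```python
import re

# Matches http(s) URLs up to whitespace or a bracket/parenthesis, which covers
# both bare URLs and the target of a markdown link like [text](url).
URL_PATTERN = re.compile(r"https?://[^\s<>\[\]()]+")

# Assumed list of extensions that indicate a direct image link.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def extract_urls(markdown_text):
    """Return the unique URLs found in the text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(markdown_text):
        # Drop sentence punctuation that got glued to the end of the URL.
        url = match.rstrip(".,;:!?'\"")
        if url and url not in urls:
            urls.append(url)
    return urls


def image_urls(markdown_text):
    """Return only the URLs whose path ends in a known image extension."""
    return [
        url
        for url in extract_urls(markdown_text)
        if url.lower().split("?")[0].endswith(IMAGE_EXTENSIONS)
    ]


if __name__ == "__main__":
    body = "See [this](https://i.imgur.com/abc.jpg) and https://example.com/page."
    print(extract_urls(body))
    print(image_urls(body))
```

The resulting list could then be handed to whatever downloader you prefer.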

liner601 commented 4 years ago

Yeah, Visual Studio is a mixed bag.

RipMe seems to work for photos and images.

Thanks for all the help!

voussoir commented 4 years ago

I'm glad we could get it figured out, and I'm glad people like you are discovering and enjoying Timesearch. Don't hesitate with any other questions. :)