xavdid / reddit-user-to-sqlite

Pull Reddit user data into a SQLite database
https://pypi.org/project/reddit-user-to-sqlite/
MIT License
215 stars, 9 forks

Submit post URLs to Archive.org and store archived URL when crawling #15

Open wackget opened 1 year ago

wackget commented 1 year ago

It would be great if the script could submit post URLs to archive.org and then store the archived URL as part of the exported data.

To be clear, I don't mean archiving the comment-level URL, but rather the top-level post URL itself, with everybody's comments visible.

It would be one possible way of saving a full post's worth of context without making excessive Reddit API queries.
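A minimal sketch of what this could look like, assuming Python (the project's language) and the standard library only. It turns a post or comment permalink into the top-level post URL, submits that URL to the Wayback Machine's "Save Page Now" endpoint, and records the result in SQLite. The table name `post_archives`, its columns, and the helper names are hypothetical and not part of reddit-user-to-sqlite's schema. It also assumes the save endpoint ends in a redirect to the new snapshot; rate limiting, retries, and authentication are not handled.

```python
import sqlite3
import urllib.request
from typing import Callable

# Wayback Machine "Save Page Now" endpoint; the target URL is appended as-is.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def post_url_from_permalink(permalink: str) -> str:
    """Return the top-level post URL for a post *or* comment permalink.

    Assumes Reddit's usual shape: /r/<sub>/comments/<post_id>[/<slug>[/<comment_id>]]/
    Dropping everything after <post_id> yields the full thread, not one comment.
    """
    parts = permalink.strip("/").split("/")
    if len(parts) < 4 or parts[0] != "r" or parts[2] != "comments":
        raise ValueError(f"unexpected permalink: {permalink!r}")
    return "https://www.reddit.com/" + "/".join(parts[:4]) + "/"


def save_page_now(url: str) -> str:
    """Ask the Wayback Machine to capture `url` and return the final URL.

    Assumption: the request is redirected to the new snapshot, so the URL
    reached after urllib follows redirects is the archived copy.
    """
    request = urllib.request.Request(
        SAVE_ENDPOINT + url,
        headers={"User-Agent": "reddit-archive-sketch"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return resp.geturl()


def archive_post(
    conn: sqlite3.Connection,
    post_id: str,
    permalink: str,
    save: Callable[[str], str] = save_page_now,
) -> str:
    """Archive the thread behind `permalink` and store the archived URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS post_archives ("
        "post_id TEXT PRIMARY KEY, post_url TEXT, archived_url TEXT)"
    )
    post_url = post_url_from_permalink(permalink)
    archived_url = save(post_url)
    conn.execute(
        "INSERT OR REPLACE INTO post_archives VALUES (?, ?, ?)",
        (post_id, post_url, archived_url),
    )
    conn.commit()
    return archived_url


if __name__ == "__main__":
    # Offline demo: a fake `save` stands in for the network call.
    db = sqlite3.connect(":memory:")
    fake = lambda u: "https://web.archive.org/web/20230101000000/" + u
    print(archive_post(db, "abc123", "/r/python/comments/abc123/hello/def456/", save=fake))
```

Passing `save` in as a parameter keeps the crawl logic testable without hitting archive.org, and makes it easy to swap in a library-based saver later.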

brandongalbraith commented 1 year ago

I'd recommend pulling in https://github.com/akamhy/waybackpy for this use case, both for querying the Wayback CDX server for existing captures and for initiating new archive operations.
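The "check for an existing capture first" step might look roughly like this. waybackpy wraps the same Wayback endpoints; this sketch instead queries the CDX server directly with the standard library, so it doesn't depend on waybackpy's exact API. It asks for the most recent capture with HTTP status 200 (`limit=-1` returns the last result) and builds the snapshot URL from its timestamp. The helper names are hypothetical, and the `fetch` parameter lets the parsing be exercised offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Optional

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url: str) -> str:
    """Build a CDX query for the newest successful capture of `url`."""
    params = {
        "url": url,
        "output": "json",            # first row is a header, then one row per capture
        "filter": "statuscode:200",  # skip captures of error pages
        "limit": "-1",               # negative limit = last N results
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_json(query: str) -> List[List[str]]:
    """Fetch and decode a CDX response (an empty body means no captures)."""
    with urllib.request.urlopen(query, timeout=30) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body.strip() else []


def newest_snapshot(rows: List[List[str]]) -> Optional[str]:
    """Turn CDX JSON rows into a snapshot URL, or None if nothing was captured."""
    if len(rows) < 2:  # no data rows beyond the header
        return None
    capture = dict(zip(rows[0], rows[-1]))
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"


def existing_capture(
    url: str, fetch: Callable[[str], List[List[str]]] = fetch_json
) -> Optional[str]:
    """Return the newest archived copy of `url`, if the Wayback Machine has one."""
    return newest_snapshot(fetch(cdx_query_url(url)))
```

In a crawl, a post would only be submitted for a new capture when `existing_capture` returns `None`, which keeps load on archive.org down for threads someone has already saved.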

xavdid commented 1 year ago

I'll look into this as an option. No promises either way, especially with so many subs shut down right now.