xavdid / reddit-user-to-sqlite

Pull Reddit user data into a SQLite database
https://pypi.org/project/reddit-user-to-sqlite/
MIT License
215 stars 9 forks

Record timestamp for when posts/comments were saved #22

Open quarkw opened 1 year ago

quarkw commented 1 year ago

The new saved posts/comments feature works nicely! It might be useful to record the timestamp of when a post/comment was saved, so a user can sort by that instead of by the creation date of the post/comment.

xavdid commented 1 year ago

Oh, do you mean like, when you hit the "save" button on a given comment vs when the comment was made?

I don't remember if that info is included anywhere, but I can check! That would be useful.

quarkw commented 1 year ago

Yup, that's what I mean! Peeking at the GDPR archive, it looks like that data is not included.

It may not be possible to get the saved date, but the order of saving should be retrievable from the API, as that's how Reddit displays it on the website.

xavdid commented 1 year ago

cool! i'll take a look

xavdid commented 1 year ago

So this one is tricky: the tool currently only loads "saved" data from the archive, which (as far as I can tell) is unordered. I do go and fetch info about each saved item, but that lookup doesn't know anything about the user who saved it (since I just look it up by id).

There's a private (unauthenticated) feed that includes all the data and the ordering, but I'd need to add a new command to cover that, and I don't know that I'll have time to write and test it this week (before the API changes go into effect Saturday). I can get to it afterwards, but I don't want to make any promises before we know more about the nature of the changes.
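For readers who want a feel for what walking that feed involves, here is a minimal sketch. It is not the tool's code and not the linked gist. It assumes the feed is a standard Reddit JSON listing (items under `data.children`, a cursor under `data.after`). The URL shape, the `feed`/`user` query parameters, and the helper names `fetch_page` and `saved_fullnames` are assumptions for illustration; copy the real feed URL from your own Reddit feed preferences.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Assumption: the private feed URL looks like this. Check the URL that
# Reddit shows on your feed preferences page and adjust if it differs.
FEED_URL = "https://www.reddit.com/user/{user}/saved.json"


def fetch_page(user: str, feed_token: str, after: Optional[str]) -> dict:
    """Fetch one page of the saved listing (hypothetical helper)."""
    params = {"feed": feed_token, "user": user, "limit": "100"}
    if after:
        params["after"] = after  # cursor returned by the previous page
    url = FEED_URL.format(user=user) + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "saved-ordering-sketch"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def saved_fullnames(get_page: Callable[[Optional[str]], dict]) -> Iterator[str]:
    """Yield item fullnames (e.g. t3_abc, t1_def) in the order the feed returns them."""
    after = None
    while True:
        data = get_page(after)["data"]
        for child in data["children"]:
            yield child["data"]["name"]
        after = data.get("after")
        if not after:  # no cursor means this was the last page
            break
```

Something like `list(saved_fullnames(lambda after: fetch_page("your_username", "your_feed_token", after)))` would then give the saved items in feed order. Keeping the pagination loop separate from the HTTP call makes it easy to check against canned pages without touching the network.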

As a stopgap, here's a little script to get the absolute ordering of the data right now, and we can get it into the SQLite database at our leisure:

https://gist.github.com/xavdid/f0999e3ea08cc8cdaafce27618e092fd

Download the script, make sure requests is available (pip install requests), and run python saved_ordering.py > ordering.json

I might be able to get the real feature shipped in time, but this should at least preserve your data in case I don't.

Let me know if you need further instructions with the above, too! I'm presuming a bit of Python knowledge.
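As a rough idea of the "get it into the SQLite database later" step, the sketch below stores each item's position in a separate table. It rests on loudly stated assumptions: that ordering.json holds a JSON array of Reddit fullnames (such as "t3_abc" for a post or "t1_def" for a comment) in feed order, which may not match what the gist actually writes, and that a new table named `saved_ordering` is acceptable. Both the table and the `store_ordering` helper are hypothetical and not part of reddit-user-to-sqlite.

```python
import sqlite3
from typing import Iterable


def store_ordering(conn: sqlite3.Connection, fullnames: Iterable[str]) -> None:
    """Record each item's feed position in a hypothetical saved_ordering table."""
    rows = []
    for position, fullname in enumerate(fullnames):
        # "t3_abc" -> kind "t3" (post), id "abc"; "t1_..." would be a comment
        kind, _, item_id = fullname.partition("_")
        rows.append((kind, item_id, position))
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS saved_ordering ("
            "kind TEXT NOT NULL, id TEXT NOT NULL, position INTEGER NOT NULL, "
            "PRIMARY KEY (kind, id))"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO saved_ordering (kind, id, position) "
            "VALUES (?, ?, ?)",
            rows,
        )
```

To use it, open the database with `sqlite3.connect(...)`, read the list with `json.load`, and pass both to `store_ordering`. Position 0 is whatever the feed listed first; joining `saved_ordering.id` against the tool's own post and comment tables would then allow sorting by save order, assuming those tables key items by the same bare id.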