xavdid / reddit-user-to-sqlite

Pull Reddit user data into a SQLite database
https://pypi.org/project/reddit-user-to-sqlite/
MIT License
215 stars, 9 forks

reddit-user-to-sqlite occasionally fetches wrong scores #13

Open klmr opened 1 year ago

klmr commented 1 year ago

MWE:

reddit-user-to-sqlite user guepier
sqlite3 reddit.db "select score from comments where id = 'jnk8vt6';"

This prints 1. However, the corresponding comment actually has approximately 8 upvotes.

I didn’t check any other scores to see how widespread the issue is. I realise that Reddit shows random scores to frustrate score gaming, but as far as I am aware the difference between the displayed and the actual score is at most a few points (1? 2?), not 7, as here.

xavdid commented 1 year ago

I'll have to take a closer look at this one. My only guess is that you created your archive when the comment was newer, and thus had a lower score than it shows now. I pipe the value from the API right into the score field, so there shouldn't be any discrepancy there.

FWIW, if you re-run the user command, it'll update scores for existing comments. So at the least, it should be correct (within Reddit's margin of fuzziness).
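To illustrate the "re-run updates existing scores" behaviour described above, here is a minimal sketch of an upsert keyed on the comment id. It is not the tool's actual code: the helper names `upsert_scores` and `get_score` are hypothetical, and the two-column `comments` schema is an assumption based only on the query in the first comment. It relies on SQLite's `ON CONFLICT ... DO UPDATE` clause, which needs SQLite 3.24 or newer.

```python
import sqlite3


def upsert_scores(conn, rows):
    """Insert comments, or refresh the score of rows that already exist."""
    # Assumed minimal schema; the real database has more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments (id TEXT PRIMARY KEY, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO comments (id, score) VALUES (:id, :score) "
        "ON CONFLICT(id) DO UPDATE SET score = excluded.score",
        rows,
    )
    conn.commit()


def get_score(conn, comment_id):
    """Return the stored score for a comment id, or None if it is absent."""
    row = conn.execute(
        "SELECT score FROM comments WHERE id = ?", (comment_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
# First run: the API reports a score of 1.
upsert_scores(conn, [{"id": "jnk8vt6", "score": 1}])
# Later run: the API reports 7, so the existing row is overwritten.
upsert_scores(conn, [{"id": "jnk8vt6", "score": 7}])
print(get_score(conn, "jnk8vt6"))  # 7
```

With this pattern the database only ever reflects whatever the API returned on the most recent run, so a stale or cached API response would be stored as-is.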

klmr commented 1 year ago

> My only guess is that you created your archive when the comment was newer, and thus had a lower score than it shows now.

I am fairly sure we can exclude this possibility: I generated the archive shortly before creating this ticket, and I happen to know that the comment already had the same score the day before (i.e. on the day it was posted) — and its score hasn’t changed since the day it was posted, as far as I am aware (certainly not that drastically).

I was also able to reproduce this: I ran the import command repeatedly (both on the same DB and after deleting it) when posting this issue yesterday, and it always showed the same score (1), even though the “live” score of the comment when reading it via the Reddit website was 8-ish (unfortunately I neglected to check the comment via a third-party Reddit client).

That said, it’s definitely possible that some caching effect is at play, and today I can no longer reproduce it: the score is now shown as 7 in the export database.
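For anyone trying to reproduce this, the comparison described above can be sketched as a small check that flags a stored score only when it differs from the score seen on the website by more than the expected fuzz. The function names and the tolerance of 2 are assumptions for illustration (the thread only guesses that the fuzz is "1? 2?"), and the in-memory table stands in for the real `reddit.db`.

```python
import sqlite3


def stored_score(conn, comment_id):
    """Read the archived score for one comment, or None if it is missing."""
    row = conn.execute(
        "SELECT score FROM comments WHERE id = ?", (comment_id,)
    ).fetchone()
    return None if row is None else row[0]


def exceeds_fuzz(stored, observed, tolerance=2):
    """True when the gap is larger than score fuzzing alone should explain."""
    # tolerance=2 is an assumed bound, not a documented Reddit value.
    return abs(stored - observed) > tolerance


# Stand-in for reddit.db, using only the columns the thread mentions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id TEXT PRIMARY KEY, score INTEGER)")
conn.execute("INSERT INTO comments VALUES ('jnk8vt6', 1)")

# Yesterday: archive says 1, website showed roughly 8.
print(exceeds_fuzz(stored_score(conn, "jnk8vt6"), 8))  # True

# Today: archive says 7, which is within the assumed fuzz of 8.
conn.execute("UPDATE comments SET score = 7 WHERE id = 'jnk8vt6'")
print(exceeds_fuzz(stored_score(conn, "jnk8vt6"), 8))  # False
```

The observed value still has to be read off the website by hand, so this only narrows down which comments are worth a closer look.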

xavdid commented 1 year ago

I'll look into this and try to repro, but if it's something on Reddit's end, I'm not sure there's anything I can do about it.