stanfordio / gogettr

Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis.
Apache License 2.0

Issues Using All API for Posts Greater than p7b5gh #21

Closed pjachim closed 5 days ago

pjachim commented 2 years ago

When I try to use the all API with posts greater than p7b5gh, I start running into issues: large numbers of indices seem to be missing.

e.g., running the following command:

gogettr all --max 1000 --first p7b5gh

Returns a single post.

I tried a couple of much larger IDs (copied from another issue) and got a similar result. I did the same thing in module mode, and still no luck. I want to be respectful of their API and don't want to brute-force it until I see more posts, but I am not sure how else to collect sets of posts for a given time period, or the next n posts after a specific _id.

Before that index, I don't seem to run into quite as many issues, though there are definitely gaps in the returned indices.
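For anyone who wants to measure those gaps, here is a rough sketch. It assumes post IDs are base-36 strings over the digits 0-9 and a-z, which is only a guess from how IDs like p7b5gh look. The helper names (b36decode, b36encode, find_gaps) are made up for this illustration and are not gogettr's API.

```python
import string

# Assumption: IDs use the base-36 alphabet 0-9 followed by a-z.
ALPHABET = string.digits + string.ascii_lowercase


def b36decode(post_id: str) -> int:
    """Convert a base-36 ID string to an integer."""
    return int(post_id, 36)


def b36encode(number: int) -> str:
    """Convert a non-negative integer back to a base-36 ID string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def find_gaps(post_ids):
    """Return (previous_id, next_id, missing_count) for each gap
    between consecutive IDs in the returned set."""
    numbers = sorted({b36decode(post_id) for post_id in post_ids})
    gaps = []
    for previous, current in zip(numbers, numbers[1:]):
        missing = current - previous - 1
        if missing > 0:
            gaps.append((b36encode(previous), b36encode(current), missing))
    return gaps
```

Passing in the _id values collected from a run gives a list of where the holes are and how large each one is, without making any extra requests to the API.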

Do you have any recommendations for using all with larger indices, or should I switch to scraping posts for specific users, rather than specific points in time? Am I missing something? Do the indices change to a different base or something? Is this just a weird coincidence that I am reading into too much?

Thank you for taking a look, this tool is super helpful!

milesmcc commented 2 years ago

I think you're right. It seems like GETTR changed the way they handle post IDs a few months/weeks ago, which means that we can't sequentially iterate through posts anymore. Not sure what the best way to get around this is.

lxcode commented 5 days ago

Don't see an obvious fix on our end.