halcy / Mastodon.py

Python wrapper for the Mastodon ( https://github.com/mastodon/mastodon/ ) API.
MIT License

Are there any plans to allow retrieval of historical statuses for a given topic #344

Closed: Anurag-Saksena closed this issue 1 year ago

Anurag-Saksena commented 1 year ago

Just to provide some context, I am working on a research project that uses NLP to determine the information-to-noise ratio of Mastodon content compared with content on other social media platforms.

To do that, I need to be able to retrieve all past posts related to a particular topic.

I have been through the documentation, and currently the best way to do something like this seems to be the Mastodon.stream_hashtag() function, which streams all statuses with a particular hashtag. As far as I can tell, though, this only works for statuses posted from that moment onwards.
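For reference, the streaming approach described above looks roughly like this minimal sketch. It assumes Mastodon.py's `StreamListener` base class and its `on_update(status)` callback; `HashtagCollector` is a hypothetical name, the instance URL and token in the usage comment are placeholders, and the `object` fallback exists only so the class can be exercised without the library installed. Because a stream only delivers statuses as they arrive, the collector never sees anything posted before it connected.

```python
try:
    from mastodon import StreamListener
except ImportError:
    # Fallback so the sketch can be run without Mastodon.py installed.
    StreamListener = object


class HashtagCollector(StreamListener):
    """Hypothetical listener that buffers every new status it receives."""

    def __init__(self):
        super().__init__()
        self.statuses = []

    def on_update(self, status):
        # Mastodon.py calls this for each new status pushed on the stream.
        self.statuses.append(status)


# Usage (placeholder credentials; stream_hashtag blocks while streaming):
# from mastodon import Mastodon
# api = Mastodon(access_token="YOUR_TOKEN", api_base_url="https://example.social")
# collector = HashtagCollector()
# api.stream_hashtag("python", collector)  # tag without the leading "#"
```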

Is there any way to retrieve all historical statuses associated with a particular hashtag and if there isn't, are there any plans to add this feature in the future?

andypiper commented 1 year ago

👋🏻 Mastodon project developer advocate here.

This is probably more of a question for the core Mastodon team than for the Python library, which is built independently. Mastodon has no historical search feature and no immediate plans to add one (FWIW, you can take a look at the project roadmap). Full-text search of individual users' statuses can be enabled on an instance at the owner's discretion, but this is not currently very common.

The method you have landed on is probably the only way to carry out the kind of research you're looking at right now; I don't have a better answer for you.

Feel free to join the Mastodon project Discussions if you like. As mentioned, I don't think your question is specifically an issue with Mastodon.py or any individual language implementation.

halcy commented 1 year ago

Sure, you can use the hashtag timeline to get all the statuses your instance has seen that have a certain hashtag:

https://mastodonpy.readthedocs.io/en/stable/07_timelines.html#mastodon.Mastodon.timeline_hashtag

The hashtag timeline is paginated, so you can call fetch_next repeatedly to get more and more statuses: https://mastodonpy.readthedocs.io/en/stable/12_utilities.html#mastodon.Mastodon.fetch_next
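A minimal sketch of that loop, assuming `client` is an authenticated `mastodon.Mastodon` instance (anything exposing `timeline_hashtag` and `fetch_next` works). The generator name `iter_hashtag_statuses` and the `max_statuses` cap are illustrative, not part of Mastodon.py, and `limit=40` is assumed to be the server's maximum page size. The loop stops when `fetch_next` returns `None` or an empty page, which is how Mastodon.py signals that no further data is available.

```python
from typing import Any, Iterator, Optional


def iter_hashtag_statuses(
    client: Any, hashtag: str, max_statuses: Optional[int] = None
) -> Iterator[dict]:
    """Yield statuses tagged `hashtag`, newest first, following pagination."""
    # Mastodon.py expects the tag without a leading "#".
    page = client.timeline_hashtag(hashtag.lstrip("#"), limit=40)
    yielded = 0
    while page:
        for status in page:
            yield status
            yielded += 1
            if max_statuses is not None and yielded >= max_statuses:
                return
        # Returns the next (older) page, or None when there is nothing left.
        page = client.fetch_next(page)


# Usage (placeholder credentials):
# from mastodon import Mastodon
# api = Mastodon(access_token="YOUR_TOKEN", api_base_url="https://example.social")
# for status in iter_hashtag_statuses(api, "python", max_statuses=1000):
#     print(status["created_at"], status["url"])
```

Using a generator keeps memory flat and lets the caller stop early, which matters if the instance really does backfill a long way.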

I don't know if there's a limit on how far back it will backfill; maybe it will actually go all the way back to the instance's inception.
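One way to find out empirically is to page through the timeline until it runs out and record the date range you actually received. The helper below is a hypothetical sketch: it takes any iterable of status dicts (for example, the statuses from each page returned by `timeline_hashtag` and `fetch_next`) and assumes each has a `created_at` datetime, which is how Mastodon.py returns that field.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def coverage(
    statuses: Iterable[dict],
) -> Tuple[int, Optional[datetime], Optional[datetime]]:
    """Return (count, oldest created_at, newest created_at) for the statuses seen."""
    count = 0
    oldest: Optional[datetime] = None
    newest: Optional[datetime] = None
    for status in statuses:
        created = status["created_at"]
        count += 1
        if oldest is None or created < oldest:
            oldest = created
        if newest is None or created > newest:
            newest = created
    return count, oldest, newest
```

If the oldest date stays well after the instance's launch however far you page, that suggests a backfill limit on that instance; it only reflects statuses that instance has seen, not the whole network.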

As a side note and word of warning, though: many Mastodon users are not particularly happy about potentially being included in studies or datasets without express, positive consent. Depending on what you do, there may or may not be backlash.

halcy commented 1 year ago

Closing this as resolved / answered, though feel free to reopen or post here if you have more questions.