vladkens / twscrape

2024! X / Twitter API scraper with authorization support. Allows you to scrape search results, user profiles (followers/following), tweets (favoriters/retweeters) and more.
https://pypi.org/project/twscrape/
MIT License

Allow pulling from media tab #131

Closed Pigglebear closed 7 months ago

Pigglebear commented 9 months ago

I captured the request URL ending in /UserMedia with the browser's dev tools and experimented with modifying user_tweets_raw() and user_tweets(). Only a few parameters in these two functions needed changing to make it work.

I had to modify _gql_items() to pass a modified version of GQL_FEATURES when media was requested.
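A minimal sketch of that idea, assuming the feature flags are a plain dict of booleans. The flag names, `MEDIA_FEATURE_OVERRIDES`, and the `build_features()` helper are hypothetical placeholders; they are not taken from twscrape or from the actual request.

```python
# Hypothetical stand-in for the library's default feature flags.
GQL_FEATURES = {
    "example_flag_a": True,
    "example_flag_b": False,
}

# Hypothetical overrides that only the media request would need.
MEDIA_FEATURE_OVERRIDES = {
    "example_flag_b": True,
    "example_media_flag": True,
}


def build_features(media: bool = False) -> dict:
    """Return a copy of the default flags, with media overrides applied on request."""
    features = dict(GQL_FEATURES)  # copy, so the shared defaults are never mutated
    if media:
        features.update(MEDIA_FEATURE_OVERRIDES)
    return features
```

Copying the defaults before updating keeps ordinary tweet requests unaffected by the media-specific flags.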

The response from Twitter is slightly different for media than for tweets. On the first call in the loop, get_by_path() does not reach the actual content, but setting els to els[0]["content"]["items"] lets the code find both the content and the cursor.

Confusingly, the second and subsequent calls return a different format again. This time, if get_by_path() is given "moduleItems" instead of "entries", the code can find everything it needs.
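The two response shapes could be handled roughly like this. It is only a sketch: `find_key()` is a simplified stand-in for get_by_path(), `media_items()` is a hypothetical helper, and the sample payloads are invented to mirror the nesting described here, not copied from a real response.

```python
from typing import Any, Optional


def find_key(obj: Any, key: str) -> Optional[Any]:
    """Depth-first search for the first value stored under `key`."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def media_items(response: dict, first_page: bool) -> list:
    """Return the media items from one page of a UserMedia-style response."""
    if first_page:
        entries = find_key(response, "entries") or []
        if not entries:
            return []
        # First page: the items sit one level down, inside the first entry.
        return entries[0].get("content", {}).get("items", [])
    # Later pages: the items are found under "moduleItems" instead of "entries".
    return find_key(response, "moduleItems") or []


# Invented payloads that only imitate the nesting described in this thread.
first_page = {"data": {"instructions": [
    {"entries": [{"content": {"items": [{"id": 1}, {"id": 2}]}}]}
]}}
later_page = {"data": {"instructions": [{"moduleItems": [{"id": 3}]}]}}

print(media_items(first_page, first_page=True))
print(media_items(later_page, first_page=False))
```

Returning an empty list when a key is missing is one way to add the error handling that may still be needed.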

Some error handling may need to be added, but I have not run into issues.

With the new code, user_media() was much faster than using user_tweets() and filtering out retweets and posts without media.
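For comparison, the slower approach amounts to fetching every post and discarding most of them client-side. A rough sketch, assuming a simplified `Post` record; the class and its fields are hypothetical and are not twscrape's tweet model.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class Post:
    """Hypothetical minimal post record used only for this illustration."""
    id: int
    media_urls: List[str] = field(default_factory=list)
    retweeted_id: Optional[int] = None  # set when the post is a retweet


def only_original_media(posts: Iterable[Post]) -> Iterator[Post]:
    """Yield posts that carry media and are not retweets."""
    for post in posts:
        if post.retweeted_id is None and post.media_urls:
            yield post


timeline = [
    Post(id=1, media_urls=["a.jpg"]),
    Post(id=2),                                      # no media
    Post(id=3, media_urls=["b.jpg"], retweeted_id=9),  # retweet
]
kept = [post.id for post in only_original_media(timeline)]
```

Every discarded post still costs a request slot in this approach, which is consistent with the media endpoint being faster.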

Thanks, Pigglebear.

vladkens commented 7 months ago

@Pigglebear thank you!