Closed fpmirabile closed 1 year ago
It'd be a good idea to have a library ~native~ data class, for example User or Tweet.
That would help users avoid drowning in the heavy payloads the GraphQL endpoints return.
For example, I've been working on these two:
from dataclasses import dataclass


@dataclass
class TwitterTweet:
    rest_id: str
    user_id: str
    full_text: str
    created_at: str
    retweet_count: int
    favorite_count: int
    reply_count: int
    lang: str
    in_reply_to_status_id_str: str
    hashtags: list[str]
    user_mentions: list[str]
    urls: list[str]
    sentiment: str = ''

    @classmethod
    def from_payload(cls, payload):
        legacy_data = payload.get('legacy', {})
        entities_data = legacy_data.get('entities', {})
        return cls(
            rest_id=payload.get('rest_id', ''),
            user_id=legacy_data.get('user_id_str', ''),
            full_text=legacy_data.get('full_text', ''),
            created_at=legacy_data.get('created_at', ''),
            retweet_count=legacy_data.get('retweet_count', 0),
            favorite_count=legacy_data.get('favorite_count', 0),
            reply_count=legacy_data.get('reply_count', 0),
            lang=legacy_data.get('lang', ''),
            in_reply_to_status_id_str=legacy_data.get(
                'in_reply_to_status_id_str', ''),
            hashtags=[
                tag['text'] for tag in entities_data.get('hashtags', [])
            ],
            user_mentions=[
                mention['id_str']
                for mention in entities_data.get('user_mentions', [])
            ],
            urls=[
                url['expanded_url'] for url in entities_data.get('urls', [])
            ])
and
from dataclasses import dataclass


@dataclass
class TwitterUser:
    id: str
    name: str
    screen_name: str
    statuses_count: int
    followers_count: int
    friends_count: int
    favourites_count: int
    listed_count: int
    default_profile: bool
    default_profile_image: bool
    location: str
    description: str
    description_has_url: bool
    description_url: str
    followers_to_following_ratio: float
    verified_type: str
    verified: bool
    is_blue_verified: bool
    has_graduated_access: bool
    can_dm: bool
    media_count: int
    has_custom_timelines: bool
    has_verification_info: bool
    possibly_sensitive: bool

    @classmethod
    def from_payload(cls, payload):
        legacy_data = payload.get('legacy', {})
        urls = [
            url['expanded_url']
            for url in legacy_data.get('entities', {}).get(
                'description', {}).get('urls', [])
        ]
        followers_count = legacy_data.get('followers_count', 0)
        friends_count = legacy_data.get('friends_count', 0)
        return cls(
            id=payload.get('rest_id', ''),
            name=legacy_data.get('name', ''),
            screen_name=legacy_data.get('screen_name', ''),
            statuses_count=legacy_data.get('statuses_count', 0),
            followers_count=followers_count,
            friends_count=friends_count,
            favourites_count=legacy_data.get('favourites_count', 0),
            listed_count=legacy_data.get('listed_count', 0),
            default_profile=legacy_data.get('default_profile', False),
            default_profile_image=legacy_data.get('default_profile_image', False),
            location=legacy_data.get('location', ''),
            description=legacy_data.get('description', ''),
            description_has_url=bool(urls),
            description_url=','.join(urls) if urls else '',
            # Guard against division by zero for accounts that follow nobody
            followers_to_following_ratio=followers_count / friends_count
            if friends_count != 0 else 0.0,
            verified_type=legacy_data.get('verified_type', ''),
            verified=legacy_data.get('verified', False),
            is_blue_verified=payload.get('is_blue_verified', False),
            has_graduated_access=payload.get('has_graduated_access', False),
            can_dm=legacy_data.get('can_dm', False),
            media_count=legacy_data.get('media_count', 0),
            has_custom_timelines=legacy_data.get('has_custom_timelines', False),
            # Coerce to bool so the value matches the field's annotation
            has_verification_info=bool(payload.get('verification_info')),
            possibly_sensitive=legacy_data.get('possibly_sensitive', False),
        )
Hey @fpmirabile, it looks great. I will add these data classes soon. Here are a few things I might add to this:
And then we will use the data classes to get the data as you suggested.
Feel free to share your thoughts.
Makes sense. I just copy/pasted mine as a suggestion since it helped me clean up a lot of stuff. Thanks @iSarabjitDhiman
Yeah, I understand the pain of finding data out of those nested datasets. Your solution is great and is time saving. I just need to add some way to accept keys as a list then I will integrate it with the data classes. I will implement it soon.
Thanks for the idea. ✌️
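The "accept keys as a list" idea mentioned above could look roughly like the sketch below. Note that `find_nested_keys` is a hypothetical helper written for illustration; it is not TweeterPy's actual `find_nested_key`, whose signature is not shown in this thread.

```python
from typing import Any, Dict, Iterable, List


def find_nested_keys(dataset: Any, keys: Iterable[str]) -> Dict[str, List[Any]]:
    """Collect every value stored under any of `keys`, at any depth.

    Hypothetical helper for illustration only, not part of TweeterPy.
    """
    wanted = set(keys)
    found: Dict[str, List[Any]] = {key: [] for key in wanted}

    def walk(node: Any) -> None:
        # Recurse through dicts and lists; scalars are simply ignored.
        if isinstance(node, dict):
            for key, value in node.items():
                if key in wanted:
                    found[key].append(value)
                walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                walk(item)

    walk(dataset)
    return found


# A made-up payload shaped loosely like the ones discussed in this issue.
payload = {
    "rest_id": "1",
    "legacy": {"screen_name": "example", "followers_count": 10},
}
result = find_nested_keys(payload, ["rest_id", "screen_name"])
```

Each requested key maps to a list because the same key can appear at several depths of a nested payload.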
I agree, it's helpful to use objects for the tweets and users.
Hey @fpmirabile, @codilau
So I have been working on these dataclasses. I need a suggestion:
I am planning to call them User and Tweet instead of TwitterUser and TwitterTweet.
How would you like it to work?
from tweeterpy import TweeterPy
from tweeterpy.util import User, Tweet
twitter = TweeterPy()
user_data = twitter.get_user_data("elonmusk")
elon_musk = User(user_data)
# or we can do it in a single step
user = User(twitter.get_user_data("elonmusk"))
OR THE SECOND WAY, like @fpmirabile did, with the from_payload method.
from tweeterpy import TweeterPy
from tweeterpy.util import User, Tweet
twitter = TweeterPy()
user_data = twitter.get_user_data("elonmusk")
elon_musk = User.from_payload(user_data)
So should it be the first way, User(dataset), or the second way, User.from_payload(dataset)?
The same goes for Tweet: Tweet.from_payload(dataset) or Tweet(dataset)?
Let me know your thoughts. I am open to suggestions.
Thanks for the idea @fpmirabile. Otherwise I wouldn't have even bothered to implement dataclasses. I thought users would navigate through the whole dataset themselves with the find_nested_key function, because some might need certain datapoints and others might not. But you are right, it's useful to provide some sort of pre-built template that returns at least the basic datapoints everyone is interested in.
You're welcome @iSarabjitDhiman, it's the least I can do since you are saving me on my engineering final project! Either way works for me; you could keep both (with the constructor calling the inner method). I think it is more about how you feel the library should work than about how we interact with it.
Summarizing, I'm ok with both!
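The "keep both" option could be sketched as follows. This is only an illustration under stated assumptions: the three fields, the `_load` method name, and the payload shape are made up for the example, and it is not how TweeterPy's `User` is actually implemented.

```python
from dataclasses import dataclass


@dataclass(init=False)
class User:
    # A deliberately tiny subset of fields, chosen for illustration.
    id: str = ""
    screen_name: str = ""
    followers_count: int = 0

    def __init__(self, payload: dict):
        # First way: User(payload). The constructor delegates to the
        # shared parsing method.
        self._load(payload)

    @classmethod
    def from_payload(cls, payload: dict) -> "User":
        # Second way: User.from_payload(payload). Same result, explicit name.
        return cls(payload)

    def _load(self, payload: dict) -> None:
        # Hypothetical inner method that pulls values out of the raw payload.
        legacy = payload.get("legacy", {})
        self.id = payload.get("rest_id", "")
        self.screen_name = legacy.get("screen_name", "")
        self.followers_count = legacy.get("followers_count", 0)


# A made-up payload; real responses are far larger.
payload = {
    "rest_id": "42",
    "legacy": {"screen_name": "example", "followers_count": 7},
}
first_way = User(payload)
second_way = User.from_payload(payload)
```

Because `init=False` leaves the hand-written `__init__` in place while `@dataclass` still generates `__eq__` and `__repr__`, both call styles produce equal objects.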
@iSarabjitDhiman, it's the same for me, it's useful however you choose to implement it.
Hey @fpmirabile @codilau, I just added those two dataclasses in the most recent commit, 03ce45d. Feel free to test it out and let me know if there are any changes to be made.
Check the docs here. It's in the util.py module, btw.
Edit : Assuming everything is working as intended, I am closing this issue now.
I found an error. How can I get past it?
description_urls: list[dict] = field(default_factory=list)
TypeError: 'type' object is not subscriptable
Hey @python502, could you please give me some details? Which type of tweets are you passing? Tweets from the profile? An individual tweet? Media tweets?
Just tell me which method you used to fetch the tweet data:
get_user_tweets, get_tweet, or some other?
Let me know, thanks.
Edit: Already fixed in https://github.com/iSarabjitDhiman/TweeterPy/commit/b97c8c32f28c995eeb0fcc7b97e1610aace77ecb
Duplicate: #34
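For context on the TypeError reported above: subscripting built-in types such as `list[dict]` in an annotation is only supported from Python 3.9 onward, so older interpreters raise "'type' object is not subscriptable". The contents of the linked commit are not shown in this thread, so the following is just one common workaround, assuming an older Python version was the cause; the `UserUrls` class name is made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserUrls:
    # typing.List / typing.Dict are accepted by older interpreters,
    # whereas list[dict] raises TypeError before Python 3.9.
    description_urls: List[Dict] = field(default_factory=list)


user = UserUrls()
user.description_urls.append({"expanded_url": "https://example.com"})
```

Upgrading to Python 3.9 or newer would be another way around the error without touching the annotations.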