iSarabjitDhiman / TweeterPy

TweeterPy is a Python library to extract data from Twitter. The TweeterPy API lets you scrape data from a user's profile, such as username, user ID, bio, followers/followings list, profile media, tweets, etc.
MIT License

how to customize json result? - Navigating through json response #13

Closed ihabpalamino closed 1 year ago

ihabpalamino commented 1 year ago

Hello @iSarabjitDhiman, I would like to ask how I can customize the JSON result to get only the ID, content, creation date, number of likes and retweets, and URL of the post instead of the whole thing. Thanks!

iSarabjitDhiman commented 1 year ago

If you are using the async-await branch, use the following (also available for the master branch now, check the code below) :

from tweeterpy import TweeterPy
from tweeterpy import util

twitter = TweeterPy()
# get tweets or other data
data = twitter.get_user_tweets("elonmusk",total=50)

# get data by keys from the nested python dict, just pass in the dataset and the key name you want to extract.

# NOTE : THERE MIGHT BE MULTIPLE KEYS WITH THE SAME NAME. SAY ID (IT COULD BE ID OF TWEET OR A USER OR A THREAD CONVERSATION ETC. ) TRY TO PASS A UNIQUE KEY, OR JUST PASS A DATASET WITH UNIQUE KEYS.
usernames = util.find_nested_key(data,"screen_name")

If you are using the master branch. Just define the following code in your project and use it as a normal function.

#> It's available now for the master branch as well. Import it from the tweeterpy.util module. Check the code below for more details.
from functools import reduce

def find_nested_key(dataset=None, nested_key=None):
    def get_nested_data(dataset, nested_key, placeholder):
        if isinstance(dataset, list) or isinstance(dataset, dict) and dataset:
            if isinstance(dataset, list):
                for item in dataset:
                    get_nested_data(item, nested_key, placeholder)
            if isinstance(dataset, dict):
                if isinstance(nested_key, tuple) and nested_key[0] in dataset.keys():
                    # Walk the key path; a missing key or a non-dict step yields {} (falsy) and is skipped.
                    value = reduce(lambda data, key: data.get(key, {}) if isinstance(
                        data, dict) else {}, nested_key, dataset)
                    if value:
                        placeholder.append(value)
                if isinstance(nested_key, str) and nested_key in dataset.keys():
                    placeholder.append(dataset.get(nested_key))
                for item in dataset.values():
                    get_nested_data(item, nested_key, placeholder)
        return placeholder
    return get_nested_data(dataset, nested_key, [])

tweets_text = find_nested_key(data,"full_text")

Edit: You don't have to do it manually anymore. It has been implemented in the master branch as well.

from tweeterpy import TweeterPy
from tweeterpy.util import find_nested_key

twitter = TweeterPy()
data = twitter.get_user_tweets("elonmusk",total=50)
usernames = find_nested_key(data,"screen_name")
tweets_text = find_nested_key(data,"full_text")

# Just updated find_nested_key function to accept nested_key as a tuple as well.
tweets_creation = find_nested_key(data,("tweet_results","result","legacy","created_at"))
ihabpalamino commented 1 year ago

Thank you for your hard work. If I may ask, how many tweets can I scrape per month? And a second question: when I use pip install, I install from the master branch. How can I switch to the async-await branch?

iSarabjitDhiman commented 1 year ago

To check the rate limits, use the async-await branch. Almost all of its functions accept a "return_rate_limit" argument; just set it to True (take a look at #8). It will return the hourly rate limits. You can also search for the Twitter API rate limits to get an idea of how many requests you can make per day.

NOTE: A rate limit of, say, 2000 per day doesn't mean you can get only 2000 tweets a day. It means you can make 2000 requests a day. How much data you get depends on the type of data you are requesting. If you are requesting user_data, each request returns data for one user, so that will be 2000 users in this case. In the case of tweets, a request sometimes returns 30-50 tweets and other times around 100. So it's better to keep an eye on those rate limits. The best way is to make a request and then check the API limit stats to see how many requests it cost.
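
As a rough illustration of the note above, the sketch below converts a request budget into an estimated range of tweets. The 2000-request budget and the 30 to 100 tweets per request are only the example figures from this comment, not official limits, and `estimate_tweet_range` is a hypothetical helper that is not part of TweeterPy.

```python
def estimate_tweet_range(requests_allowed, min_per_request=30, max_per_request=100):
    """Return (low, high) estimates of tweets retrievable with a request budget.

    The per-request defaults are the rough figures quoted in this thread,
    not guaranteed values.
    """
    return requests_allowed * min_per_request, requests_allowed * max_per_request


low, high = estimate_tweet_range(2000)
print(f"Roughly {low} to {high} tweets for 2000 requests")
```

The actual cost per call can differ, so checking the remaining calls after a real request is still the more reliable approach.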

# to check rate limits for user friends.

twitter.get_friends('',follower=True,return_rate_limit=True)

# to check limits for user tweets.

twitter.get_user_tweets('',return_rate_limit=True)

# it will return the total number of api calls allowed and the remaining api calls. You can get an idea from there.

Check this guide to switch to the async-await branch.

Feel free to close the issue if you got what you were looking for.

ihabpalamino commented 1 year ago

Using createdat = util.find_nested_key(user_tweets,"created_at") I got ['Wed Sep 30 19:02:27 +0000 2009', 'Thu Jul 27 09:21:51 +0000 2023', ... so I get both the creation date of the posts and the creation date of the account. How can I get only the creation date of the posts?

iSarabjitDhiman commented 1 year ago

As I mentioned earlier, there might be multiple keys with the same name in a single dataset. The "created_at" key is used for users as well as for tweets. You can just use a for loop. The nested location of created_at for tweets is ['content']['itemContent']['tweet_results']['result']['legacy'].

So a quick fix in your case is:

user_tweets = twitter.get_user_tweets("elonmusk",total=20)

[util.find_nested_key(tweet['content']['itemContent']['tweet_results']['result']['legacy'],"created_at") for tweet in user_tweets[0]['data']]

Edit : Just updated find_nested_key function to accept nested_key as a tuple as well. So the updated/better solution is:

user_tweets = twitter.get_user_tweets("elonmusk",total=20)

tweets_creation = util.find_nested_key(user_tweets,("tweet_results","result","legacy","created_at"))
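
To make the difference between a string key and a tuple key concrete, here is a small self-contained sketch. It uses a compact re-implementation adapted from the function posted earlier in this thread (not the library's exact code) and a made-up entry whose shape is only an assumption about the real response.

```python
from functools import reduce


def find_nested_key(dataset, nested_key):
    """Collect every value stored under nested_key (a str, or a tuple key path)."""
    found = []

    def walk(node):
        if isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            if isinstance(nested_key, tuple) and nested_key[0] in node:
                # Follow the key path; a missing step yields {} and is skipped.
                value = reduce(
                    lambda data, key: data.get(key, {}) if isinstance(data, dict) else {},
                    nested_key, node)
                if value:
                    found.append(value)
            if isinstance(nested_key, str) and nested_key in node:
                found.append(node[nested_key])
            for item in node.values():
                walk(item)

    walk(dataset)
    return found


# Made-up entry: "created_at" appears for both the tweet and its author.
entry = {
    "tweet_results": {"result": {
        "legacy": {"created_at": "Thu Jul 27 09:21:51 +0000 2023"},
        "core": {"user_results": {"result": {
            "legacy": {"created_at": "Wed Sep 30 19:02:27 +0000 2009"}}}},
    }}
}

all_dates = find_nested_key(entry, "created_at")  # tweet and account dates
tweet_dates = find_nested_key(
    entry, ("tweet_results", "result", "legacy", "created_at"))  # tweet date only
print(all_dates)
print(tweet_dates)
```

The string key matches every "created_at" in the structure, while the tuple only matches where the whole path lines up.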
ihabpalamino commented 1 year ago

Is it possible to know where the list or dictionary that contains all the keys is, so I can see, for example, which keys to use to get only the URL or the ID of a post? And thanks for keeping this up to date.

iSarabjitDhiman commented 1 year ago

Hey @ihabpalamino

You can take a look at the official Twitter API website to see if they have posted some sample responses. Otherwise you will have to navigate through the response yourself to understand those key-value pairs. Just grab one of the results from the list; the other results are quite similar most of the time.

I just updated the find_nested_key function. It now accepts nested_key as a tuple as well, which makes it easier to deal with multiple keys of the same name. Check the usage here. Let me know if this is what you wanted.

ihabpalamino commented 1 year ago

Thanks, but I still can't get the URL of each post that takes me to the tweet.

iSarabjitDhiman commented 1 year ago

Twitter doesn't send the direct url to tweets in this dataset. You will have to create it on your own.

The tweet URL structure is: https://www.twitter.com/username/status/tweet_id


user_tweets = twitter.get_user_tweets("elonmusk",total=10)

for user in user_tweets:
    for tweet in user["data"]:
        #skip promoted tweets
        if tweet.get("entryId","").startswith("promote"):
            continue
        tweet_id = util.find_nested_key(tweet,("tweet_results","result","rest_id"))
        username = util.find_nested_key(tweet,("user_results","result","legacy","screen_name"))
        if tweet_id and username:
            print(f"https://www.twitter.com/{username[0]}/status/{tweet_id[0]}")
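
Putting the pieces of this thread together, the sketch below builds one compact record per tweet (ID, text, creation date, likes, retweets, URL). `summarize_tweet` is a hypothetical helper, the "favorite_count" and "retweet_count" key names are assumptions about the tweet's legacy dict, and the sample values are made up.

```python
from datetime import datetime


def summarize_tweet(username, rest_id, legacy):
    """Build a compact record from a tweet's legacy dict (key names assumed)."""
    # Dates in this thread look like 'Thu Jul 27 09:21:51 +0000 2023'.
    created = datetime.strptime(legacy["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "id": rest_id,
        "text": legacy.get("full_text"),
        "created_at": created.isoformat(),
        "likes": legacy.get("favorite_count"),
        "retweets": legacy.get("retweet_count"),
        "url": f"https://www.twitter.com/{username}/status/{rest_id}",
    }


# Made-up values for illustration only.
record = summarize_tweet("elonmusk", "1234567890", {
    "created_at": "Thu Jul 27 09:21:51 +0000 2023",
    "full_text": "hello",
    "favorite_count": 10,
    "retweet_count": 2,
})
print(record)
```

In real use, the username, rest_id and legacy values would come from find_nested_key calls like the ones in the loop above.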
iSarabjitDhiman commented 1 year ago

Assuming this is what you were looking for, I am closing the issue.