vladkens / twscrape

2024! X / Twitter API scraper with authorization support. Allows you to scrape search results, user profiles (followers/following), tweets (favoriters/retweeters) and more.
https://pypi.org/project/twscrape/
MIT License

Add get_id_from_username() method #26

Closed: reddere closed this issue 1 year ago

reddere commented 1 year ago

Please, add a method that allows getting the ID from a username. Otherwise, we would manually search and convert those usernames to IDs. Thanks.

vladkens commented 1 year ago

Hi, @reddere. You can use the user_by_login method and get the id from the User object. Does this cover what you need?

user = await api.user_by_login("twitter")
print(user.id, user.id_str)

reddere commented 1 year ago

Thanks @vladkens! Here are 2 more things I would consider bugs:

  1. What about gather(api.user_tweets(tweeter_id, limit=1)) not returning 1 result? I am actually getting way more than one result. What is this due to?

  2. The Tweet object does not return the full tweet text. Take this one for example: https://twitter.com/FortniteStatus/status/1673714760473346048. The rawContent ends with 3 dots (apart from the tweet link), which cuts off the full text. What can I/you do to fix it?

Thanks in advance!

vladkens commented 1 year ago
  1. This is because the Twitter API returns data in batches, around 20 tweets per request. The limit param stops iteration when the total count of received tweets is more than the limit. So with limit=21, the first call returns 20 tweets -> do next call -> +20 tweets (40 total received). Currently twscrape returns all of these tweets. Maybe the non-_raw methods should trim the count in code and stop iterating. But I'm not sure about this; more data is not less data.

  2. Good catch. Need to update parser to support long messages.
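
To make the batching behaviour in point 1 concrete, here is a rough simulation. It is not twscrape's actual code: fetch_batch, iter_tweets and collect are hypothetical names, the page size of 20 is taken from the description above, and the stop condition (stop once at least `limit` items were received) is an assumption about the exact comparison used.

```python
import asyncio
from typing import AsyncIterator, List

BATCH_SIZE = 20  # assumed page size ("around 20 tweets per request")


async def fetch_batch(page: int) -> List[int]:
    """Hypothetical stand-in for one API request; returns fake tweet IDs."""
    start = page * BATCH_SIZE
    return list(range(start, start + BATCH_SIZE))


async def iter_tweets(limit: int) -> AsyncIterator[int]:
    """Yield whole batches; stop once at least `limit` items were received."""
    received = 0
    page = 0
    while received < limit:
        batch = await fetch_batch(page)
        for tweet_id in batch:
            yield tweet_id  # every item of the batch is passed through
        received += len(batch)
        page += 1


async def collect(limit: int) -> List[int]:
    """Gather everything the iterator yields into a list."""
    return [tweet_id async for tweet_id in iter_tweets(limit)]


if __name__ == "__main__":
    print(len(asyncio.run(collect(1))))   # one request, whole batch returned
    print(len(asyncio.run(collect(21))))  # two requests, both batches returned
```

In this model the limit only decides when to stop requesting more pages; it never cuts a batch short, which is why limit=1 still yields a full page.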

reddere commented 1 year ago
  1. Sure! So what is the limit referring to? The number of batches?
  2. Great. If you need any help to implement that, let me know!
  3. Also, is support for the translate button's translation in the works? Translating with an ordinary translation library gives much worse results.

@vladkens

vladkens commented 1 year ago

Hi, @reddere.

  1. Limit is the number of objects (tweets or users, depending on the method used). But right now a call can return more objects than the limit if the API returns more. Limit tries to return NOT FEWER objects than requested. I will think about adding a flag to control this behaviour, in case you need exactly N objects.
  2. Implemented in the latest release, please upgrade: pip install --upgrade twscrape
  3. Nope, this feature is not implemented. I know that Twitter has some API for this. If you need it, please create a new issue.
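
Until such a flag exists, exactly N objects can be obtained by trimming on the caller's side. Below is a minimal sketch, assuming the method you call (for example api.user_tweets) is an async generator; take is a hypothetical helper and fake_user_tweets is a stand-in that yields fake IDs so the example runs offline.

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")


async def take(source: AsyncIterator[T], n: int) -> List[T]:
    """Collect at most n items from an async iterator, then stop early."""
    items: List[T] = []
    if n <= 0:
        return items
    async for item in source:
        items.append(item)
        if len(items) >= n:
            break  # stop consuming as soon as n items are collected
    return items


async def fake_user_tweets(total: int = 40) -> AsyncIterator[int]:
    """Hypothetical stand-in for api.user_tweets(...); yields fake IDs."""
    for i in range(total):
        yield i


if __name__ == "__main__":
    print(asyncio.run(take(fake_user_tweets(), 1)))
```

This only trims what the caller keeps; it does not change how many requests were already made to fetch the batch.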

reddere commented 1 year ago

Great, @vladkens!

  1. All right! If returning exactly N objects does not change the number of rate-limited requests I can make, then it works fine for me as it is right now.
  2. Updated and thanks!
  3. Sure, doing this shortly.

Lastly, have you ever thought about creating a community chat, like a Discord or Telegram group? It would massively benefit everyone!

vladkens commented 1 year ago

Hi. Limit is fixed in v0.6.0. To update, use: pip install --upgrade twscrape

Thank you for the issues.

reddere commented 1 year ago

Thanks @vladkens! So now, what would we need to do to keep it as it was? If setting the limit won't change my rate limit, it would be more convenient to get whole batches of tweets.

reddere commented 1 year ago

Also, feedback on 0.6.0: I had to downgrade to 0.5.0 because I don't want the mail to be checked when logging in; for many logins I don't want to check the mail. I would also add a way to keep the original behaviour of api.user_tweets(), returning the original full batches.

@vladkens

vladkens commented 1 year ago

@reddere, please create new tickets if you have questions.

Login and batches are restored in v0.7.0.