woluxwolu / twint

MIT License

Fix search #14

Closed 9ary closed 3 months ago

9ary commented 1 year ago

Search now requires being logged in + a CSRF token.

This PR adds a CLI flag to provide an authentication cookie (must be obtained by logging in with a browser, in Firefox the cookie can be found in the developer toolbox under the storage tab).

It looks like a randomly generated CSRF token works, so no complicated mechanism is required to obtain one.

Fixes #11. Fixes #13.
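
A minimal sketch of the random CSRF token idea described above, not the PR's actual code. It assumes the token only needs to be a random hex string that is sent in both a cookie and a request header. The function names, the `ct0` cookie name, and the `x-csrf-token` header name are assumptions for illustration.

```python
import secrets


def make_csrf_token(num_bytes: int = 16) -> str:
    """Return a random hex string to use as a CSRF token (32 hex chars by default)."""
    return secrets.token_hex(num_bytes)


def build_session_headers(auth_token: str, csrf_token: str) -> dict:
    """Send the same CSRF value as a cookie and as a header.

    Assumption: the cookie names `auth_token` and `ct0` and the header
    name `x-csrf-token` are illustrative, not confirmed by this thread.
    """
    return {
        "Cookie": f"auth_token={auth_token}; ct0={csrf_token}",
        "x-csrf-token": csrf_token,
    }
```

The authentication cookie itself still has to come from a logged-in browser session; only the CSRF value is generated locally.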

bb010g commented 1 year ago

Tests are now capable of passing on this branch. The first two commits (including https://github.com/woluxwolu/twint/pull/8) take care of fixing bugs that already prevented tests from working, independently of Twitter's latest changes.

LinqLover commented 1 year ago

That sounds great! Can you tell how stable this solution will be if you run twint regularly on a daily basis, i.e., how fast will the token/cookie expire? Will it work when you use the token/cookie to run twint on a different machine/IP address?

9ary commented 1 year ago

Can you tell how stable this solution will be if you run twint regularly on a daily basis, i.e., how fast will the token/cookie expire?

No idea yet, but we run a twint job every 12 hours on github actions (https://github.com/catgirl-v/cubari/actions), so we'll find out soon enough.

That sounds great! Can you tell how stable this solution will be if you run twint regularly on a daily basis, i.e., how fast will the token/cookie expire? Will it work when you use the token/cookie to run twint on a different machine/IP address?

It's working so far.

leonardoulloa21 commented 1 year ago

Is it working for you guys? In my case this error keeps popping up. Any advice?

"ConnectionError: Access forbidden, try passing --auth-token."

9ary commented 1 year ago

Yes, it's working. I'm gonna need more details to help you. Did you in fact pass a valid authentication cookie as per the op? If so, please post a minimal example that reproduces the problem.

leonardoulloa21 commented 1 year ago

Do I need to pass a valid authentication cookie? How so? I just applied the changes in this PR and tried to run my previous code, and then that error message popped up. How can I do what you recommend?

9ary commented 1 year ago

Sounds to me like you didn't read any of the conversation in #13 and here. The error message is very clear, you need an auth token. This is the whole point of this PR: Twitter now requires login to search. Instructions are in the op.

luxoflux commented 1 year ago

Brilliant solution, works just fine. Thanks.

leonardoulloa21 commented 1 year ago

Sounds to me like you didn't read any of the conversation in #13 and here. The error message is very clear, you need an auth token. This is the whole point of this PR: Twitter now requires login to search. Instructions are in the op.

My bad, I thought that csrf_token = random.randbytes(16).hex() was it, but I need to replace it with the auth token I get from the Firefox browser, right? I made that change and I'm still getting the same error ("ConnectionError: Access forbidden, try passing --auth-token."). Maybe I'm doing something wrong? Some help would be nice, please :)

9ary commented 1 year ago

No, you don't have to modify the code. Pass the token with the --auth-token flag, or set the TWITTER_AUTH_TOKEN environment variable.

CSRF is unrelated, it's just that both changes were required to actually get it to work.
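
A hedged sketch of how a flag with an environment-variable fallback can be resolved. This is not the PR's implementation: `resolve_auth_token` is a hypothetical helper, and the precedence (flag first, then `TWITTER_AUTH_TOKEN`) is an assumption.

```python
import argparse
import os
from typing import Optional, Sequence


def resolve_auth_token(argv: Optional[Sequence[str]] = None) -> Optional[str]:
    """Return the token from --auth-token, else from TWITTER_AUTH_TOKEN, else None."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--auth-token", default=None)
    # parse_known_args ignores any other flags that may be present.
    args, _unknown = parser.parse_known_args(argv)
    # Assumed precedence: an explicit flag wins over the environment variable.
    return args.auth_token or os.environ.get("TWITTER_AUTH_TOKEN")
```

Either way, no source file needs to be edited: the token is supplied at run time.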

leonardoulloa21 commented 1 year ago

No, you don't have to modify the code. Pass the token with the --auth-token flag, or set the TWITTER_AUTH_TOKEN environment variable.

CSRF is unrelated, it's just that both changes were required to actually get it to work.

I have my code deployed in AWS Lambda with the twint library as a layer. I updated the library and set the environment variable as mentioned, but I'm still getting the same error. Locally I get the same result. If you could help, I would really appreciate it :)

[CRITICAL] 2023-05-12T20:53:44.334Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data' sleeping for 1.0 secs
[CRITICAL] 2023-05-12T20:53:45.425Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data' sleeping for 8.0 secs
[CRITICAL] 2023-05-12T20:53:53.524Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data' sleeping for 27.0 secs

batmanscode commented 1 year ago

Thank you for the fix @9ary, works great! 😃

Tiny request, is it possible to add a wait time to prevent rate limits?

Looks like --min-wait-time is supposed to be automatically adjusted but I still get TokenExpiryException: Rate limit exceeded

ap.add_argument("--min-wait-time", type=float, default=15,
                    help="specifiy a minimum wait time in case of scraping limit error. This value will be adjusted by twint if the value provided does not satisfy the limits constraints")

9ary commented 1 year ago

For what it's worth, it seems the owner of this repo is inactive, so this PR is unlikely to be merged anytime soon. We've set up a fork at https://github.com/catgirl-v/twint.

@leonardoulloa21 @batmanscode please open issues over there with the code or command line invocation that reproduces your problems. It's not practical to do all development and troubleshooting in a single PR thread.

batmanscode commented 1 year ago

For what it's worth, it seems the owner of this repo is inactive, so this PR is unlikely to be merged anytime soon. We've set up a fork at https://github.com/catgirl-v/twint.

@leonardoulloa21 @batmanscode please open issues over there with the code or command line invocation that reproduces your problems. It's not practical to do all development and troubleshooting in a single PR thread.

Makes sense, thanks!

corpuzdonn commented 1 year ago

I replaced everything in the .py files of my twint with the changes, but I keep getting the error below on all of my searches.

module 'random' has no attribute 'randbytes'

batmanscode commented 1 year ago

I replaced everything in the .py files of my twint with the changes, but I keep getting the error below on all of my searches.

module 'random' has no attribute 'randbytes'

You have to use python 3.9 or above. It's mentioned in some of the early comments
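
For anyone who cannot upgrade, a small sketch of a fallback. `random.randbytes` was added in Python 3.9, and `os.urandom` is available on older versions. The helper name `random_hex` is hypothetical and not part of twint.

```python
import os
import random


def random_hex(num_bytes: int = 16) -> str:
    """Return num_bytes random bytes as hex, with or without random.randbytes."""
    randbytes = getattr(random, "randbytes", None)
    if randbytes is not None:
        # Python 3.9 and newer.
        return randbytes(num_bytes).hex()
    # Older interpreters: fall back to the OS random source.
    return os.urandom(num_bytes).hex()
```

Upgrading to Python 3.9 or above remains the simpler fix, since the branch was written against it.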

corpuzdonn commented 1 year ago

Thanks. I switched to 3.9 but I got 'Access forbidden, try passing --auth-token.' I saw that auth tokens were added. How can I add this to the search function?

I'm very new at this. How do I pass the auth token?

batmanscode commented 1 year ago

Thanks. I switched to 3.9 but I got 'Access forbidden, try passing --auth-token.' I saw that auth tokens were added. How can I add this to the search function?

I'm very new at this. How do I pass the auth token?

Login to twitter on Firefox -> developer tools -> storage -> Auth token

Then wherever you're running twint, save that as an environment variable called TWITTER_AUTH_TOKEN

It should then run. Good luck

corpuzdonn commented 1 year ago

Login to twitter on Firefox -> developer tools -> storage -> Auth token

Then wherever you're running twint, save that as an environment variable called TWITTER_AUTH_TOKEN

It should then run. Good luck

Thanks it's working now.

leonardoulloa21 commented 1 year ago

Login to twitter on Firefox -> developer tools -> storage -> Auth token

Then wherever you're running twint, save that as an environment variable called TWITTER_AUTH_TOKEN

It should then run. Good luck

Hey @batmanscode

Would you mind testing my code and tell me if you are getting the same error message?

I'm trying to run it in a Jupyter notebook and then in AWS Lambda.

import twint
import os
import nest_asyncio

os.environ["TWITTER_AUTH_TOKEN"] = "my_token"

nest_asyncio.apply()

c = twint.Config()
c.Username = "BCPComunica"
c.Since = "2023-05-21"
c.Limit = 100
twint.run.Search(c)

I'm getting this error: CRITICAL:root:twint.run:Twint:Feed:noData'data' sleeping for 1.0 secs

Hope you can give me a hand!

Thanks in advance.

JoelBird commented 1 year ago

Login to twitter on Firefox -> developer tools -> storage -> Auth token

Then wherever you're running twint, save that as an environment variable called TWITTER_AUTH_TOKEN

It should then run. Good luck

I can only find authentication tokens in the developer portal; I didn't see any 'developer tools' or 'storage' in Firefox. Which of them should I use?

9ary commented 1 year ago

@JoelBird hopefully this is detailed enough:

marquisvictor commented 1 year ago

Hi @9ary, thanks for the fix. For now, on the command line, only the -u parameter works; the search parameter -s doesn't. Any idea why? I'm trying to debug it here.

I'm getting CRITICAL:root:twint.run:Twint:Feed:noData'data' with twint -s pineapple but twint -u username works fine

corpuzdonn commented 1 year ago

I'm running into 'Rate limit exceeded' errors. How do I fix this? Is there something I should retry in a loop to get past it?
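
One generic way to approach this is to retry with a growing delay. The sketch below is an illustration under assumptions, not twint's own logic. `RateLimited` and `retry_with_backoff` are hypothetical names, the cubic delay mirrors the 1, 8, 27 second sleeps in the log earlier in this thread, and the default floor of 15 seconds mirrors the `--min-wait-time` default quoted above. Whether Twitter lifts the limit within such delays is not confirmed here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for whatever error the scraper raises."""


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    min_wait: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), waiting max(min_wait, attempt**3) seconds after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts:
                raise  # Give up after the last attempt.
            sleep(max(min_wait, float(attempt ** 3)))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.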

batmanscode commented 1 year ago

Hey @batmanscode

Would you mind testing my code and tell me if you are getting the same error message?

I'm not sure, sorry. I'm having the same issue :(

a-annor commented 1 year ago

For those that this is working for, would someone be able run through

Hi all, is this still working for anyone? I'm experiencing the same issue as @leonardoulloa21

corpuzdonn commented 1 year ago

Hi all, is this still working for anyone? I'm experiencing the same issue as @leonardoulloa21

It's working fine on my end.

Output:
1672684481071976449 2023-06-25 03:13:51 +0800 @kvafelled @kvafelled se debe esperar el plazo de 15 días calendario aproximadamente.
1672663630951878656 2023-06-25 01:51:00 +0800 @kvafelled Hola, @kvafelled 👋 Te contamos que cuando se realiza una cancelación, anulación o reembolso de compra por parte de alguna empresa, estas tienen hasta 15 días (calendario) para proceder con la devolución del dinero a tu cuenta de ahorros. 🤝
...
...
...
@Maverick99210 ¡Hola, @Maverick99210! 👋 Lamentamos el inconveniente generado, por favor, envíanos tu DNI vía DM para poder orientarte de la mejor manera. Esperamos tu mensaje.
1668754764476346368 2023-06-14 06:58:34 +0800 @mendezt_29 Hola @mendezt_29 Envíanos un DM con la captura de pantalla de lo que te aparece y el número de tu DNI. Quedamos atentos.
1668710970741628928 2023-06-14 04:04:33 +0800 @PALICUYA ¡Hola Pilar!👋 Por favor envíanos un inbox con tu DNI y la imagen que te aparece aquí 👉🏻 https://t.co/HE00YFfJez. Quedamos atentos. 🤝
1668690239685267457 2023-06-14 02:42:10 +0800 @vladineitor Elsa, gracias por la información. Estamos reportando lo sucedido al equipo a cargo, para que se pueda hacer las consultas y verificaciones al respecto. Lamentamos mucho la molestia
1668687790408884224 2023-06-14 02:32:26 +0800 @vladineitor Hola, Elsa. Queremos conocer lo ocurrido. Por favor, detállanos vía DM el inconveniente presentado y la ubicación de la Agencia (avenida/calle/número/alguna referencia). Quedamos atentos.
1668631670076116997 2023-06-13 22:49:26 +0800 @RodrigoVinyas ¡Hola, Rodrigo! 👋 Nos importa mucho la experiencia de cada uno de nuestros clientes, agradeceríamos que puedas comunicarte al 01 311 9400 de L-V de 7:00 a.m. a 5:00 p.m. con nuestra área de Soluciones de Pagos, para solicitar alguna facilidad de pago o un compromiso de pagos.
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.

batmanscode commented 1 year ago

Thanks @corpuzdonn, maybe it's a token issue from my end

I attempted a huge scrape (4 weeks via search terms) and that got rate limited. Maybe that token wasn't valid after that

Have you tried long scrapes? I saw there's a time out parameter but even setting that very high didn't work for me
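
One thing that might be worth trying for long scrapes is splitting the date range into smaller windows and running one search per window, with a pause in between. The helper below is only a sketch: `date_windows` is a hypothetical name, and whether smaller windows actually avoid the rate limit is untested here.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, days: int = 1) -> Iterator[Tuple[str, str]]:
    """Yield (since, until) ISO date strings that cover [start, end) in chunks."""
    if days < 1:
        raise ValueError("days must be at least 1")
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        # Clamp the last window so it never runs past `end`.
        upper = min(cursor + step, end)
        yield cursor.isoformat(), upper.isoformat()
        cursor = upper
```

Each pair could then be assigned to `c.Since` and `c.Until` on a `twint.Config` before calling `twint.run.Search(c)`, assuming those fields accept `YYYY-MM-DD` strings as in the snippet posted earlier in this thread.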

leonardoulloa21 commented 1 year ago

It's working fine on my end.

Would you mind packaging up your twint library and sharing it with us, please? I might be doing something wrong, because I have just tried it and got the same result:

CRITICAL:root:twint.run:Twint:Feed:noData'data' sleeping for 1.0 secs

I don't think this message is related to the auth token; it has to be something else. Thanks in advance for your time, @woluxwolu.

corpuzdonn commented 1 year ago

I am actually getting the following below all of a sudden. Did something change?

CRITICAL:root:twint.get:User:Expecting value: line 1 column 1 (char 0)

9ary commented 1 year ago

Most likely the latest Twitter changes require more API calls to be authenticated. Our scripts broke too but I'm currently on vacation. I'll have a look in a few days.

corpuzdonn commented 1 year ago

Have there been any updates? I don't know if there were, but my output has become:

CRITICAL:root:twint.run:Twint:Feed:noData'globalObjects'

9ary commented 1 year ago

The search endpoint returns 404, it looks like they've finally killed it off. This means twint will need to be reworked to use the graphql API, which is a lot more work than I'm willing to put in personally.

corpuzdonn commented 1 year ago

The search endpoint returns 404, it looks like they've finally killed it off. This means twint will need to be reworked to use the graphql API, which is a lot more work than I'm willing to put in personally.

I see. It's ok. Will find alternative solutions. Thanks for your hard work!