Open apentori opened 1 week ago
According to the documentation of the endpoint /2/users/:id/tweets, the query should accept the following parameters:
end_time: most recent end of the time interval in which the tweets are queried.
start_time: oldest end of the time interval in which the tweets are queried.
pagination_token: allows fetching the next page.
max_results: number of results per query.
To get the whole tweet history, the connector should have pagination enabled; this makes the connector fetch all the tweets in one execution.
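A minimal sketch of what the pagination loop could look like, in Python. It is not the connector's actual code: `iter_tweets`, `fetch_page` (a callable wrapping the HTTP GET on /2/users/:id/tweets) and the `max_pages` safety limit are hypothetical names, and the default page size of 100 is an assumption. The `meta.next_token` and `meta.result_count` fields are the ones visible in the API response quoted later in this issue.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_tweets(
    fetch_page: Callable[[Dict[str, str]], dict],
    max_results: int = 100,
    start_time: Optional[str] = None,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield tweets page by page, following pagination_token between requests."""
    params: Dict[str, str] = {"max_results": str(max_results)}
    if start_time is not None:
        params["start_time"] = start_time

    # Hard stop so a single account cannot consume the whole request quota.
    for _ in range(max_pages):
        page = fetch_page(dict(params))
        meta = page.get("meta", {})
        yield from page.get("data", [])

        next_token = meta.get("next_token")
        # Stop when there is no further page, or when a page comes back empty
        # even though a next_token is still present.
        if not next_token or meta.get("result_count", 0) == 0:
            break
        params["pagination_token"] = next_token
```

Passing the HTTP call in as a function keeps the loop testable without network access and makes it easy to add rate-limit handling around the request itself.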
This is great, but it will take a long time and might consume the whole API rate limit. The Twitter documentation is not clear on the rate limiting: either 1500 requests per 15 min, or 15 per 15 min with a limit of 10,000 per 30 days for the Basic plan and 450 per 15 min with a limit of 1,296,000 per 30 days for the Pro plan (see GET_2_tweets in https://developer.x.com/en/docs/twitter-api/rate-limits#v2-limits-basic and https://developer.x.com/en/docs/twitter-api/rate-limits#v2-limits-pro).
The biggest numbers are found in https://developer.x.com/en/docs/twitter-api/tweets/timelines/api-reference/get-users-id-tweets, but they seem to be wrong based on the rate limiting observed with the current setup.
Fetching the whole tweet history every day for all accounts will certainly make us reach the 30-day API limit, and will probably make the connector run for multiple hours to fetch all the tweets of some accounts.
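A rough back-of-the-envelope check of that claim, as a Python sketch. The figures are taken from this issue (the 10,000 posts per month Basic cap and the 3,200 most recent tweets reachable through pagination). The page size of 100 is an assumption, and so is the reading that a post fetched again on a later day counts against the cap again, since the quoted deduplication rule only mentions queries within one day.

```python
import math

MONTHLY_POST_CAP = 10_000    # Basic tier cap quoted from the tweet-caps page
MAX_TIMELINE_TWEETS = 3_200  # most recent tweets reachable through pagination
PAGE_SIZE = 100              # assumed max_results per request
DAYS = 30


def monthly_usage(accounts: int, tweets_per_account: int = MAX_TIMELINE_TWEETS):
    """Return (posts counted, requests made) over 30 days of daily full syncs."""
    posts_per_day = accounts * tweets_per_account
    requests_per_day = accounts * math.ceil(tweets_per_account / PAGE_SIZE)
    return posts_per_day * DAYS, requests_per_day * DAYS


posts, requests = monthly_usage(accounts=1)
print(posts, requests, posts > MONTHLY_POST_CAP)  # prints: 96000 960 True
```

Under these assumptions a single account with a full 3,200-tweet timeline already exceeds the Basic cap several times over if it is fully re-synced every day.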
To get the whole history and still run the connector normally, the following should be implemented:
Add a start_time to the connector to limit the sync period of each day-to-day run. We can add a boolean setting to leave out the start_time if we need to fetch the history another time.
For the promoted metrics, there's a section in this link on a sample request.
curl 'https://api.twitter.com/2/tweets/1204084171334832128?tweet.fields=non_public_metrics,organic_metrics&media.fields=non_public_metrics,organic_metrics&expansions=attachments.media_keys' \
  --header 'authorization: OAuth oauth_consumer_key="CONSUMER_API_KEY", oauth_nonce="OAUTH_NONCE", oauth_signature="OAUTH_SIGNATURE", oauth_signature_method="HMAC-SHA1", oauth_timestamp="OAUTH_TIMESTAMP", oauth_token="ACCESS_TOKEN", oauth_version="1.0"'
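Coming back to the start_time proposal, here is a hedged Python sketch of how the daily window and the boolean override could fit together. The names `build_query_params`, `lookback_days` and `fetch_full_history` are hypothetical and do not exist in the connector. The one-day default window and the page size of 100 are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def build_query_params(
    lookback_days: int = 1,
    fetch_full_history: bool = False,
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Build the query parameters for one sync of /2/users/:id/tweets."""
    params = {"max_results": "100"}
    if fetch_full_history:
        # No start_time: the API is free to return the whole reachable history.
        return params

    # `now` is expected to be in UTC; it is injectable to keep this testable.
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=lookback_days)
    params["start_time"] = start.strftime("%Y-%m-%dT%H:%M:%SZ")
    return params
```

A daily run would call `build_query_params()`, while a one-off backfill would call `build_query_params(fetch_full_history=True)`.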
Error when trying to implement the start_time limit:
Caused by: io.temporal.failure.ApplicationFailure: message='Integration failed to output a spec struct and did not output a failure reason', type='io.airbyte.workers.exception.WorkerException', nonRetryable=false
at io.airbyte.workers.WorkerUtils.throwWorkerException(WorkerUtils.java:269) ~[io.airbyte-airbyte-commons-worker-0.55.0.jar:?]
at io.airbyte.workers.general.DefaultGetSpecWorker.run(DefaultGetSpecWorker.java:78) ~[io.airbyte-airbyte-commons-worker-0.55.0.jar:?]
at io.airbyte.workers.general.DefaultGetSpecWorker.run(DefaultGetSpecWorker.java:36) ~[io.airbyte-airbyte-commons-worker-0.55.0.jar:?]
at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:142) ~[io.airbyte-airbyte-workers-0.55.0.jar:?]
at io.airbyte.workers.temporal.spec.SpecActivityImpl.lambda$run$2(SpecActivityImpl.java:179) ~[io.airbyte-airbyte-workers-0.55.0.jar:?]
at io.airbyte.commons.temporal.HeartbeatUtils.withBackgroundHeartbeat(HeartbeatUtils.java:57) ~[io.airbyte-airbyte-commons-temporal-core-0.55.0.jar:?]
at io.airbyte.workers.temporal.spec.SpecActivityImpl.run(SpecActivityImpl.java:163) ~[io.airbyte-airbyte-workers-0.55.0.jar:?]
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) ~[?:?]
The connector doesn't like using the strftime function to convert the date to a string:
"start_time": self.start_time.strftime("%Y-%m-%d%H:%M:%SZ")}
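Two things stand out in that line, although it is not confirmed that either one explains the spec failure above. The format string has no `T` between the date and the time, while the API documents timestamps as `YYYY-MM-DDTHH:mm:ssZ`. And `strftime` only exists on `datetime` objects, so the call raises an `AttributeError` if `self.start_time` is still the raw string from the connector configuration. A minimal Python sketch that handles both cases (`to_api_timestamp` is a hypothetical helper, and treating naive values as UTC is an assumption):

```python
from datetime import datetime, timezone
from typing import Union


def to_api_timestamp(value: Union[str, datetime]) -> str:
    """Return `value` as a UTC timestamp in the YYYY-MM-DDTHH:mm:ssZ format."""
    if isinstance(value, str):
        # Older Python versions reject a trailing "Z" in fromisoformat.
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        # Assumption: naive values in the configuration are meant as UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

The request parameters could then use `"start_time": to_api_timestamp(self.start_time)` regardless of how the setting was loaded.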
Letting the connector run without a time limit didn't fetch the whole tweet history. For example, for the account statuseth, we only got 27 tweets before the endpoint returned an empty response:
{'meta': {'next_token': '7140dibdnow9c7btw482mq8sqdarz7m9kb6zo92i8wo51', 'previous_token': '77qpymm88g5h9vqkluxex6at4ibn4hol7dahowoza9u0g', 'result_count': 0}}
Returns Tweets composed by a single user, specified by the requested user ID. By default, the most recent ten Tweets are returned per request. Using pagination, the most recent 3,200 Tweets can be retrieved.
According to the documentation we should get more than 27....
https://developer.x.com/en/docs/twitter-api/tweet-caps
Basic tier 10,000 Posts per month
If the same Post is returned from multiple queries during a day, then the Post is only counted once against the post cap - i.e, the Posts are deduplicated.
The tweet caps should limit use to this small amount
Description
For the moment, the connector only fetches the information of the last tweet. We need to fetch the whole tweet history at least once.