SPIKE: Research Twitter API Endpoints for Tweet Retrieval and Keyword Search

Luka-Loncar commented 5 months ago

Spike story

As a Masa Protocol developer, I want to research the Twitter API endpoints for tweet retrieval and keyword search so that we can gather the necessary information to plan the integration of these endpoints into the Masa worker nodes.

Acceptance Criteria:

Review the Twitter API documentation and identify the specific endpoints for tweet retrieval and keyword search.
Understand the authentication requirements, API limits, and data formats associated with these endpoints.
Investigate the available libraries, SDKs, or tools that can facilitate the integration of these endpoints into the Masa worker nodes.
Identify any potential challenges, limitations, or considerations that may impact the integration process.

Document the findings, including:

The specific endpoints and their functionalities
Authentication requirements and API limits
Recommended libraries, SDKs, or tools for integration
Potential challenges and considerations
High-level integration approach or architecture

Present the research findings to the development team and stakeholders for discussion and planning.

Definition of Done:

The spike research is completed within the allocated time box.
The specific Twitter API endpoints for tweet retrieval and keyword search are identified and documented.
The authentication requirements, API limits, and data formats associated with these endpoints are clearly understood and documented.
Relevant libraries, SDKs, or tools that can aid in the integration process are identified and documented.
Potential challenges, limitations, and considerations are identified and documented.
The research findings, including the high-level integration approach or architecture, are clearly documented and presented to the team.
The development team and stakeholders have the necessary information to plan the integration of the identified endpoints into the Masa worker nodes.

Note: This spike user story focuses solely on the research aspect of integrating the Twitter API endpoints into the Masa worker nodes. The actual implementation and integration will be covered in separate user stories based on the outcome of this research spike.

jdutchak commented 5 months ago

Using the developer Pro account for Twitter/X, here are the docs:

https://developer.x.com/en/docs/twitter-api/tweets/lookup/introduction

We can write our own REST API integration / SDK; no other libraries are needed, as this is a very basic process.
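To illustrate how small a hand-written integration could be, here is a minimal sketch that only builds (and does not send) requests for the v2 recent search and tweet lookup endpoints using the Python standard library. The function names are hypothetical and chosen for illustration; the bearer token is a placeholder, and the limits of 10 to 100 results per search and up to 100 IDs per lookup reflect the public v2 documentation as I understand it.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.twitter.com/2"


def build_recent_search_request(query, bearer_token, max_results=10):
    """Build a GET request for keyword search over recent tweets."""
    # The v2 recent search endpoint accepts 10..100 results per page.
    if not 10 <= max_results <= 100:
        raise ValueError("max_results must be between 10 and 100")
    params = urllib.parse.urlencode({"query": query, "max_results": max_results})
    return urllib.request.Request(
        f"{API_BASE}/tweets/search/recent?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )


def build_tweet_lookup_request(tweet_ids, bearer_token):
    """Build a GET request that retrieves tweets by ID."""
    ids = list(tweet_ids)
    # The v2 lookup endpoint accepts a comma-separated list of up to 100 IDs.
    if not 1 <= len(ids) <= 100:
        raise ValueError("provide between 1 and 100 tweet IDs")
    params = urllib.parse.urlencode({"ids": ",".join(ids)})
    return urllib.request.Request(
        f"{API_BASE}/tweets?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )
```

Passing either request to `urllib.request.urlopen` would perform the call; keeping construction separate from sending makes the URL and header logic easy to unit test without network access.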

Retrieve up to 1M tweets per month
3 environments
Cost $5000.00 USD/month

curl 'https://api.twitter.com/2/usage/tweets' --header 'Authorization: Bearer XXXXX'

{
    "data": {
        "cap_reset_day": 10,
        "project_cap": "5000000000",
        "project_id": "1369785403853424",
        "project_usage": "43435"
    }
}
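A worker could use this usage response to decide whether it still has quota before issuing more requests. The sketch below assumes the response shape shown above, where `project_cap` and `project_usage` arrive as strings; the function name is hypothetical.

```python
import json


def remaining_tweet_quota(usage_response: str) -> int:
    """Return how many tweets the project can still retrieve this period."""
    data = json.loads(usage_response)["data"]
    # Both counters are string-encoded in the response, so convert first.
    return int(data["project_cap"]) - int(data["project_usage"])


SAMPLE_USAGE = """
{
    "data": {
        "cap_reset_day": 10,
        "project_cap": "5000000000",
        "project_id": "1369785403853424",
        "project_usage": "43435"
    }
}
"""

if __name__ == "__main__":
    print(remaining_tweet_quota(SAMPLE_USAGE))
```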

teslashibe commented 5 months ago

@jdutchak let's say we want to allow workers to bring their own API keys in the future. Would the API scale like other workers? Thinking out loud here that we should follow the same pattern. What do you think? Would love to know more about how you envisioned implementing this and what the criteria would be for an oracle node (staked non-worker) to query the API through the network, e.g. "An oracle node would need a direct connection as a peer to a bootnode to query the data"

Hope this makes sense

jdutchak commented 5 months ago

@teslashibe, there are a couple of scenarios in play.

Firstly, we can have Twitter scraping implemented through our bootnodes, which provide us with the necessary Pro API Key and Bearer Token. This is similar to how we integrate with Cloudflare LLM workers.

Secondly, we allow node operators to run their own nodes using their own API/token. By doing this, they can contribute to the network and be rewarded for providing this service.

This setup allows us to have functional workers for node users when connected to our bootnodes.
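The two scenarios could share one credential lookup, sketched below under stated assumptions: the environment variable name `TWITTER_BEARER_TOKEN`, the `TwitterCredentials` type, and the `resolve_credentials` function are all hypothetical and not part of masa-oracle. An operator-supplied token takes priority, and the bootnode-provided token is the fallback.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TwitterCredentials:
    bearer_token: str
    source: str  # "operator" (own key) or "bootnode" (network-provided key)


def resolve_credentials(
    env: Mapping[str, str], bootnode_token: Optional[str]
) -> TwitterCredentials:
    """Prefer the operator's own token, fall back to the bootnode's token."""
    # Hypothetical variable name; a real node would define its own config key.
    own_token = env.get("TWITTER_BEARER_TOKEN", "").strip()
    if own_token:
        return TwitterCredentials(own_token, "operator")
    if bootnode_token:
        return TwitterCredentials(bootnode_token, "bootnode")
    raise RuntimeError("no Twitter API credentials available")
```

Recording the `source` alongside the token is one way to later distinguish operators who contribute their own API access (and could be rewarded) from those relying on the bootstrapped key.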

The question then arises: should we bear the cost of these operations, or should we also find a way to monetize from those who only scrape data but do not contribute to the "work"?

teslashibe commented 5 months ago

@jdutchak good thoughts here. Let's get the current work with the Validator (writer) over the line, then we can circle back on this 👍

I think that having both options makes a lot of sense - we can bootstrap and then others can monetize their existing API integration through a worker

I think we will need to bootstrap the cost in v1 then work to scale Twitter API workers

jdutchak commented 5 months ago

It seems we already support this through the existing Twitter capabilities:

curl -X 'POST' \
  'http://localhost:8080/api/v1/data/twitter/tweets/recent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "query": "Masa (from:getmasafi)",
  "count": 10
}'

curl -X 'POST' \
  'http://localhost:8080/api/v1/data/twitter/tweets/recent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "query": "Bitcoin until:2022-01-01 since:2006-01-01",
  "count": 10
}'

{
  "query": "Masa (from:getmasafi) until:2024-06-01 since:2023-01-01",
  "count": 5
}
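The request bodies above all follow the same pattern: keywords, an optional `from:` filter, and optional `until:`/`since:` dates. A small helper could assemble them, as in this sketch; the function names are hypothetical and the operator order simply mirrors the examples in this comment.

```python
import json
from typing import Optional


def build_query(
    keywords: str,
    from_user: Optional[str] = None,
    since: Optional[str] = None,
    until: Optional[str] = None,
) -> str:
    """Compose a search query string from keywords and optional filters."""
    parts = [keywords]
    if from_user:
        parts.append(f"(from:{from_user})")
    if until:
        parts.append(f"until:{until}")  # date in YYYY-MM-DD form
    if since:
        parts.append(f"since:{since}")  # date in YYYY-MM-DD form
    return " ".join(parts)


def build_request_body(query: str, count: int) -> str:
    """Serialize the JSON body expected by the tweets/recent endpoint."""
    return json.dumps({"query": query, "count": count})
```

For example, `build_query("Masa", from_user="getmasafi", since="2023-01-01", until="2024-06-01")` reproduces the query string of the last body shown above.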

Luka-Loncar commented 5 months ago

The problem is that an exact-match query returns an error.
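The thread does not say what causes the error. One thing worth ruling out, purely as an assumption, is quoting: an exact-match phrase is wrapped in double quotes, and those quotes must be escaped when embedded in a JSON body. The sketch below (with a hypothetical function name and example phrase) lets a JSON serializer handle the escaping instead of writing the body by hand.

```python
import json


def exact_match_body(phrase: str, count: int) -> str:
    """Build a request body whose query matches the phrase exactly."""
    # Wrap the phrase in double quotes; json.dumps escapes them as \" in the body.
    query = f'"{phrase}"'
    return json.dumps({"query": query, "count": count})


if __name__ == "__main__":
    print(exact_match_body("Masa Protocol", 10))
```

If the error persists with a correctly escaped body, the cause is more likely in how the endpoint handles quoted queries, which would justify a separate bug ticket.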

mudler commented 4 months ago

@Luka-Loncar can you elaborate?

From @jdutchak's comment, it looks like we already support this. Shall we close this, or open a bug ticket to track the specific error?

Luka-Loncar commented 4 months ago

I just wrote this during the call - I believe @teslashibe asked me to note this here.

masa-finance / masa-oracle

SPIKE: Research Twitter API Endpoints for Tweet Retrieval and Keyword Search #313