parroty / extwitter

Twitter client library for elixir.
MIT License

No metadata for search results #84

Open Fallenstedt opened 7 years ago

Fallenstedt commented 7 years ago

Twitter's API allows you to receive metadata for search results.

Using ExTwitter.search, I am able to search for tweets. As an example, I can search for 120 tweets about pizza near Portland with:

  def search(topic, count, radius) do
    options = [
      count: count,
      lang: "en",
      # lat,long,radius centered on Portland, OR
      geocode: "45.5231,-122.6765,#{radius}mi",
      result_type: "recent"
    ]

    ExTwitter.search(topic, options)
    |> IO.inspect
  end

Running this in the console, we can see that only 100 tweets are returned, even though 120 were requested: `MyModule.search("pizza", 120, 500) |> Enum.count` gives `100`.

Nowhere in this list is any search metadata that includes a next_results token for me to obtain the next page of tweets. According to the Twitter API, we should receive metadata that looks like this:

  "search_metadata": {
    "max_id": 250126199840518145,
    "since_id": 24012619984051000,
    "refresh_url": "?since_id=250126199840518145&q=%23freebandnames&result_type=mixed&include_entities=1",
    "next_results": "?max_id=249279667666817023&q=%23freebandnames&count=4&include_entities=1&result_type=mixed",
    "count": 4,
    "completed_in": 0.035,
    "since_id_str": "24012619984051000",
    "query": "%23freebandnames",
    "max_id_str": "250126199840518145"
  }
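As an aside, that next_results value is a ready-made query string, so it can be unpacked into request parameters with nothing but Elixir's standard library (a sketch, not existing ExTwitter behavior):

```elixir
# Turn Twitter's next_results query string into a keyword list of
# request parameters. URI.decode_query/1 also percent-decodes values,
# so "%23freebandnames" becomes "#freebandnames".
next_results =
  "?max_id=249279667666817023&q=%23freebandnames&count=4&include_entities=1&result_type=mixed"

params =
  next_results
  |> String.trim_leading("?")
  |> URI.decode_query()
  # Safe here: the keys come from Twitter's small, fixed parameter set.
  |> Enum.map(fn {key, value} -> {String.to_atom(key), value} end)

# params now contains e.g. max_id: "249279667666817023", q: "#freebandnames"
```

A keyword list like this could then be handed back to a search call to request the next page.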

My suspicion is that when the results are parsed, this metadata is excluded. I've forked this library and tested searching without parsing, and I do have access to the metadata there.

Is there currently a way to parse the JSON so that it includes the search_metadata? If not, how can I contribute? I would love a feature that lets me page through my data, because right now I am locked to 100 results when I may need thousands.
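Until search_metadata is exposed, one workaround is to page manually by passing max_id, keyed off the oldest tweet seen so far. This is a sketch, not existing library API: the PagedSearch module is hypothetical, and it assumes (as ExTwitter's tweet model does) that each returned struct carries an `id` field.

```elixir
defmodule PagedSearch do
  # Fetch up to `total` tweets by repeatedly calling ExTwitter.search/2,
  # lowering max_id below the oldest tweet id after each page.
  def search(topic, total, options \\ []) do
    do_search(topic, total, options, [])
  end

  defp do_search(_topic, total, _options, acc) when length(acc) >= total do
    Enum.take(acc, total)
  end

  defp do_search(topic, total, options, acc) do
    case ExTwitter.search(topic, Keyword.put_new(options, :count, 100)) do
      [] ->
        # No more results available; return what we have.
        acc

      tweets ->
        oldest_id = tweets |> List.last() |> Map.get(:id)
        next_options = Keyword.put(options, :max_id, oldest_id - 1)
        do_search(topic, total, next_options, acc ++ tweets)
    end
  end
end

# tweets = PagedSearch.search("pizza", 300, lang: "en")
```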

parroty commented 7 years ago

Thanks for the comment. As you indicated, the metadata is currently excluded when parsing the results of the search API.

I'm starting to wonder whether an option could be added to allow access to the search_metadata, together with a helper for fetching the next page from the previous result.

If you have any opinions on the interface, I'd appreciate it if you could share them (I'll keep thinking about it too). Something along these lines:

# Proposed interface: opt in to metadata, then page from it.
prev_response = ExTwitter.search("pizza", [count: 100, search_metadata: true])
response = ExTwitter.search_next_page(prev_response.metadata)

# Recursively accumulating all pages:
defmodule Searcher do
  def search_next_page(prev_response, index) do
    IO.puts("Fetching page " <> to_string(index))
    response = ExTwitter.search_next_page(prev_response.metadata)

    if response != nil do
      prev_response.statuses ++ search_next_page(response, index + 1)
    else
      prev_response.statuses
    end
  end
end

response = ExTwitter.search("pizza", [count: 100, search_metadata: true])
tweets = Searcher.search_next_page(response, 1)

Fallenstedt commented 7 years ago

I enjoy this interface idea. My design was going to include a struct for search_metadata and include it in response as the last item in that list.

Then Searcher.search_next_page could use this struct, which I would hope to be the last item in the response list, to fetch the next page of results.

One issue I have been having is using the search_metadata URL to fetch the next page of data. I keep trying various ways to use it, but OAuth authentication keeps failing. I'll put together a more concrete idea later today and retrace my steps.

gmile commented 7 years ago

What about fetching other things that require paging?

For instance, right now it's not possible to fetch beyond 200 items (this is by Twitter API design) when fetching favorited tweets:

length(ExTwitter.favorites(count: 1000))
# => 200

This means it's technically possible to fetch all favorites, but it would require doing the paging manually.

@parroty would your proposed design cover this case as well, or is it only suited for search?

parroty commented 7 years ago

@Fallenstedt Thanks for the comment, and sorry for being late to respond.

> I enjoy this interface idea. My design was going to include a struct for search_metadata and include it in response as the last item in that list.

Since the current ExTwitter.search directly returns a list of tweets, it's tricky to add this metadata. I'm thinking of switching the response type (list or struct) based on a search_metadata option. It might not be the best approach, but the following branch contains a trial fix:

https://github.com/parroty/extwitter/pull/86

@gmile Thanks for the comment. If writing iterative logic (like the example above) is acceptable, I think code like the following would cover the favorites case. The search API depends on search_metadata for paging, but the current ExTwitter.search does not return it (which is what this issue is about).

defmodule FavoritesSearcher do
  def run(options) do
    do_run(options, 1)
  end

  # Fetch one page, then recurse with max_id set just below the oldest
  # tweet fetched so far, until an empty page comes back.
  defp do_run(options, index) do
    favorites = ExTwitter.favorites(options)

    IO.puts(
      "Fetched page " <> to_string(index) <>
        " with " <> to_string(Enum.count(favorites)) <>
        " tweets by max_id " <> to_string(Keyword.get(options, :max_id, nil))
    )

    if Enum.count(favorites) > 0 do
      favorites ++
        do_run(Keyword.merge(options, max_id: List.last(favorites).id - 1), index + 1)
    else
      favorites
    end
  end
end

favorites = FavoritesSearcher.run(screen_name: "justinbieber", count: 200)

gmile commented 7 years ago

@parroty nice, thanks for the code excerpt! I'll try to base my implementation on it.

Generally, do you think extwitter could benefit from including this excerpt, probably in some generic form, in the core code?

I'm trying to think of cases where manually pulling additional items is appropriate. When would a user want to run their own code in between the API calls made by extwitter?

I think what really matters to the end user is just getting the N results they requested (be it 5, 200, or 1000), rather than having to implement a pull-check-pull-again mechanism themselves.

From looking at the API reference, I see that different GET calls rely on different page_max values (100, 200, 800). I think all such calls could benefit from automatically pulling more results beyond the page_max value.
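A generic form of that pull loop could be built on Stream.resource/3, so any max_id-based endpoint pages lazily and stops as soon as the caller has taken N items. This is a sketch, not existing extwitter API: the PagedFetch module and its `fetch` function argument are hypothetical.

```elixir
defmodule PagedFetch do
  # Lazily pull pages from any max_id-based endpoint. `fetch` is a
  # function taking a keyword list of options and returning a list of
  # tweet structs, e.g. fn opts -> ExTwitter.favorites(opts) end.
  def stream(fetch, options \\ []) do
    Stream.resource(
      fn -> options end,
      fn opts ->
        case fetch.(opts) do
          [] ->
            # Empty page: no more results, stop the stream.
            {:halt, opts}

          tweets ->
            # Emit this page; next request asks for tweets strictly
            # older than the oldest one we just received.
            next = Keyword.put(opts, :max_id, List.last(tweets).id - 1)
            {tweets, next}
        end
      end,
      fn _ -> :ok end
    )
  end
end

# Take exactly 1000 favorites, in however many API calls that requires:
# PagedFetch.stream(&ExTwitter.favorites/1, screen_name: "justinbieber", count: 200)
# |> Enum.take(1000)
```

Because streams are lazy, Enum.take/2 stops issuing requests as soon as N items have arrived, which matches the "just give me N results" intent above.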