loklak / loklak_server

Distributed Open Source twitter and social media message search server that anonymously collects, shares, dumps and indexes data http://api.loklak.org
GNU Lesser General Public License v2.1
1.38k stars, 223 forks

Store User's Twitter Data to Loklak #64

Closed prasht63 closed 8 years ago

prasht63 commented 9 years ago

Twitter does not allow you to get all data for the followers of a random user, so we cannot show all followers of a person. A logged-in user can see their own followers/following on the map, but for other users we could perhaps scrape the data step by step and display it. Since we don't have location data for all of a random person's followers, we plot whoever we have data for and add a note somewhere ("some users may not have location data associated with their account and will not show up on the map").

We then save the data in loklak. As soon as it is saved, others can search my followers/following too. Everything we take from Twitter needs to be saved in loklak. This way people can also track in the future who unfollowed an account at some point. This is what http://who.unfollowed.me/ does, for example: they save all followers/following.

Orbiter commented 9 years ago

I see several problems here:

There are other obstacles:

I suggest a different approach:

Orbiter commented 9 years ago

The rate limit of twitter is a big issue here! See https://dev.twitter.com/rest/reference/get/followers/list

Limitation | Attribute
Rate limited? | Yes
Requests / 15-min window (user auth) | 15
Requests / 15-min window (app auth) | 30

With each request, only 20 followers can be retrieved. If a Twitter client retrieves 300 followers (15 requests of 20 each), the function is blocked for 15 minutes. Using the authentication of different users we have stored does not help either, because the app can only do 30 lookups per 15 minutes. This limitation essentially makes the endpoint almost completely unusable for us.
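To make the arithmetic concrete, here is a minimal sketch of the budget calculation. It uses only the figures quoted above (15 requests per window, 20 followers per request); the function names are illustrative and not part of loklak or the Twitter API.

```python
import math

WINDOW_MINUTES = 15  # length of one rate-limit window, as quoted above


def followers_per_window(requests_per_window: int, followers_per_request: int) -> int:
    """Upper bound on follower records retrievable in one rate-limit window."""
    return requests_per_window * followers_per_request


def windows_needed(total_followers: int,
                   requests_per_window: int = 15,
                   followers_per_request: int = 20) -> int:
    """Number of rate-limit windows needed to page through a full follower list."""
    per_window = followers_per_window(requests_per_window, followers_per_request)
    return math.ceil(total_followers / per_window)


if __name__ == "__main__":
    # Budget per window with user auth.
    print(followers_per_window(15, 20))
    # Windows needed for an account with 10,000 followers.
    print(windows_needed(10_000))
```

Under these assumptions, even a mid-sized account needs many consecutive windows, which is why the limit hurts so much here.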

The same limitation applies to the loklak_webclient. However, implementing this in the back-end now would at least provide caching for followers that have already been loaded.

mariobehling commented 9 years ago

I wonder how other services like tweepsmap.com and who.unfollowed.me are doing it. They have the data within a few seconds after you register.

Orbiter commented 9 years ago

It may depend on the size of the follower list. I also found a way to get many more follower IDs (instead of full user records): there you can get 15 * 5000 IDs. If you already have users assigned to such IDs, you can get more info. This is still limited to 75000 followers which the service can take from Twitter within 15 minutes (not only for one user of the service, but for all of them!)

Another answer to the question of how others do it: they may be incomplete, but they don't tell you.

Orbiter commented 9 years ago

Once the IDs are received, it is possible to get 100 users at once with the users/lookup endpoint (https://dev.twitter.com/rest/reference/get/users/lookup). This can also be repeated only 15 times within 15 minutes. That means you get at most 1400 user accounts at once: one request gives you 5000 IDs, and from them you can look up 100 at a time, 14 times. Then the API throws an exception and you have to wait.
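A rough sketch of that batching, assuming the numbers given in this comment (100 users per lookup, 14 lookups left after the one ID request). The helper names `batches` and `plan_window` are made up for illustration; no Twitter client is called here.

```python
from typing import Iterator, List, Sequence, Tuple


def batches(ids: Sequence[int], size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def plan_window(ids: Sequence[int],
                lookups_left: int = 14,
                batch_size: int = 100) -> Tuple[List[List[int]], List[int]]:
    """Split IDs into batches that fit the current window and IDs that must wait."""
    all_batches = list(batches(ids, batch_size))
    now = all_batches[:lookups_left]
    # Everything beyond the lookup budget is deferred to a later window.
    deferred = [uid for batch in all_batches[lookups_left:] for uid in batch]
    return now, deferred


if __name__ == "__main__":
    follower_ids = list(range(5000))  # stand-in for one page of 5000 IDs
    now, deferred = plan_window(follower_ids)
    print(len(now), "batches now,", len(deferred), "IDs deferred")
```

The deferred list can be fed back into `plan_window` once the next window opens.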

This can be enhanced a lot if you know some of the user accounts already. That must be how other services appear to be able to do more. Or they pay money to Twitter...
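The idea of reusing already known accounts could look roughly like this. It is a sketch only: `known` stands for whatever store of previously saved profiles exists (here a plain dict), and `split_known` is a hypothetical helper, not loklak code.

```python
from typing import Dict, Iterable, List, Tuple


def split_known(ids: Iterable[int],
                known: Dict[int, dict]) -> Tuple[List[dict], List[int]]:
    """Separate IDs with a stored profile from IDs that still need a lookup."""
    cached: List[dict] = []
    missing: List[int] = []
    for uid in ids:
        if uid in known:
            cached.append(known[uid])  # served from the local store, no API call
        else:
            missing.append(uid)        # only these consume rate-limited lookups
    return cached, missing


if __name__ == "__main__":
    # Made-up IDs and profiles, purely for illustration.
    store = {1: {"screen_name": "alice"}, 3: {"screen_name": "carol"}}
    cached, missing = split_known([1, 2, 3, 4], store)
    print(cached, missing)
```

Only the `missing` IDs count against the lookup budget, so the more profiles are already stored, the further each window stretches.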

Orbiter commented 9 years ago

One part of the job is done: loklak_server now takes the user profile data from Twitter and stores it. Try:

http://localhost:9000/api/account.json?screen_name=mariobehling

This will give you extensive data about the account. To retrieve this data, first the OAuth key of the user is used. If this fails, the loklak app OAuth key is used.
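The fallback order described here (user key first, then the app key) can be sketched as follows. This is not the loklak_server implementation; `fetch`, `LookupFailed` and the credential labels are placeholders invented for the example.

```python
from typing import Callable, Optional, Sequence


class LookupFailed(Exception):
    """Hypothetical error a fetcher raises when a credential cannot be used."""


def fetch_with_fallback(screen_name: str,
                        credentials: Sequence[str],
                        fetch: Callable[[str, str], dict]) -> dict:
    """Try each credential in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for credential in credentials:
        try:
            return fetch(screen_name, credential)
        except LookupFailed as err:
            last_error = err  # remember the failure and try the next credential
    raise LookupFailed(f"all credentials failed for {screen_name}") from last_error


if __name__ == "__main__":
    def fake_fetch(name: str, credential: str) -> dict:
        # Stand-in for a real API call: only the app key works in this demo.
        if credential != "app-key":
            raise LookupFailed(credential)
        return {"screen_name": name, "via": credential}

    print(fetch_with_fallback("mariobehling", ["user-key", "app-key"], fake_fetch))
```

Passing the credentials as an ordered list keeps the preference (user key before app key) in one place.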

This does not yet provide follower data; that will follow soon as well...

Here is the output from the call above:

{
  "search_metadata" : {
    "count" : "0",
    "client" : "0:0:0:0:0:0:0:1"
  },
  "accounts" : [ {
    "authentication_latest" : "2015-07-16T23:11:03.381Z",
    "authentication_first" : "2015-07-16T23:11:03.381Z",
    "source_type" : "TWITTER",
    "screen_name" : "mariobehling",
    "apps" : {
      "wall" : {
        "type" : "horizontal"
      }
    }
  } ],
  "user" : {
    "utc_offset" : 25200,
    "friends_count" : 390,
    "profile_image_url_https" : "https://pbs.twimg.com/profile_images/446123162/mb_normal.JPG",
    "listed_count" : 38,
    "profile_background_image_url" : "http://abs.twimg.com/images/themes/theme16/bg.gif",
    "default_profile_image" : false,
    "favourites_count" : 50,
    "description" : "Around the world with open hardware/networks/software and free knowledge @lubuntudesktop founder, @fossasia organizer, blogger @freifunk  #Mozilla Rep",
    "created_at" : "Tue May 27 10:36:25 +0000 2008",
    "is_translator" : false,
    "profile_background_image_url_https" : "https://abs.twimg.com/images/themes/theme16/bg.gif",
    "protected" : false,
    "screen_name" : "mariobehling",
    "id_str" : "14919253",
    "profile_link_color" : "0084B4",
    "is_translation_enabled" : false,
    "id" : 14919253,
    "geo_enabled" : true,
    "profile_background_color" : "9AE4E8",
    "lang" : "en",
    "has_extended_profile" : false,
    "profile_sidebar_border_color" : "BDDCAD",
    "profile_location" : {
      "country_code" : "",
      "country" : "",
      "contained_within" : [ ],
      "full_name" : "Berlin, Germany",
      "bounding_box" : null,
      "place_type" : "unknown",
      "name" : "Berlin, Germany",
      "attributes" : { },
      "id" : "3078869807f9dd36",
      "url" : "https://api.twitter.com/1.1/geo/id/3078869807f9dd36.json"
    },
    "profile_text_color" : "333333",
    "verified" : false,
    "profile_image_url" : "http://pbs.twimg.com/profile_images/446123162/mb_normal.JPG",
    "time_zone" : "Hanoi",
    "url" : "http://t.co/hOOUvvLp1S",
    "contributors_enabled" : false,
    "profile_background_tile" : false,
    "profile_banner_url" : "https://pbs.twimg.com/profile_banners/14919253/1431386332",
    "entities" : {
      "description" : {
        "urls" : [ ]
      },
      "url" : {
        "urls" : [ {
          "display_url" : "mariobehling.de",
          "indices" : [ 0, 22 ],
          "expanded_url" : "http://mariobehling.de",
          "url" : "http://t.co/hOOUvvLp1S"
        } ]
      }
    },
    "statuses_count" : 575,
    "follow_request_sent" : false,
    "followers_count" : 1322,
    "profile_use_background_image" : true,
    "default_profile" : false,
    "following" : true,
    "name" : "Mario Behling",
    "location" : "Berlin, Germany",
    "profile_sidebar_fill_color" : "DDFFCC",
    "notifications" : false,
    "retrieval_date" : "2015-07-16T23:10:25.945Z",
    "status" : {
      "in_reply_to_status_id_str" : null,
      "in_reply_to_status_id" : null,
      "possibly_sensitive" : false,
      "coordinates" : null,
      "created_at" : "Thu Jul 16 21:10:42 +0000 2015",
      "truncated" : false,
      "in_reply_to_user_id_str" : null,
      "source" : "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
      "retweet_count" : 3,
      "retweeted" : false,
      "geo" : null,
      "in_reply_to_screen_name" : null,
      "entities" : {
        "urls" : [ ],
        "hashtags" : [ {
          "indices" : [ 0, 6 ],
          "text" : "TOA15"
        }, {
          "indices" : [ 7, 14 ],
          "text" : "Berlin"
        }, {
          "indices" : [ 24, 35 ],
          "text" : "Zuperpower"
        }, {
          "indices" : [ 65, 70 ],
          "text" : "Java"
        } ],
        "media" : [ {
          "display_url" : "pic.twitter.com/1vV0H0qXbB",
          "indices" : [ 84, 106 ],
          "sizes" : {
            "small" : {
              "w" : 340,
              "h" : 453,
              "resize" : "fit"
            },
            "large" : {
              "w" : 768,
              "h" : 1024,
              "resize" : "fit"
            },
            "thumb" : {
              "w" : 150,
              "h" : 150,
              "resize" : "crop"
            },
            "medium" : {
              "w" : 600,
              "h" : 800,
              "resize" : "fit"
            }
          },
          "id_str" : "621789046478610433",
          "expanded_url" : "http://twitter.com/mariobehling/status/621789074395951108/photo/1",
          "media_url_https" : "https://pbs.twimg.com/media/CKEJ1-iWIAE7Lu7.jpg",
          "id" : 621789046478610433,
          "type" : "photo",
          "media_url" : "http://pbs.twimg.com/media/CKEJ1-iWIAE7Lu7.jpg",
          "url" : "http://t.co/1vV0H0qXbB"
        } ],
        "user_mentions" : [ {
          "indices" : [ 15, 23 ],
          "screen_name" : "Zalando",
          "id_str" : "59900894",
          "name" : "Zalando",
          "id" : 59900894
        }, {
          "indices" : [ 75, 83 ],
          "screen_name" : "0rb1t3r",
          "id_str" : "21938231",
          "name" : "Orbiter",
          "id" : 21938231
        } ],
        "symbols" : [ ]
      },
      "id_str" : "621789074395951108",
      "in_reply_to_user_id" : null,
      "favorite_count" : 2,
      "id" : 621789074395951108,
      "text" : "#TOA15 #Berlin @Zalando #Zuperpower Party and I found this about #Java ;-) @0rb1t3r http://t.co/1vV0H0qXbB",
      "place" : {
        "country_code" : "DE",
        "country" : "Deutschland",
        "contained_within" : [ ],
        "full_name" : "Berlin, Germany",
        "bounding_box" : {
          "coordinates" : [ [ [ 13.088304, 52.338079 ], [ 13.760909, 52.338079 ], [ 13.760909, 52.675323 ], [ 13.088304, 52.675323 ] ] ],
          "type" : "Polygon"
        },
        "place_type" : "city",
        "name" : "Berlin",
        "attributes" : { },
        "id" : "3078869807f9dd36",
        "url" : "https://api.twitter.com/1.1/geo/id/3078869807f9dd36.json"
      },
      "contributors" : null,
      "lang" : "en",
      "favorited" : false
    }
  }
}

Orbiter commented 8 years ago

This has been implemented with the user.json servlet, see http://loklak.org/api/user.json?screen_name=loklak_app
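For anyone who wants to call the servlet from a script, a minimal sketch using only the Python standard library. The URL and the `screen_name` parameter are the ones given above; the shape of the JSON response is not assumed here, and the endpoint may not be reachable any more.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DEFAULT_BASE = "http://loklak.org/api/user.json"  # URL as given in this thread


def user_url(screen_name: str, base: str = DEFAULT_BASE) -> str:
    """Build the user.json request URL for a screen name."""
    return f"{base}?{urlencode({'screen_name': screen_name})}"


def fetch_user(screen_name: str, base: str = DEFAULT_BASE, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON document returned by the servlet."""
    with urlopen(user_url(screen_name, base), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(user_url("loklak_app"))
    # Uncomment to perform the actual request (needs network access):
    # print(json.dumps(fetch_user("loklak_app"), indent=2))
```

For a local instance, pass a different `base`, for example the `localhost:9000` address used earlier in this thread with the matching servlet path.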