snarfed / bridgy

📣 Connects your web site to social media. Likes, retweets, mentions, cross-posting, and more...
https://brid.gy
Creative Commons Zero v1.0 Universal

run PPD on all profile URLs, not just the first #323

snarfed closed 9 years ago

snarfed commented 9 years ago

...since some people (e.g. @hugoroy) have multiple web sites that they tweet (post, etc.) about from a single silo account, and we should be able to backfeed to all of those sites.

background in #322.

snarfed commented 9 years ago

confirmed that we don't run PPD on sites in the webmention blacklist (test).
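Once PPD runs over every profile URL, blacklisted sites still have to be skipped for each of them, not just the first. A sketch of that filter, assuming a hypothetical `BLACKLIST` set of domains and treating a URL as blacklisted if its host equals a blacklisted domain or is a subdomain of one; Bridgy's real blacklist contents and matching rules may differ.

```python
from urllib.parse import urlparse

# Hypothetical blacklist entries, for illustration only.
BLACKLIST = frozenset({'twitter.com', 'facebook.com', 'plus.google.com'})


def is_blacklisted(url, blacklist=BLACKLIST):
  """True if the URL's host is a blacklisted domain or a subdomain of one."""
  host = (urlparse(url).hostname or '').lower()
  return any(host == d or host.endswith('.' + d) for d in blacklist)


def ppd_targets(domain_urls, blacklist=BLACKLIST):
  """Profile URLs PPD should run on: all of them, minus blacklisted sites."""
  return [u for u in domain_urls if not is_blacklisted(u, blacklist)]
```

Matching on a leading `.` keeps look-alike domains such as `notfacebook.com` from being excluded by the `facebook.com` entry.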

snarfed commented 9 years ago

investigating the impact of this. here's a remote_api snippet that prints out each source along with its number of profile URLs:

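```python
import handlers

# For every source (all silos), print its number of profile URLs and its Bridgy user page.
for cls in handlers.SOURCES.values():
  for e in cls.query():
    print('%d: https://www.brid.gy%s' % (len(e.domain_urls), e.bridgy_path()))
```

snarfed commented 9 years ago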

aaaand the winners are!!!...

```
41: https://www.brid.gy/googleplus/114356857651526199159
28: https://www.brid.gy/googleplus/114342998833897604241
27: https://www.brid.gy/googleplus/115670007949499078391
24: https://www.brid.gy/googleplus/114047386325191108526
19: https://www.brid.gy/googleplus/101515257858840668532
17: https://www.brid.gy/googleplus/117448569122980078197
13: https://www.brid.gy/googleplus/105228950085229557890
13: https://www.brid.gy/googleplus/105155891492205942420
12: https://www.brid.gy/googleplus/106627512405543005122
10: https://www.brid.gy/googleplus/114543495892430051576
```

all G+. go figure.
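Picking those top 10 out of the snippet's full output is a sort by count. A sketch of that step on plain data, assuming the printed `COUNT: URL` lines have been collected into a list; `parse_line` and `top_sources` are hypothetical names and the sample lines are placeholders, not real accounts.

```python
import heapq


def parse_line(line):
  """Parse one 'COUNT: URL' line printed by the remote_api snippet."""
  count, url = line.split(': ', 1)
  return int(count), url


def top_sources(pairs, n=10):
  """Return the n (count, url) pairs with the most profile URLs, largest first."""
  return heapq.nlargest(n, pairs, key=lambda pair: pair[0])


# Hypothetical sample output, not real accounts.
lines = [
  '3: https://www.brid.gy/twitter/user-a',
  '41: https://www.brid.gy/googleplus/user-b',
  '1: https://www.brid.gy/facebook/user-c',
]
print(top_sources([parse_line(l) for l in lines], n=2))
# [(41, 'https://www.brid.gy/googleplus/user-b'), (3, 'https://www.brid.gy/twitter/user-a')]
```

`heapq.nlargest` avoids fully sorting every source when only the top handful is needed.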

snarfed commented 9 years ago

here are the base domains (the last two labels of each host name) from those 10 accounts' profile URLs:

```python
import util
from googleplus import GooglePlusPage

ids = ('114356857651526199159', '114342998833897604241', '115670007949499078391', '114047386325191108526', '101515257858840668532', '117448569122980078197', '105228950085229557890', '105155891492205942420', '106627512405543005122', '114543495892430051576')
# Keep the last two labels of each profile URL's domain, e.g. www.adactio.com -> adactio.com
domains = set('.'.join(util.domain_from_link(u).split('.')[-2:])
              for gp_id in ids
              for u in GooglePlusPage.get_by_id(gp_id).domain_urls)
print('\n'.join(sorted(domains)))
```

1frage.de 321blog.de 97bottles.com absolonkent.com absolonkent.net acrobat.com adactio.com aorcsik.com app.net appbrain.com audioscrobbler.com backtype.com bagcheck.com battlefield.com behance.net bensaude.org blogger.com blogspot.com brightkite.com bulletproofajax.com business-pool-bodensee.de carpe.com chip.de claimid.com connect.me coopey.net cyberabad.de dandyid.org dauerwerbeblog.de delicious.com deviantart.com diasp.org digg.com disqus.com domscripting.com dopplr.com elkosmas.gr ffffound.com findings.com flattr.com flickr.com foursquare.com friendfeed.com getsatisfaction.com github.com gitorious.org gmail.com gnolia.com godudu.com goodreads.com grabeuh.com huffduffer.com hyves.nl ichbinbw.de icio.us identi.ca infocloudsolutions.com intensedebate.com intiweb.hu ipersonic.de jaiku.com jelly.gr joindiaspora.com kmworld.com koponyeg.hu lanyrd.com lijit.com linuxinside.gr liotier.org literaturwelt.de live.com medium.com meetin.gs mozilla-greece.org mozipremierek.hu myopenid.com ogok.de oliver-gassner.de openstreetmap.org outlinewebdesign.com pandora.com personalinfocloud.com photobucket.com photoshop.com pinboard.in plaxo.com plazes.com podspot.de prezi.com principiagastronomica.com quora.com qype.com raptr.com readernaut.com reddit.com ruwenzori.net sadactio.com saltercane.com secondlife.com seesmic.com shutterfly.com simoncoopey.net slideshare.net snookerblog.de so.cl socialmedian.com soundcloud.com speakerdeck.com stackexchange.com steamcommunity.com stumbleupon.com technorati.com tent.is thesession.org tribe.net tumblr.com tuuli.info typepad.com ustream.tv vanderwal.net viadeo.com viddler.com vimeo.com vodspot.tv wikipedia.org wordpress.com xfire.com xing.com yahoo.com yatil.de yatil.net ycombinator.com yelp.com yelp.de
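The snippet above reduces each profile URL to the last two labels of its host. A standalone sketch of that heuristic using only the standard library; `base_domain` and `unique_base_domains` are hypothetical names (the original uses Bridgy's `util.domain_from_link`). The heuristic has a known limit: for hosts under multi-label public suffixes such as `co.uk`, keeping two labels yields the suffix itself, so a public suffix list would be needed to handle those.

```python
from urllib.parse import urlparse


def base_domain(url):
  """Last two labels of a URL's host, e.g. 'https://www.adactio.com/' -> 'adactio.com'.

  Naive heuristic: multi-label public suffixes such as co.uk are not handled.
  """
  host = urlparse(url).hostname or ''
  return '.'.join(host.split('.')[-2:])


def unique_base_domains(urls):
  """Sorted, de-duplicated base domains for a list of profile URLs."""
  return sorted({base_domain(u) for u in urls})
```

For example, `unique_base_domains(['https://www.adactio.com/', 'http://adactio.com/journal', 'https://ma.gnolia.com/people/x', 'https://del.icio.us/x'])` collapses to `['adactio.com', 'gnolia.com', 'icio.us']`, which is why entries like `gnolia.com` and `icio.us` appear in the list above.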