snarfed closed this 8 years ago
cc @kylewm in case you're interested in adding flickr search support... (see above)
the remaining part here is to send mention posts themselves, not just their responses. this needs a new post response type connected to the post mf2 handler.
finally soft launched this, and it worked well, but evidently has a memory leak, so i had to roll it back.
Exceeded soft private memory limit of 256 MB with 328 MB after servicing 2 requests total.
ugh.
there's FUD here and there about the sockets API maybe causing memory leaks due to badly handled range requests, but i can't tell how real it is or if it could be causing this. i suspect i've just been wasteful with memory, e.g. lots of string concatenations and copy.deepcopy calls, and it's finally time to pay the piper. whee, can't wait to heap profile. :sob:
silver lining: at least i know the window of commits where the leak was introduced!
the little orange bump of 500s here is our instances flapping (OOMing, restarting, and OOMing again):
here's a snippet of individual requests at peak flap. the red !!! ones are OOMs. not pretty!
silver lining: it's working ok, at least! e.g. the top response here: https://www.brid.gy/twitter/kylewmahan#responses is this tweet: https://twitter.com/anarcho/status/643921641664200704 which propagated as a mention to https://kylewm.com/2015/09/repost-of-glenn-greenwald-the-new-revolving-door
wow, that mention is hidden behind a redirect too, pretty cool!
for the record: who's the dunce who sprinkled copy.deepcopy calls throughout poll, basically bridgy's inner loop, and then acted all surprised when it blew our memory budget? this guy!!! :P
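For anyone curious how much a deep copy can cost compared to a shallow one, here is a minimal sketch using the standard library's tracemalloc. The `activities` data is a made-up stand-in for one poll's results, not bridgy's actual structures, and `peak_bytes` is a hypothetical helper. It needs Python 3.

```python
import copy
import tracemalloc


def peak_bytes(fn):
    """Return the peak memory (in bytes) allocated while running fn()."""
    tracemalloc.start()
    try:
        result = fn()  # hold a reference so the copy is alive when we measure
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del result
    return peak


# Hypothetical stand-in for a batch of activities fetched during one poll.
activities = [
    {
        "id": str(i),
        "object": {
            "content": "x" * 200,
            "tags": [{"url": "https://example.com/%d" % i}],
        },
    }
    for i in range(5000)
]

# deepcopy duplicates every nested dict and list; list() copies only the
# outer list of references.
deep = peak_bytes(lambda: copy.deepcopy(activities))
shallow = peak_bytes(lambda: list(activities))

print("deepcopy peak: %d bytes, shallow copy peak: %d bytes" % (deep, shallow))
```

The exact numbers depend on the interpreter, but the deep copy should allocate far more, and doing that inside a hot loop multiplies the cost.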
ok, i think it might stick this time. monitoring graphs below. i turned it on for just @kylewm and me at the 1hr (ago) point, for 6 more accounts at 45m, and for everyone at 30m. ran out of memory once, largely due to polling @kevinmarks a few times in rapid succession (he's prolific), but that's it. and we hit that cap occasionally anyway, so i'm not too worried.
i love it. it actually collects all tweets containing links to my articles. looks great, too.
thanks a lot for this, it’s a great new feature!
thanks for the kind words!
this has noticeably increased our poll latency:
the poll task queue is now ~90m behind. not a big deal, but definitely not ideal. hrmph. time to profile i guess.
some of this might be just because our slow poll frequency is once a day, so we're still working through the first set of search results for many users. that should be done by around noon PST. i'll revisit if latency is still consistently bad after that.
scratch that, we'll be caught up by ~1:30pm PST today, since we're ~90m behind. math!
poll latency is looking better now. averaging 5-10s, higher than ~4s before, but still reasonable.
the poll queue is still behind by 45m :/, but i'm hoping some of that was due to #490. i pushed out a change there (1ebfe1cf0d3c6675d9f8291434dc50e3fba2c39a) a few hours ago that adds a bunch of shortlink generator domains to the blacklist and checks the blacklist before searching for a domain, so i'm hoping that will help some too.
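A rough sketch of the "check the blacklist before searching" idea, not the code from that commit. The `BLACKLIST` contents and both function names are assumptions for illustration, and matching parent domains (so `www.bit.ly` is caught by `bit.ly`) is a design guess rather than confirmed behavior.

```python
from urllib.parse import urlparse

# Hypothetical blacklist of shortlink generator domains; the real list
# lives in bridgy and is not reproduced here.
BLACKLIST = {"t.co", "bit.ly", "goo.gl"}


def is_blacklisted(domain, blacklist=BLACKLIST):
    """True if domain, or any parent domain of it, is in the blacklist."""
    parts = domain.lower().split(".")
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def searchable_domains(urls, blacklist=BLACKLIST):
    """Return the unique domains of urls, skipping blacklisted ones."""
    domains = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if domain and domain not in domains and not is_blacklisted(domain, blacklist):
            domains.append(domain)
    return domains
```

Skipping these domains up front means no search request is spent on a domain whose results would be discarded anyway.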
tentatively closing. this has been running in prod and stable for a few days. I'm sure there are more bugs left to fix, but we can open new issues for them.
Does brid.gy also turn @-mentions of my twitter username into webmentions to my domain? That would be similar to this and very nice
@singpolyma not right now, but that's an interesting feature request. just to confirm, you're proposing they'd be sent to your front page, e.g. target=https://singpolyma.net/ ?
@snarfed yes. or whatever URL is on my twitter profile
i currently craft search queries by stripping the scheme (i.e. http://), putting quotes around the remaining domain and path, and ORing all of those together, e.g. "snarfed.org" OR "instagram.com/snarfed". sadly, this has been returning both false positives and false negatives in both G+ and Twitter. :/
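The query construction described above could be sketched roughly like this. `build_search_query` is a hypothetical name, not bridgy's actual function, and dropping a trailing slash and de-duplicating terms are assumptions on top of what the comment says.

```python
import re


def build_search_query(urls):
    """Strip the scheme from each URL, quote the rest, and OR the terms."""
    terms = []
    for url in urls:
        # Drop http:// or https://. Also dropping a trailing slash is an
        # assumption, so "http://snarfed.org/" becomes "snarfed.org".
        stripped = re.sub(r"^https?://", "", url).rstrip("/")
        term = '"%s"' % stripped
        if stripped and term not in terms:
            terms.append(term)
    return " OR ".join(terms)


print(build_search_query(["http://snarfed.org/", "https://instagram.com/snarfed"]))
# prints: "snarfed.org" OR "instagram.com/snarfed"
```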
i added the scheme back to G+ searches in 485af7323352ef9840c962c090dd7598fe9f8d53, and it looks like that cut out the false positives but didn't add any false negatives.
still working on Twitter. here's some research so far for the example domain hypothes.is, including links to searches:
- hypothes.is returns similar usernames and word variations, e.g. @hypothesis and hypothesis is
- "hypothes.is" (our current approach) is better, but still returns @hypothesis
- https://hypothes.is and https://hypothes.is/ (trailing slash) only return links to the home page. same with "https://hypothes.is" and "https://hypothes.is/"
hrmph.
i'm now thinking about still using the "hypothes.is" style search for twitter and filtering out the false positives manually.
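One way the manual filtering might look, as a sketch only: keep a search result only if one of its expanded links actually points at the domain. The result shape (a dict with a `urls` list of already-expanded links) and both function names are assumptions for illustration, not Twitter's API response format or bridgy's code.

```python
from urllib.parse import urlparse


def links_to_domain(url, domain):
    """True if url's host is domain or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def filter_results(results, domain):
    """Keep only results with at least one expanded link to domain."""
    return [
        r for r in results
        if any(links_to_domain(u, domain) for u in r.get("urls", []))
    ]


# Hypothetical search results for the quoted "hypothes.is" query.
results = [
    {"text": "hey @hypothesis, nice tool", "urls": []},
    {"text": "good read", "urls": ["https://hypothes.is/blog/"]},
    {"text": "unrelated", "urls": ["https://nothypothes.is/"]},
]
kept = filter_results(results, "hypothes.is")
print([r["text"] for r in kept])
# prints: ['good read']
```

Comparing parsed hostnames instead of substring matching is what drops the username and word-variation matches.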
Filtering false positives seems like an essential thing to do. Trying to get as much as possible is probably the best, then filter after
i wish! sadly many users' domains are common words, or have common words in them, so their false positive rate can be 1K:1 or even 1M:1 for domains with words like blog or web. :/ and bridgy is approaching 1k twitter users, so I'd like to try to cut down that workload (and cost) a bit.
filter out common words and only search for the unique part maybe?
oh boy, and now i'm in the business of maintaining a stop word list and search query rewriter. :P you're definitely right, it's doable, i'm just not sure i want to take that plunge...
Sorry. Was a thought
np! definitely appreciated. :two_men_holding_hands:
spun out of #51. from https://github.com/snarfed/bridgy/issues/51#issuecomment-135816838:
silo support for this is mixed:
/search/tweets.json?q=
/activities?query=