scripting / a8c-FeedLand-Support

A public repo for discussing FeedLand at A8C.
1 stars 0 forks source link

Debugging the river cache #15

Open scripting opened 11 months ago

scripting commented 11 months ago

@fmfernandes -- new versions of feedland and feedlanddatabase, versions 0.5.98 and 0.6.26 respectively.

I added console.log's in several places:

  1. When a river is added to the cache.
  2. When a river is deleted because one of the feeds it contained updated.
  3. When a river is deleted because it aged-out.

This will give us an idea of where we might look next for problems.

I have an idea of a way to get more info out of the system, by adding an API call that returns a structure representing what's in the cache at any moment.

But let's see what we get from this.

scripting commented 11 months ago

Also I updated the feedland repo, it had been a while. You can see where the debugging calls have been added for river caching.

The rivercache-specific code starts here.

fmfernandes commented 11 months ago

Updated our instance with new versions

fmfernandes commented 11 months ago

Hey @scripting, here's a snippet of the applications logs:

09:01:36 PM POST a8c.feedland.org:80 /publishfile https://a8c.feedland.org/?river=true&screenname=dave ip6-localhost
09:01:36 PM - publishStaticFileInSql: screenname == fernando.fernandes@automattic.com, relpath == appPrefs.json
09:01:36 PM - getStaticFileInSql: screenname == fernando.fernandes@automattic.com, relpath == appPrefs.json
09:01:39 PM - addToRiverCache: cachekey == category:dave/All
09:01:39 PM - getRiverFromCategory: cachekey == category:dave/All, 4.154 secs.
09:01:58 PM - logSqlCalls: 9:01:58 PM, ctsecs == 0.015, options.result.length == 1093.
09:01:58 PM - logSqlCalls: sqltext == select * from subscriptions where listname='dave' and categories like '%,All,%';
09:02:02 PM - addToRiverCache: cachekey == category:dave/All
09:02:02 PM - getRiverFromCategory: cachekey == category:dave/All, 4.06 secs.
09:02:02 PM - clearCachedRivers: deleting cache for the river whose key is category:dave/All
09:02:03 PM - checkOneFeed: feedUrl == https://tommycollison.substack.com/feed
09:02:03 PM - clearCachedRivers: deleting cache for the river whose key is category:dave/All
9:02:07 PM GET a8c.feedland.org:80 /getriverfromcategory https://a8c.feedland.org/?river=true&screenname=dave ip6-localhost
09:02:07 PM -logSqlCalls: 9:02:07 PM, ctsecs == 0.011, options.result.length == 1093.
09:02:07 PM - logSqlCalls: sqltext == select * from subscriptions where listname='dave' and categories like '%,All,%';
09:02:11 PM - addToRiverCache: cachekey == category:dave/All
09:02:11 PM - getRiverFromCategory: cachekey == category:dave/All, 4.017 secs.
09:02:18 PM - checkOneFeed: feedUrl == https://www.redreporter.com/rss/index.xml
09:02:19 PM - notifySocketSubscribersFromSql: idLastNewItem == 268280
09:02:19 PM - clearCachedRivers: deleting cache for the river whose key is category:dave/All

Note that you can see that live if you check the VIP dashboard. So by having a quick look, indeed something seems to be triggered the cache clear. Shouldn't that be cleared only when whenCreated > ctSecsLifeRiverCache? It's currently 300 secs and by checking the timestamp for each log message this is happening way before that timeframe.

scripting commented 11 months ago

Timing out is one of two reasons a river is removed from the cache.

The other reason is that a URL that's in the river has updated. I can explain what that means if it's not clear.

And because the call was from clearCachedRivers and not clearOldCachedRivers we can infer this was because a feed updated, so let's add another console.log that tells us what URL of the feed that it thinks updated.

It makes sense the problem would be in this area, possibly, because I just added a new mechanism for that to be called.

scripting commented 11 months ago

I'm sorry I removed the console.log that spews out the ID of the item that caused the update. I'll take a look at that too.

scripting commented 11 months ago

New versions of feedland, feedlanddatabase 0.5.99 and 0.6.27.

scripting commented 11 months ago

One change -- clearCachedRivers now also reports the feedUrl that caused the river to be removed from the cache.

scripting commented 11 months ago

@fmfernandes -- are we going to pick this up tomorrow or should i hang around? i'm happy to go watch the news and have a beer. ;-)

scripting commented 11 months ago

no matter what i'll get the emails..

fmfernandes commented 11 months ago

Hey @scripting, I'll be around for another hour or so. I've just updated our instances again! Deploying now.

scripting commented 11 months ago

Ok, I’ll watch for more data.

fmfernandes commented 11 months ago

New logs:

10:19:11 PM GET a8c.feedland.org:80 /sendprefs https://a8c.feedland.org/?river=true&screenname=dave ::ffff:127.0.0.1
10:19:11 PM - 10:19:11 PM GET a8c.feedland.org:80 /getriverfromcategory https://a8c.feedland.org/?river=true&screenname=dave ip6-localhost
10:19:11 PM - getRiverFromCategory (serving from cache): cachekey == category:dave/All
10:19:14 PM GET a8c.feedland.org:80 /getriverfromcategory https://a8c.feedland.org/?river=true&screenname=dave ::ffff:127.0.0.1
10:19:14 PM - logSqlCalls: 10:19:14 PM, ctsecs == 0.014, options.result.length == 1093.
10:19:14 PM - logSqlCalls: sqltext == select * from subscriptions where listname='dave' and categories like '%,All,%';
10:19:15 PM POST a8c.feedland.org:80 /publishfile https://a8c.feedland.org/?river=true&screenname=dave ::ffff:127.0.0.1
10:19:15 PM - publishStaticFileInSql: screenname == fernando.fernandes@automattic.com, relpath == appPrefs.json
10:19:15 PM - getStaticFileInSql: screenname == fernando.fernandes@automattic.com, relpath == appPrefs.json
10:19:18 PM - addToRiverCache: cachekey == category:dave/All
10:19:18 PM - getRiverFromCategory: cachekey == category:dave/All, 4.387 secs.
10:19:19 PM - 10:19:19 PM GET a8c.feedland.org:80 /getfeed https://a8c.feedland.org/?river=true&screenname=dave ::ffff:127.0.0.1
10:19:19 PM GET a8c.feedland.org:80 /getfeed https://a8c.feedland.org/?river=true&screenname=dave ip6-localhost
10:19:23 PM - checkOneFeed: feedUrl == https://inphotos.org/feed
10:19:25 PM - checkOneFeed: feedUrl == https://elizasj.substack.com/feed
10:19:38 PM - checkOneFeed: feedUrl == http://www.nytimes.com/svc/collections/v1/publish/www.nytimes.com/column/frank-bruni/rss.xml
10:19:40 PM - checkOneFeed: feedUrl == https://courtneyscott.tv/feed
10:19:53 PM - checkOneFeed: feedUrl == http://feeds.feedburner.com/PlacesJournal
10:19:55 PM - checkOneFeed: feedUrl == http://feeds.feedburner.com/AndroidPolice
10:19:55 PM - clearCachedRivers: deleting cache for the river whose key is category:dave/All. feedUrl == http://feeds.feedburner.com/AndroidPolice
10:19:55 PM - Samsung could debut a game streaming service exclusively for Galaxy devices
10:19:55 PM - Your phone will sound an alarm on Wednesday, but there’s no need to panic
10:19:55 PM - WhatsApp is picking up some advanced text formatting options
10:19:55 PM - Best OnePlus 10T screen protectors in 2023
10:19:55 PM - Create your own maps with the latest Google Earth update
10:19:55 PM - Upgrade your car to wireless Android Auto for less with this discounted adapter
10:19:57 PM - notifySocketSubscribersFromSql: idLastNewItem == 268493
10:19:58 PM - notifySocketSubscribersFromSql: idLastNewItem == 268493
10:19:58 PM - clearCachedRivers: deleting cache for the river whose key is category:dave/All. feedUrl == http://feeds.feedburner.com/AndroidPolice

Looks like it was triggered by a feed update indeed

scripting commented 11 months ago

Thanks for the report. I’ll think about next steps.

scripting commented 11 months ago

There is information here. I will have to look at more of the log. Both times it was Android Police. It must be rssCloud pinging us because we wouldn't be checking it that often.

Maybe I'll try unsubbing from Android Police and see if that gets rid of this issue.

scripting commented 11 months ago

Guess what, when I unsubbed from Android Police the problem went away.

Try reloading this page over and over. It's obviously loading from the cache.

https://a8c.feedland.org/?river=true&screenname=davewiner

scripting commented 11 months ago

Something else i want to look in in the morning.

clearCachedRivers is called from a number of places it probably doesn't need to call from, because now we're doing SQL-based change detection (the same flow that goes to the clients is also going through clearCachedRivers).

At times I wonder if we need to keep the non-chronologic river cache clearing, but we do need it. here's the use-case..

Suppose you just unsub from a feed that dumped a shitload of garbage into your river, with the intention of getting it out of your way. So you go unsub, and because we haven't thrown out the cache, when you look at the river the garbage is still there! It looks like a bug because it is. When you unsub it's immediate.

Less of a problem for subscribing, though we need to clear the river cache for that too.

It's so weird that this system behaves so differently from feedland.org. I would have thought we were already past that.

See you tomorrow! :-)

scripting commented 11 months ago

Good morning. I'm feeling pretty sick today, it's a problem i've been seeing the doctor about, and this morning it was pretty bad, so I had to take care of my health first. But i'm feeling better now, and i'm going to try to implement the fix i described in the previous comment.

I'll post notes here as it progresses.

scripting commented 11 months ago

Okay so if config.flUseSqlForSockets is true, that means we're only sending updates to the users from the SQL method of locating changes. In fact we're still sending updates from there and from the scanning we do and responses to rssCloud calls.

At first I thought -- who cares if you get notified twice of an update, which will happen sometimes, how often it happens depends on how many instances are running.

And it does matter, as we're finding here. Maybe. We still don't understand why the updates are coming for the feeds they're coming for.

But I'm going to straighten this out now and then we'll see where we're at.

scripting commented 11 months ago

New versions of feedland and feedlanddatabase versions 0.5.100 and 0.6.28.

scripting commented 11 months ago

I've brought the feedland repo up to date if you want to review the very small number of changes, and i'm happy to explain.

https://github.com/scripting/feedland/blob/main/worknotes.md

https://github.com/scripting/feedland/blob/main/database/worknotes.md

maybe jump on a quick voice chat to talk this over?

scripting commented 11 months ago

Just sent a voicemail explaining, over on the slack channel. Thought I should include it here too make it easier to find in the future. (But it turns out they don't support MP3.)

fmfernandes commented 11 months ago

That's fine! A link to the message will work. Did you test checking your river's All tab?

scripting commented 11 months ago

Yes, the performance is perfect. But what i want to know is, when you read the logs, do you see the cache getting tossed when it makes no sense??

scripting commented 11 months ago

I had a thought for optimization later -- like a year from now..

We can send a message to the client that we tossed the river from the cache, to which it can send the message back "hey the user has this river on their screen, so why don't you recalc it and put it back in the cache."

If it isn't on their screen, the client ignores the message.

That way even if the cache got tossed, the user never sees the delay.

fmfernandes commented 11 months ago

But what i want to know is, when you read the logs, do you see the cache getting tossed when it makes no sense??

I've been monitoring it since the update and everything seems fine.

scripting commented 11 months ago

Excellent. I think we're starting to define our roles.

You are responsible for the server, I'm responsible for the software.

BTW, I'm working on the FeedLand docs. I'll need people to carefully read them, look for problems. Report issues. We treat the docs like software.

scripting commented 11 months ago

I think this is going to be an open issue for some time. Thinking about new strategies to improve performance.

Here's an anecdote. I still have my iPad configured to use feedland.org. I am depressed that there usually is no delay loading the All tab there. I have a theory that the reason is that it is also part of news.scripting.com and it's getting enough traffic now that someone else is taking the hit. The probability that I am the first one to read that tab after the cache is cleared is much smaller than it is on A8C.feedland.org.

I'm still getting a lot of 4-second loads on the All tab on A8C.

This is leading me to think of different approaches here.

  1. Maybe -- turn off all url-based cache clears and only do it on time, and increase the time-to-live to something like 10 or 15 minutes.

  2. Maybe -- optimize it for paying customers by regenerating the page immediately after clearing the cache, so that if the user goes there it's more likely the page will already be loaded. Basically emulating the anecdote above.

  3. If we did either of the previous, there would have to be an exception for unsubs. When you unsub you want that to be seen immediately with or without a delay. So that message has to get through. Easy enough technically, but when reading the code, it'll require a lot of explanation with links to comments like this one.

Just some thoughts...