scripting / feedBase

A project to get feeds into a base.
MIT License
33 stars 3 forks source link

added recommended feeds service endpoint - Issue #42 #44

Closed vincode-io closed 2 years ago

vincode-io commented 5 years ago

This adds the service for recommendations, but doesn't add any UI for it as that code isn't in this repository.

scripting commented 5 years ago
  1. Thanks for doing this. It's far beyond my SQL ability. When it works it'll be great.

  2. I am integrating the functionality carefully, and have hit a problem with the SQL that's being generated. I opened an interactive connection with the database, and entered the generated SQL text. The process hung without returning; after a few minutes, I killed it.

Here's the code I ran.

select distinct f1.countSubs, f1.title, f1.feedUrl, f1.htmlUrl from subscriptions s1, subscriptions s2, subscriptions s3, feeds f1 where s1.feedUrl = s2.feedUrl and s1.username = 'davewiner' and s1.username != s2.username and s2.username = s3.username and s3.feedUrl = f1.feedUrl and s3.feedUrl not in (select feedUrl from subscriptions where username = 'davewiner') order by f1.countSubs DESC, f1.title limit 20;

This is the exact text that your code generated running inside feedBase, assuming I copied it correctly (always a possibility that I didn't).

Let's iterate here and get it working. ;-)

Dave

vincode-io commented 5 years ago

You got the SQL right. Do we need to add some indexes to the database?

We should have an index on the feedUrl column of the feeds table. We also need indexes on the feedUrl and username columns in the subscriptions table.

scripting commented 5 years ago

I don’t know, you’re the sql expert. Can you tell me how to create the indexes.

On Saturday, March 30, 2019, Maurice Parker notifications@github.com wrote:

You got the SQL right. Do we need to add some indexes to the database?

We should have an index on the feedUrl column of the feeds table. We also need indexes on the feedUrl and username columns in the subscriptions table.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/scripting/feedBase/pull/44#issuecomment-478264707, or mute the thread https://github.com/notifications/unsubscribe-auth/ABm9Oya0UTEktX2fJL4LO8iOx3wwkhszks5vb5g-gaJpZM4cSwnX .

-- Typed on an iPad with fat fingers.

vincode-io commented 5 years ago

Given that the tables have sensible primary keys, there are some good indexes in place already. One we might want to add is for the subscriptions.username column since we do a lot with that one in the query.

Run this command. It might take a little time to complete. I don't know how long it will take as I don't have any particulars on how big the production database is or what kind of hardware resources it has available. After that rerun the query and see if performance has improved.

create index subscriptions_username_idx on subscriptions(username) using HASH;

If that doesn't do it, we will try adding a hashed index to the subscriptions.feedUrl column.

scripting commented 5 years ago

The index command ran instantaneously.

The query takes 5.8 seconds.

If you want to see the result:

  1. Go to http://feedbase.io/

  2. Open the JavaScript console.

  3. Enter viewRecommendations () and press Return.

You should see a JSON struct with your recommendations, and it should report how much time it took.

vincode-io commented 5 years ago

I'm seeing under 2 seconds when I use viewRecommendations() consistently. My profile has about 1/2 of the subscriptions that yours does, so that is probably most of the difference. 2 seconds for a complete round drip to the database isn't horrible for this kind of query, but I wish it was faster.

I tried adding an index to the subscriptions.feedUrl column to my local database and there was no difference in performance, so I don't think we should add it.

I did an EXPLAIN on my local database and it looked pretty good. (EXPLAIN tell us how the tables are being joined and which indexes are being used.) There isn't anything more I can do to optimize the query or table indexing.

I don't think we are going to get much better performance with the current database structure. One of the things that might be holding us back is the fact that we are joining most of the time using a very long column, feedUrl. Basically, the more bytes we're working with, the more latency we'll have. This can be changed by moving using an integer as the primary key of feeds.

Changing how primary keys work on tables already in production is a little bit of work. If you feel strongly that performance isn't acceptable, I can work up a plan to get it done.

vincode-io commented 5 years ago

Here is an article about using longer character columns as keys on mysql: https://www.percona.com/blog/2007/06/18/using-char-keys-for-joins-how-much-is-the-overhead/

Basically the author is advocating for using int keys over char keys. I agree with this sentiment, but it would mean doing quite a bit of work on the database and feedbase.js.

One interesting bit in this article is where he talks about the performance impact of MyISAM when joining using VARCHAR keys. Apparently it does compression by default. If we are using a mysql database with MyISAM tables, we might be able to set the compression off and improve performance.

Dave, could you run the following command so that we can see what the engine is for the feeds table? It should come back as either MyISAM or InnoDB.

SHOW TABLE STATUS WHERE Name = 'feeds'