Closed nicholaschiang closed 2 years ago
Perhaps this has something to do with an invalid assumption I think I made when designing the articles query: Each tweet may have more than one link and thus perhaps the distinct() filter is messing up.
I have to debug my current articles view PostgreSQL query to:
Before I'll be able to fix TS-17 fully (i.e. and format all of the included links).
/*
This query omits (Tesla, 2758) but when you filter for author_name = 'Tesla' or a specific link_id = 2758 it is included.
Perhaps this has something to do with Beekeper Studio limitations? Or PostgreSQL inaccuracies with select distinct()?
*/
select distinct on (tweets.author_id, urls.link_id)
urls.link_id as link_id,
tweets.*
from urls
inner join (
select
influencers.name as author_name,
influencers.username as author_username,
tweets.*,
scores.cluster_id as cluster_id,
scores.insider_score as insider_score,
scores.attention_score as attention_score,
to_json(influencers.*) as author,
to_json(scores.*) as score,
json_agg(mentions.*) as mentions,
json_agg(tags.*) as tags
from tweets
inner join influencers on influencers.id = tweets.author_id
inner join scores on scores.influencer_id = influencers.id
left outer join mentions on mentions.tweet_id = tweets.id
left outer join tags on tags.tweet_id = tweets.id
group by tweets.id, scores.id, influencers.id
) as tweets on tweets.id = urls.tweet_id
Aha! I found one bug in my PostgreSQL logic: I'm assuming that there's only a single score for each influencer, but that's not the case (e.g. both Elon Musk and Tesla have two scores each).
Here's the problem: because certain influencers (e.g. Elon Musk and Tesla) have scores for multiple clusters.id, whenever the select distinct happens to use the row that has a different clusters.slug than the one that we're querying for (e.g. if the @elonmusk row uses his attention_score for DogeCoin but we're querying for articles about Tesla, all of Elon's tweets will be omitted from the results).
The fix: select distinct on (urls.link_id, tweets.author_id, tweets.cluster_id so that I get all of the rows that have unique links, authors, and clusters. Then, the remaining of the query filters for a specific cluster so no duplicate tweets (from the same author for the same link but with a different cluster's score) should show up.
Imported from @nicholaschiang's original Linear issue TS-38.
The local development server seems to change tweets data on page reload… why is that? Is this some limitation to PostgreSQL queries that I'm unaware of? Is my
articles
viewdistinct
clause not ordered well?