So this is a weird one. Here's the original query:
// original IC-3
profile
MATCH (person:Person {id:$personId})-[:KNOWS*1..2]-(friend:Person)<-[:HAS_CREATOR]-(messageX),
(messageX)-[:IS_LOCATED_IN]->(countryX:Place)
WHERE
not(person=friend)
AND not((friend)-[:IS_LOCATED_IN]->()-[:IS_PART_OF]->(countryX))
AND countryX.name=$countryXName AND messageX.creationDate>=$startDate
AND messageX.creationDate < ($startDate + $duration)
WITH friend, count(DISTINCT messageX) AS xCount
MATCH (friend)<-[:HAS_CREATOR]-(messageY:Message)-[:IS_LOCATED_IN]->(countryY:Place)
WHERE
countryY.name=$countryYName
AND not((friend)-[:IS_LOCATED_IN]->()-[:IS_PART_OF]->(countryY))
AND messageY.creationDate>=$startDate
AND messageY.creationDate<($startDate + $duration)
WITH
friend.id AS personId,
friend.firstName AS personFirstName,
friend.lastName AS personLastName,
xCount,
count(DISTINCT messageY) AS yCount
RETURN
personId,
personFirstName,
personLastName,
xCount,
yCount,
xCount + yCount AS count
ORDER BY count DESC, toInteger(personId) ASC
LIMIT 20
The context is, given a person id, two country names, a start date and a duration, expand out from that single user to FRIENDS..2 that are not the same as the originating person (they can't be because we don't have reciprocal relationships, that check can be dropped), where the friends are not located in a place that is a part of either of the two countries.
Expand out from those friends to messages that they've created that are located in either of the two countries and that meet the time range criteria for the message createdDate. Per friend, aggregate the counts of messages for each of the two countries (we must have non-zero counts of messages per country) and sum those counts. With the top 20 ordered by the total count and then the friend's id, project out the friend's id, first name, last name, each of the counts, and the combined count.
The traditional approach here is expensive. With the parameters given, the person has about 8k distinct friends, who have collectively created near 6 million messages, and the expand from friends to messages, and from messages to country they are located in takes about 18 million db hits, then we have to filter the remaining by date.
On my laptop this takes about 1 minute or so.
I've got an improved query that takes around 5 seconds or so, but it does so in a completely backwards way that wouldn't make sense for any sane version of this dataset.
We start with gathering all the FRIENDS..2 into a list, then expand down from the two countries to their messages and their creators, keeping only those whose creators are in that friends list. Then we filter the messages by their created time, then count() messages per friend and country, do some CASE magic to get the counts into separate columns, sum the counts, filter out any with counts of 0, then get total count and do our ordering and limit.
While this works well with the given countries of Scotland and Laos, I don't think this will hold up for the larger countries. Also if we start with users with a less dense friend network (and/or friends who create far fewer messages) then the original approach would probably beat this out.
I don't have strong confidence of this as a general replacement, but I'll leave that decision to you, as in this case it's clearly a win.
// Inverted IC-3
PROFILE
MATCH (countryX:Place {name:$countryXName}), (countryY:Place {name:$countryYName})
MATCH (person:Person {id:$personId})-[:KNOWS*1..2]-(friend)
WITH DISTINCT countryX, countryY, friend
WHERE NOT (friend)-[:IS_LOCATED_IN]->()-[:IS_PART_OF]->(countryX)
AND NOT (friend)-[:IS_LOCATED_IN]->()-[:IS_PART_OF]->(countryY)
WITH countryX, countryY, collect(DISTINCT friend) as friends
MATCH (friend)<-[:HAS_CREATOR]-(message)-[:IS_LOCATED_IN]->(country:Place)
WHERE country.name in [$countryXName, $countryYName]
AND friend in friends
WITH countryX, countryY, friend, message, country, 1 as ignored
WHERE $startDate <= message.creationDate < ($startDate + $duration)
WITH countryX, countryY, country, friend, count(message) as count
WITH friend, CASE WHEN country = countryX THEN count ELSE 0 END as countX, CASE WHEN country = countryY THEN count ELSE 0 END as countY
WITH friend, sum(countX) as xCount, sum(countY) as yCount
WHERE xCount <> 0 AND yCount <> 0
WITH friend, xCount, yCount, xCount + yCount as count
ORDER BY count DESC, toInteger(friend.id) ASC
LIMIT 20
RETURN friend.id AS personId,
friend.firstName AS personFirstName,
friend.lastName AS personLastName,
xCount,
yCount,
count
So this is a weird one. Here's the original query:
The context is, given a person id, two country names, a start date and a duration, expand out from that single user to FRIENDS..2 that are not the same as the originating person (they can't be because we don't have reciprocal relationships, that check can be dropped), where the friends are not located in a place that is a part of either of the two countries.
Expand out from those friends to messages that they've created that are located in either of the two countries and that meet the time range criteria for the message createdDate. Per friend, aggregate the counts of messages for each of the two countries (we must have non-zero counts of messages per country) and sum those counts. With the top 20 ordered by the total count and then the friend's id, project out the friend's id, first name, last name, each of the counts, and the combined count.
The traditional approach here is expensive. With the parameters given, the person has about 8k distinct friends, who have collectively created near 6 million messages, and the expand from friends to messages, and from messages to country they are located in takes about 18 million db hits, then we have to filter the remaining by date.
On my laptop this takes about 1 minute or so.
I've got an improved query that takes around 5 seconds or so, but it does so in a completely backwards way that wouldn't make sense for any sane version of this dataset.
We start with gathering all the FRIENDS..2 into a list, then expand down from the two countries to their messages and their creators, keeping only those whose creators are in that friends list. Then we filter the messages by their created time, then count() messages per friend and country, do some CASE magic to get the counts into separate columns, sum the counts, filter out any with counts of 0, then get total count and do our ordering and limit.
While this works well with the given countries of Scotland and Laos, I don't think this will hold up for the larger countries. Also if we start with users with a less dense friend network (and/or friends who create far fewer messages) then the original approach would probably beat this out.
I don't have strong confidence of this as a general replacement, but I'll leave that decision to you, as in this case it's clearly a win.