Open nstjean opened 4 years ago
I suspect that we are going to have to use less search params in order to get the speeds we want. The problem is I'm not sure how we can test those params without pushing and merging the changes to the live site, and that's going to be a really slow iteration. Anyone want to look into this?
@nstjean i could like to try this one. I think the issue could be because of the greater number of parameters to be processed. Due to which, it takes a larger amount of time to load and process the data.
@dburugupalli Go for it!
Yes we need to decrease the time taken by that query. Thanks!
@dburugupalli my first suggestion is to try adding a limit to this. we should try to trace the query -- adding more info here:
This is the query, which i suspect runs real slow on the whole huge users table. We can limit with an additional:
.where(status: 1)
which will help a lot. We could also join the nodes table and insist that the user has a node which has node.status == 1
too. That'd help a lot too.
Yes, here is the Skylight report on this query:
Full report here: https://oss.skylight.io/app/applications/GZDPChmcfm1Q/1583245980/6h/endpoints/Srch::Search%20%5BGET%5D%20srch%2FnearbyPeople
That actually says the slow query is the user tags query...
So maybe when we run this, it is running on a TON of records:
We could limit it to 100
Also, are we re-running a user_tag query on each instance, on this line?
item.user_tags.none? do |user_tag|
If so, we could optimize this.
I think this'll take a little experimentation but if that is a nested query on item.user_tags.none?
we should be able to speed this up a LOT by a) limiting the results and b) not nesting the query, but, inspecting the item and seeing what we can do to test if it has any tags without generating a new nested query.
Looking at whether we need to add an index, or reduce recursion in query nesting...
Hmm. user_tags
has no indices: https://github.com/publiclab/plots2/blob/15589c95b54759b2828ea0d9b3571d2bebf8c2dc/db/schema.rb.example#L417-L422
Not sure if it needs them? Checking # of records now.
Primary key ought to auto-index...
So, i ran it in the production console and it's really fast: UserTag.find_by(uid: 1)
So, i think it's the nesting - i bet we are running it on a ton of users.
https://github.com/publiclab/plots2/blob/783fbe2161bd70e13ad5e62ac6ab9254de3661b5/app/services/search_service.rb#L197
and
https://github.com/publiclab/plots2/blob/783fbe2161bd70e13ad5e62ac6ab9254de3661b5/app/services/search_service.rb#L176
have
User.where('rusers.status <> 0') .joins(:user_tags)
same statements running twice! Can we optimize here by storing them ?
shouldn't there be status<>0
clause at user in https://github.com/publiclab/plots2/blob/783fbe2161bd70e13ad5e62ac6ab9254de3661b5/app/services/search_service.rb#L187 ?
I think there should be !
Two possible ways to reduce time are short circuiting if possible, AND rearrangement of statements so that the first clause eliminates more cases then the upcoming clauses in sql statements.
I think the first and second referenced in your comment are the latitude and longitude calcs I want to combine.
https://blog.nrowegt.com/rails-pg-ambiguous-column-reference/
OK, i was able to get this to work:
u = User.where('rusers.status <> 0').joins(:user_tags).where('user_tags.value LIKE ?', 'lat%').where('REPLACE(user_tags.value, "lat:", "") BETWEEN 44 AND 44.1').joins('INNER JOIN `user_tags` AS `lontags` ON lontags.uid = rusers.id').where('lontags.value LIKE ?', 'lon%').where('REPLACE(lontags.value, "lon:", "") BETWEEN 20 AND 80')
Formatted:
u = User.where('rusers.status <> 0')
.joins(:user_tags)
.where('user_tags.value LIKE ?', 'lat%')
.where('REPLACE(user_tags.value, "lat:", "") BETWEEN 44 AND 44.1')
.joins('INNER JOIN `user_tags` AS `lontags` ON lontags.uid = rusers.id')
.where('lontags.value LIKE ?', 'lon%')
.where('REPLACE(lontags.value, "lon:", "") BETWEEN 20 AND 80')
Hmm. @cesswairimu actually the block I'm still not sure about is:
item.user_tags.none? do |user_tag|
user_tag.name == "location:blurred"
end
What's going on here?
i think i found that the block is used to evaluate all items of the array and the whole thing returns true only if none of the array members return true? https://ruby-doc.org/core-2.7.0/Enumerable.html#method-i-none-3F
OK, i added .limit(limit)
in a couple places too. I hope this helps!!!
https://github.com/publiclab/plots2/issues/7556#issuecomment-610625735
Could the replace statements move towards end? After sorting, we will have lesser number of rows so replacing at last step can improve(I don't have full context, so it may be wrong)
https://github.com/publiclab/plots2/pull/7795 is now merged, although one of the .limit()
statements had to be removed. We should re-test this after today's live site republication and see if performance improved?
Current:
Now not seeing much more than 8s, much better than 60!!!
but needs more testing.
Test it out at this zoom level: https://publiclab.org/map#9/40.71372914097432/-73.47581131383778
cc zoom level optimizations: https://github.com/publiclab/plots2/issues/7822
OK, so i'm not seeing a huge difference between before and after April 14; both are averaging around 17 seconds or so, and both have outliers up to more than a minute of load time, so not good. But even before the 14th, the "what time is spent on" chart is quite different for different periods.
We just don't have a huge amount of traffic so statistics aren't great, but I'll screenshot a few "moments" so we can look at different possible sources of inefficiency:
this has 3 queries:
SELECT DISTINCT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` = `rusers`.`id` WHERE (rusers.status <> ?) AND (value LIKE ?) AND (REPLACE(value, ?, ?) BETWEEN ? AND ?)
SELECT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` = `rusers`.`id` WHERE (rusers.status <> ?) AND (rusers.id IN ?) AND (user_tags.value LIKE ?) AND (REPLACE(user_tags.value, ?, ?) BETWEEN ? AND ?)
SELECT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` = `rusers`.`id` WHERE (rusers.status <> ?) AND (rusers.id IN ?) AND (user_tags.value LIKE ?) AND (REPLACE(user_tags.value, ?, ?) BETWEEN ? AND ?) ORDER BY created_at DESC LIMIT ?
That was all just one query, for the period of time before the April 14th optimization.
This next one was after the optimization:
The query was more simple:
SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = ?
It does seem that most slowness after April 14th was to do with user_tags
for a single user... that's weird.
Could it be that for large sets of users, this is just these lines running over and over?
If true (and we should be able to test that on the console?) we should be able to do this filtering a different way, as we should already have loaded the user tags and be able to additionally sort by them. But maybe we haven't eager-loaded the associations?
Could we a) do this in 2 batches, one for users with a location:blurred
tag and once without, and/or b) eager-load the user tags so we can filter without doing individual queries for each user?
Also, pssst @Tlazypanda the staff and I were talking about this today and I thought you may also want to put eyes on it since it is performance related! :)
@emilyashley Yeah sure! Would need a bit of context on this first though :sweat_smile: Will look into it :smile:
Hi @daemon1024 -- i wondered if this might be one you'd like to take up in your project, if you end up having time. It may have a big impact on usability. Thanks!
Hey @jywarren, Definitely. I actually mentioned benchmarking the People API in my proposal and this gives a lot of relevant context as well.
Hi @daemon1024 - I am looking at this next and I think @Manasa2850 and @17sushmita may have some insights into this as it's a query heavily based on the tagging system.
I'd say we should start with:
The whole query is in this method:
This seems to be an interesting issue 🤔 . Looking into this if I can find anything.
So the problematic queries are still the same
SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = ?
and
SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = ? AND (value LIKE ?)
The second one is quite similar to first one so bound to be problematic :P
The ?
here denotes that we are looping and repeatedly making these queries for different uids.
Skylight suggests that we eager load things and predefine assosciations we want so as the generated query is not repetitive
- SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = ?
+ SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` IN ( some raange here ..... )
Ref : https://www.skylight.io/support/performance-tips#solving-the-n1-query-problem
I believe the problematic part is https://github.com/publiclab/plots2/blob/6832a437ed2a6451d6bc30bb9876906a9ec92eb8/app/services/search_service.rb#L206-L211 Since that's the only part we are looping I think. I am not really sure though.
Here's a guide to how we can eager load things https://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations
I believe we can modify this line to be something like
user_locations = user_locations.includes(:user_tag).where('rusers.id IN (?)', uids).distinct
What do you think about this?
To confirm our hypothesis maybe we can just choose to comment out blurred locations code and see our performance.
Let's try eager loading, push it to unstable, and try it out, timing things in CURL for now? It's a little more primitive but could work:
https://www.badunetworks.com/performance-testing-curl-part-1-basics/
Thank you!!!
I think we'll probably have to try a few different measures to reduce the load time. We could also filter results more if it's raw database size that's causing trouble. But reducing nested queries may help a lot. I like the idea of testing by:
To confirm our hypothesis maybe we can just choose to comment out blurred locations code and see our performance.
We could push such a test to unstable too in order to test it out.
Just be aware (@)icarito is using unstable a lot these past few days to test out an image upload issue. So if you mention his name in the chatroom you can both use unstable by taking turns or just making the other aware so you don't both try to push to it at the same time.
Thank you!!!
I tested thing out on https://unstable.publiclab.org/api/srch/nearbyPeople?nwlat=55.3791104480105&selat=23.40276490540795&nwlng=-128.67187500000003&selng=178.76953125
Well the improvements are 10x already. From 30+seconds to 2-4seconds
Also tried to the query mentioned in the main issue body i.e. nwlat=10.0&selat=0.0&nwlng=0.0&selng=10.0
seeing improvements on a similar level ( near about 10x )
So the next step would be to try eager loading. This should help in our performance significantly I believe.
I think we'll probably have to try a few different measures to reduce the load time.
Agreed. Will look into further improvements after improving this.
The performance now seems to be regressed back to normal. I think it's not the right way to eager load the query. Will have to look into how to do it the correct way. Any inputs appreciated!
So we have a similar construct in nearbyNodes too. https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L144-L148
And that uses eager loading too. https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L137
History for this points to https://github.com/publiclab/plots2/pull/1987/.
Reverted this commit since it broke things 😅
Hey @daemon1024 just noting here https://github.com/flyerhzm/bullet you can set this gem up locally to test whether the n+1 problem is getting resolved or not :v: This really helped me a lot to figure out the exact traceback of the n+1 query issue and change my code accordingly. Once the issue is resolved it will also stop generating the warning for n+1
Hi @daemon1024 can you link us to the PR you're working in? Sorry if I missed that!
@jywarren There's no PR for now 😅. Most of my testing has been in local or directly on unstable. Will make a PR once I have something concrete. Rest noted my observations above.
Thanks @Tlazypanda for the gem, looks very helpful and will get back to you after trying it out 🙌
Ah i see no problem! - still, good for us to be able to see your code so far, so don't be afraid to share even rough early code!
Hi @daemon1024
So the problematic queries are still the same
SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = ?
This one is due to the looping over on every node by the user to check if it has
location: blurred
.
https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L202-L207 So can be resolved by using a simpler rails query
items = items.where('user_tags.value <> "location:blurred"')
Second query still needs to be debugged
and
SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = ? AND (value LIKE ?)
I'm not really sure about the second query but I observed that when the limit is 5 it performs 20 such queries - 4 query for each record and the 4 queries are in following pattern-
UserTag Load (29.2ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'lat:%')
UserTag Load (29.1ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'lon:%')
UserTag Exists (15.0ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (39.0ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'location:%')
First one for loading tag like 'lat:%' second for loading tag like 'lon:%' and third one for checking if 'location:%' exists or not and fourth for loading the tag like 'location:%' on the same user.
We are only using LIKE 'lat%' and LIKE 'lon%' queries in this rails query-
https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L176-L183
and checking if location: blurred
exists in this query-
https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L202-L207
which after optimizing will look like-
items = items.where('user_tags.value <> "location:blurred"')
So, I think these queries are automatically performed by rails ORM to verify if returned records have these attributes or not.
Complete trace of queries executed for the following parameters - http://publiclab.org/api/srch/nearbyPeople?nwlat=55.3791104480105&selat=23.40276490540795&nwlng=-128.67187500000003&selng=178.76953125 are (placed dummy user ids in place of original ones) -
UserTag Load (29.2ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'lat:%')
UserTag Load (29.1ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'lon:%')
UserTag Exists (15.0ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (39.0ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'location:%')
UserTag Load (42.6ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1001 AND (value LIKE 'lat:%')
UserTag Load (47.8ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1001 AND (value LIKE 'lon:%')
UserTag Exists (14.7ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1001 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (43.4ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1001 AND (value LIKE 'location:%')
UserTag Load (39.3ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1002 AND (value LIKE 'lat:%')
UserTag Load (43.5ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1002 AND (value LIKE 'lon:%')
UserTag Exists (12.0ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1002 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (47.6ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1002 AND (value LIKE 'location:%')
UserTag Load (42.4ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1003 AND (value LIKE 'lat:%')
UserTag Load (40.5ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1003 AND (value LIKE 'lon:%')
UserTag Exists (14.2ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1003 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (41.4ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1003 AND (value LIKE 'location:%')
UserTag Load (38.5ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1004 AND (value LIKE 'lat:%')
UserTag Load (41.5ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1004 AND (value LIKE 'lon:%')
UserTag Exists (13.7ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1004 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (45.5ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1004 AND (value LIKE 'location:%')
These are some stack overflow questions that I found had some similar issues like us (Not sure)- https://stackoverflow.com/questions/50483971/why-is-ruby-on-rails-executing-select-1-as-one-queries https://stackoverflow.com/questions/53125413/page-loads-unnecessary-queries-ruby-on-rails
Order of execution of things I see in skylight.
The first query:
SELECT DISTINCT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` = `rusers`.`id` INNER JOIN user_tags AS lontags ON lontags.uid = rusers.id WHERE (rusers.status <> ?) AND (user_tags.value LIKE ?) AND (REPLACE(user_tags.value, ?, ?) BETWEEN ? AND ?) AND (lontags.value LIKE ?) AND (REPLACE(lontags.value, ?, ?) BETWEEN ? AND ?)
which I believe translates from https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L176-L183
The second query:
SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = ?
which is already fixed in #9940 🥳
The third query:
SELECT DISTINCT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` = `rusers`.`id` INNER JOIN user_tags AS lontags ON lontags.uid = rusers.id WHERE (rusers.status <> ?) AND (user_tags.value LIKE ?) AND (REPLACE(user_tags.value, ?, ?) BETWEEN ? AND ?) AND (lontags.value LIKE ?) AND (REPLACE(lontags.value, ?, ?) BETWEEN ? AND ?) ORDER BY created_at DESC LIMIT ?
This is where things get interesting. We are kind of repeating our query. with some additional params)(ORDER BY created_at DESC LIMIT ?
). And I believe this is caused by
https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L232-L234
So we need to debug our recency sort code?
The consequent queries
SELECT ? AS one FROM `user_tags` WHERE `user_tags`.`uid` = ? AND (value LIKE ?) LIMIT ?
and
SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = ? AND (value LIKE ?)
( the problematic one )
So, I think these queries are automatically performed by rails ORM to verify if returned records have these attributes or not.
Agreed, rails is probably is running it to verify against something we have triggered. I believe it's the ordering code https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L220-L235
What do you think @17sushmita ?
Also can you provide me a full stack trace? Like fuller than you share :P which includes all the queries triggered by this service ( like select distinct ones )
Thank you.
There were 4 types of queries executed by this service -
Basically the looping part is breaking our whole single rails query into multiple and so adding redundancy because after looping the previous query is executed again with all the additional queries after the looping.
Full stack trace of original queries triggered by this service before fixing(I have placed ? in place of actual user ids and in the last type of queries to maintain the pattern of 4 queries on the same user I have placed dummy user ids in place of original ones)- https://gist.github.com/17sushmita/54fbb31b064d9f643fe58ddc41d616c4 (pretty long so sharing as gist)
After the change in #9940, the second query is fixed and we have a single query for first, second(fixed one) and third so now we have only two types of queries-
This is the full stack trace of queries executed by the service after making the change. (I have placed some dummy user ids in place of actual user ids.)
User Load (3763.5ms) SELECT DISTINCT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` =
`rusers`.`id` INNER JOIN user_tags AS lontags ON lontags.uid = rusers.id WHERE (rusers.status <> 0) AND (user_tags.value
LIKE 'lat%') AND (REPLACE(user_tags.value, "lat:", "") BETWEEN 23.40276490540795 AND 55.3791104480105) AND
(lontags.value LIKE 'lon%') AND (REPLACE(lontags.value, "lon:", "") BETWEEN -128.67187500000003 AND 178.76953125)
AND (user_tags.value <> "location:blurred") ORDER BY created_at DESC LIMIT 5
UserTag Load (260.6ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'lat:%')
UserTag Load (50.4ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'lon:%')
UserTag Exists (26.5ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (63.2ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1000 AND (value LIKE 'location:%')
UserTag Load (50.0ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1001 AND (value LIKE 'lat:%')
UserTag Load (55.0ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1001 AND (value LIKE 'lon:%')
UserTag Exists (24.3ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1001 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (52.8ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1001 AND (value LIKE 'location:%')
UserTag Load (45.2ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1002 AND (value LIKE 'lat:%')
UserTag Load (54.7ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1002 AND (value LIKE 'lon:%')
UserTag Exists (22.6ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1002 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (61.2ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1002 AND (value LIKE 'location:%')
UserTag Load (53.7ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1003 AND (value LIKE 'lat:%')
UserTag Load (52.1ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1003 AND (value LIKE 'lon:%')
UserTag Exists (22.8ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1003 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (53.6ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1003 AND (value LIKE 'location:%')
UserTag Load (52.9ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1004 AND (value LIKE 'lat:%')
UserTag Load (57.9ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1004 AND (value LIKE 'lon:%')
UserTag Exists (19.2ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 1004 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (63.5ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 1004 AND (value LIKE 'location:%')
Are you sure about
This is the full stack trace of queries executed by the service after making the change. (I have placed some dummy user ids in place of actual user ids.)
the trace below this being full?
I think there should be a few other relevant queries above it.
Since the trace you are sharing starts with ORDER BY created_at DESC LIMIT 5
and I believe it shouldn't be triggered until the end of the function.
Let me know if I am in the wrong.
Rest, I have understood your point regarding looping.
To confirm my hypothesis, maybe we can comment out https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L218-L235 this part and check the query logs again.
the trace below this being full?
Yes, for the parameters in this URL - http://publiclab.org/api/srch/nearbyPeople?nwlat=55.3791104480105&selat=23.40276490540795&nwlng=-128.67187500000003&selng=178.76953125 we are not passing any order direction so it goes to the default i.e else case - https://github.com/publiclab/plots2/blob/b70cbefa443ace9be1e83a5dde56a10e9603a135/app/services/search_service.rb#L231-L234
and I also tried printing the order_direction, it was DESC by default this is why we have ORDER BY created_at DESC
I tried commenting out the sorting part and the current code looks like this -
def tagNearbyPeople(coordinates, tag, field, period = nil, sort_by = nil, order_direction = nil, limit = 10)
raise("Must contain all four coordinates") if coordinates["nwlat"].nil?
raise("Must contain all four coordinates") if coordinates["nwlng"].nil?
raise("Must contain all four coordinates") if coordinates["selat"].nil?
raise("Must contain all four coordinates") if coordinates["selng"].nil?
raise("Must be a float") unless coordinates["nwlat"].is_a? Float
raise("Must be a float") unless coordinates["nwlng"].is_a? Float
raise("Must be a float") unless coordinates["selat"].is_a? Float
raise("Must be a float") unless coordinates["selng"].is_a? Float
raise("If 'from' is not null, must contain date") if period["from"] && !(period["from"].is_a? Date)
raise("If 'to' is not null, must contain date") if period["to"] && !(period["to"].is_a? Date)
user_locations = User.where('rusers.status <> 0')
.joins(:user_tags)
.where('user_tags.value LIKE ?', 'lat%')
.where('REPLACE(user_tags.value, "lat:", "") BETWEEN ' + coordinates["selat"].to_s + ' AND ' + coordinates["nwlat"].to_s)
.joins('INNER JOIN user_tags AS lontags ON lontags.uid = rusers.id')
.where('lontags.value LIKE ?', 'lon%')
.where('REPLACE(lontags.value, "lon:", "") BETWEEN ' + coordinates["nwlng"].to_s + ' AND ' + coordinates["selng"].to_s)
.distinct
if tag.present?
if field.present? && field == 'node_tag'
tids = Tag.where("term_data.name = ?", tag).collect(&:tid).uniq || []
uids = TagSelection.where('tag_selections.tid IN (?)', tids).collect(&:user_id).uniq || []
else
uids = User.where('rusers.status <> 0')
.joins(:user_tags)
.where('user_tags.value = ?', tag)
.where(id: user_locations.select("rusers.id"))
.limit(limit)
.collect(&:id).uniq || []
end
user_locations = user_locations.where('rusers.id IN (?)', uids).distinct
end
items = user_locations
# selects the items whose node_tags don't have the location:blurred tag
items = items.where('user_tags.value <> "location:blurred"')
# Here we use period["from"] and period["to"] in the query only if they have been specified,
# so we avoid to join revision table
if !period["from"].nil? || !period["to"].nil?
items = items.joins(:revisions).where("node_revisions.status = 1")\
.distinct
items = items.where('node_revisions.timestamp > ' + period["from"].to_time.to_i.to_s) unless period["from"].nil?
items = items.where('node_revisions.timestamp < ' + period["to"].to_time.to_i.to_s) unless period["to"].nil?
# ----------------------Modified Code-----------------------------------#
# # # # added else to return the result and limit to limit the result
else
items.order('created_at DESC').limit(limit)
end
# sort users by their recent activities if the sort_by==recent
# if sort_by == "recent"
# items.joins(:revisions).where("node_revisions.status = 1")\
# .order("node_revisions.timestamp #{order_direction}")
# .distinct
# elsif sort_by == "content"
# ids = items.collect(&:id).uniq || []
# User.select('`rusers`.*, count(`node`.uid) AS ord')
# .joins(:node)
# .where('rusers.id IN (?)', ids)
# .group('`node`.`uid`')
# .order("ord #{order_direction}")
# else
# items.order("created_at #{order_direction}")
# .limit(limit)
# end
# ---------------------- End of Modified Code-----------------------------------#
end
I added the else block because it wasn't returning any data as it didn't go to the if case and we didn't have a default case. Also, without the limit query it resulted all the users from the resulting query (which were too many) so took a really long time (because of the automatic 4 queries on each user). Also, after limiting, resulting users were different from previous ones because we don't have any sorting so the results were in random order
Following is the full query stack trace of the above code-
User Load (7.9ms) SELECT DISTINCT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON
`user_tags`.`uid` = `rusers`.`id` INNER JOIN user_tags AS lontags ON lontags.uid = rusers.id
WHERE (rusers.status <> 0) AND (user_tags.value LIKE 'lat%') AND (REPLACE(user_tags.value,
"lat:", "") BETWEEN 23.40276490540795 AND 55.3791104480105) AND (lontags.value LIKE 'lon%')
AND (REPLACE(lontags.value, "lon:", "") BETWEEN -128.67187500000003 AND 178.76953125) AND
(user_tags.value <> "location:blurred") LIMIT 5
UserTag Load (38.0ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2000 AND (value LIKE 'lat:%')
UserTag Load (45.2ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2000 AND (value LIKE 'lon:%')
UserTag Exists (0.4ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 2000 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (41.1ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2000 AND (value LIKE 'location:%')
UserTag Load (43.9ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2001 AND (value LIKE 'lat:%')
UserTag Load (33.7ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2001 AND (value LIKE 'lon:%')
UserTag Exists (1.1ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 2001 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (47.0ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2001 AND (value LIKE 'location:%')
UserTag Load (38.3ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2002 AND (value LIKE 'lat:%')
UserTag Load (43.2ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2002 AND (value LIKE 'lon:%')
UserTag Exists (0.7ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 2002 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (38.1ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2002 AND (value LIKE 'location:%')
UserTag Load (34.9ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2003 AND (value LIKE 'lat:%')
UserTag Load (37.4ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2003 AND (value LIKE 'lon:%')
UserTag Exists (0.9ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 2003 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (47.5ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2003 AND (value LIKE 'location:%')
UserTag Load (48.0ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2004 AND (value LIKE 'lat:%')
UserTag Load (37.9ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2004 AND (value LIKE 'lon:%')
UserTag Exists (0.8ms) SELECT 1 AS one FROM `user_tags` WHERE `user_tags`.`uid` = 2004 AND (value LIKE 'location:%') LIMIT 1
UserTag Load (43.5ms) SELECT `user_tags`.* FROM `user_tags` WHERE `user_tags`.`uid` = 2004 AND (value LIKE 'location:%')
Ohh. Thank You. That makes sense. I will try to look into it once more.
In our maps we are now loading data from
https://publiclab.org/api/srch/nearbyPeople?nwlat=10.0&selat=0.0&nwlng=0.0&selng=10.0
This code is located here: https://github.com/publiclab/plots2/blob/master/app/api/srch/search.rb#L333Accessing this API takes a lot longer than it should. For example, it takes 19.43s to load:
Meanwhile another API we use in our map layers has far more results in only 83ms!
Some of our other Search API requests perform much better than nearbyPeople. Why is that particular one so slow? In order for our maps to be useful we can't have such a long delay in loading every time we refresh or move the map.