hubzero / hubzero-cms

Platform for Scientific Collaboration
https://hubzero.org
GNU General Public License v2.0

[NCN-806] Forbid guest user access to tag search in resources browse #1705

Open jsperhac opened 5 months ago

jsperhac commented 5 months ago

Summary

The changes found here forbid guest users from tag searching in the resources browse page of a Hub.

Specifics:

The screenshot below shows the "Popular Tags" section as it appears to logged-in users. It also shows the tag search results displayed when a logged-in user has clicked on tags or added them to a URL. Guest users will be unable to see the "Popular Tags" section or to use tags for searching.

[Screenshot: resources-browse-tag-search-ui]

Motivation

This PR addresses Jira card NCN-806.

Tag search in Nanohub's resources browse page was being abused by bots, even though there is a search limit of 5 tags enforced in the resources controller and browse UI.

Symptoms

On smaller hubs this would be less of an issue, but determining the top tags is implemented with an equijoin of large tables. What were essentially DDoS attacks caused the equijoin to execute repeatedly, which slowed the database to a crawl.

Watching nanohub-access.log and mysql-slow.log during the bot attack confirmed that the database was bogged down by the equijoin while searches, each limited to 5 tags, were issued at a rapid-fire rate.

Database query as found in mysql-slow.log:

# User@Host: nanohub[nanohub] @ localhost []
# Thread_id: 26713691  Schema: nanohub  QC_hit: No
# Query_time: 5.766762  Lock_time: 0.000041  Rows_sent: 20  Rows_examined: 132208
SET timestamp=1710445911;
SELECT jos_tags_object.label,jos_tags.*
FROM `jos_tags`
INNER JOIN jos_tags_object ON jos_tags_object.tagid = jos_tags.id
WHERE `jos_tags_object`.`tbl` = 'resources'
AND `jos_tags`.`admin` NOT IN (1)
GROUP BY jos_tags_object.label,jos_tags.id
ORDER BY `objects` DESC
LIMIT 20;
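
The statistics line in the entry above already tells most of the story: a large number of rows examined to return only 20. As a rough illustration (Python is used here purely for the sketch; `parse_slow_log_stats` is a hypothetical helper, not part of the CMS), the numbers can be pulled out of such a line like this:

```python
import re

# Matches the statistics comment line written above a slow-log entry,
# in the format shown in the log excerpt above.
STATS_RE = re.compile(
    r"#\s*Query_time:\s*(?P<query_time>[\d.]+)\s+"
    r"Lock_time:\s*(?P<lock_time>[\d.]+)\s+"
    r"Rows_sent:\s*(?P<rows_sent>\d+)\s+"
    r"Rows_examined:\s*(?P<rows_examined>\d+)"
)


def parse_slow_log_stats(line):
    """Return timing and row counts from one '# Query_time: ...' line, or None."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {
        "query_time": float(match["query_time"]),
        "lock_time": float(match["lock_time"]),
        "rows_sent": int(match["rows_sent"]),
        "rows_examined": int(match["rows_examined"]),
    }


sample = "# Query_time: 5.766762  Lock_time: 0.000041  Rows_sent: 20  Rows_examined: 132208"
stats = parse_slow_log_stats(sample)

# Rows examined per row returned: a quick indicator of how much work the join does.
rows_per_result = stats["rows_examined"] / stats["rows_sent"]
```

For the entry above this works out to roughly 6,600 rows examined per tag returned, during a query that took about 5.8 seconds under load.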

Informal benchmarking of the query showed it took about 750ms on Nanohub, down to about 550ms if the GROUP BY is removed and UNIQUE(jos_tags_object.label) used instead. However, the query originates in an ORM model and changing such code can be complicated and have unintended effects.

Another approach would be to change how top tags are determined, perhaps by using a list cached daily.
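
To make the caching idea concrete, here is a minimal sketch, in Python for illustration only. It is not code from this PR: `TopTagsCache` and the `fetch_top_tags` callable are hypothetical names, and the 24-hour interval is an assumption. The point is only that the expensive join would run once per interval rather than once per request.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # assumed refresh interval: once per day


class TopTagsCache:
    """Serve a cached top-tags list; run the expensive query only when stale."""

    def __init__(self, fetch_top_tags, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._fetch = fetch_top_tags  # hypothetical callable that runs the join query
        self._ttl = ttl
        self._clock = clock           # injectable so the expiry logic can be tested
        self._tags = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            # Only this branch touches the database.
            self._tags = self._fetch()
            self._fetched_at = now
        return self._tags
```

An in-process cache like this would not be shared across web workers; a real implementation would more likely store the list in a table or a shared cache, which is a design decision outside the scope of this PR.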

Testing

This problem was investigated, and the fix developed and tested, on an AWS instance. Tag searches were run on the resources/browse page for logged-in and guest users. Tag searches were also run by specifying an appropriate URL, such as:

https://jsperhac.aws.hubzero.org/resources/browse?tag=c,d,e,f,g,h,i

Note that the changes made to the resources controller prevent the URL-only search from being executed by a guest user.

The code that limits tag searches to 5 tags was reviewed carefully and tested both by URL and UI. It does indeed work.
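
The two checks exercised by these tests can be summarized in a short sketch. This is Python for illustration, not the PHP in the resources controller; `parse_tag_search` is a hypothetical name, and the sketch assumes that tags beyond the limit are simply dropped, which may differ from how the real controller handles them.

```python
MAX_TAGS = 5  # the tag limit described in this PR


def parse_tag_search(raw_tags, is_guest):
    """Return the tags to search on, or an empty list when no tag search should run."""
    if is_guest:
        # Guests never trigger a tag search, whether via the UI or the URL.
        return []

    tags = []
    for tag in raw_tags.split(","):
        tag = tag.strip()
        if tag and tag not in tags:  # skip blanks and duplicates
            tags.append(tag)

    # Assumption: extra tags are truncated rather than rejected.
    return tags[:MAX_TAGS]
```

With the test URL above, `parse_tag_search("c,d,e,f,g,h,i", is_guest=False)` returns `["c", "d", "e", "f", "g"]`, while the same call with `is_guest=True` returns `[]`.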

Benchmarking the offending query is described in "Symptoms" above.

Special Nanohub notes

The fix shown for default.php must be hotfixed in the com_resources default page found in the custom NaN template.

A request: While we are at it, can we please add that NaN template to Gitlab? It's not presently there. An outdated template called 'tpl_NaN' is not the same thing! :slightly_frowning_face:

Next steps

Note that a very similar tag search is found in the browse page for com_publications. Users of that page are not constrained to 5 tags. It would be wise to:

It should also be noted that these resource browse interfaces search entirely using SQL against the Hub database. They are complex and won't be easily upgraded by dropping in a few-line change to use the Solr backend. Something to think about.

PR Checklist