photo / frontend

The official @github repository of the Trovebox frontend software. A photo sharing and photo management web interface for data stored "in the cloud" (i.e. Amazon S3, Rackspace CloudFiles, Google Storage).
https://trovebox.com
Apache License 2.0
1.37k stars 244 forks source link

MySql error when querying albums #1473

Open jmathai opened 10 years ago

jmathai commented 10 years ago

Reported by @ashwingj on the openphoto mailing list.

Branch used: 4.0.2-rc1 Latest Commit: latest commit b00cf3637b

Problem:

On completing a fresh install of openphoto using the version mentioned above, the browser is directed to /photos/upload?m=welcome and receives an error page (HTTP 500). This is what the error log said

[Wed Apr 30 09:44:48.200364 2014] [:error] [pid 1443] [client 192.168.1.22:49834] {severity:warn, description:"Uncaught exception (/home/zz/openphotofrontend/src/libraries/external/epi/EpiException.php:13): Query error: SQLSTATE[42000]: Syntax error or access violation: 1140 Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause - SELECT COUNT(*) FROM album WHERE owner=:owner ORDER BY name", additional:}, referer: http://photos/setup/3

[Wed Apr 30 09:44:55.274949 2014] [:error] [pid 1433] [client 192.168.1.22:49837] {severity:warn, description:"Uncaught exception (/home/zz/openphotofrontend/src/libraries/external/epi/EpiException.php:13): Query error: SQLSTATE[42000]: Syntax error or access violation: 1140 Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause - SELECT COUNT(*) FROM album WHERE owner=:owner ORDER BY name", additional:}, referer: http://photos/photos/upload?m=welcome

[Wed Apr 30 09:44:55.337941 2014] [:error] [pid 1435] [client 192.168.1.22:49838] {severity:warn, description:"Uncaught exception (/home/zz/openphotofrontend/src/libraries/external/epi/EpiException.php:13): Query error: SQLSTATE[42000]: Syntax error or access violation: 1140 Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause - SELECT COUNT(*) FROM album WHERE owner=:owner ORDER BY name", additional:}, referer: http://photos/photos/upload?m=welcome

On investigating the error (a SQL query possibly missing a GROUP BY statement)

$albumsCount = $this->db->one("SELECT COUNT(*) FROM {$this->mySqlTablePrefix}album WHERE owner=:owner ORDER BY name", array(':owner' => $this->owner));

Was the intention to get the album count for the 'owner'? If so is the order by name relevant in the query?

jmathai commented 10 years ago

@ashwingj What version of MySql are you running? This query works for me.

vagrant@lucid32:~$ aptitude show mysql-server | grep Version
Version: 5.5.35-0ubuntu1

mysql> SELECT COUNT(*) FROM `album` WHERE `owner`='foo@example.com' ORDER BY name;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
ashwingj commented 10 years ago

I am running mysql on my drobo NAS device.

mysql> select version(); +------------+ | version() | +------------+ | 5.6.13-log | +------------+ 1 row in set (0.00 sec)

Running the same SQL query via commandline ....

mysql> select count(*) from album where owner='foo@example.com' ORDER BY NAME; ERROR 1140 (42000): Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause

On reading up more about this error, I found out that a sql_mode called 'ONLY_FULL_GROUP_BY' may be the reason why some servers produce this error and why some don't.

http://dev.mysql.com/doc/refman/5.1/en/sql-mode.html#sqlmode_only_full_group_by

So i queried my servers settings

mysql> select @@sql_mode; +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ | @@sql_mode | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ONLY_FULL_GROUP_BY,NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)

Normally this should only apply when using an aggregation function without a group by clause. However not sure why this applies in our case as count(*), i believe, is treated as an exception and if i take out the order by name i don't get the exception.

jmathai commented 10 years ago

@ashwingj Very interesting. @patricksan and I have been testing Trovebox out on NAS devices (with some additional special sauce, TBD). We haven't tried Drobo yet.

The ORDER BY isn't needed for the COUNT(*) query. The reason we have it included is to replicate the query for listing (immediately before) and counting.

While it doesn't need to be in there I'm apprehensive of veering away from replicating the query (minus (COUNT()). Removing the ORDER BY is an intermediate solution that doesn't have any side effects but let's brainstorm other approaches.

ashwingj commented 10 years ago

DISCLAIMER: I am not very familiar with mysql so I apologize if I am asking trivial questions.

What are the benefits of replicating the query? Are you trying to ensure that the database is trying to reuse the query plan and/or tmp tables for quicker execution? I assume that this is a query that is hit quite often on trovebox.com and that too by multiple users. If that is the case I understand your apprehension.

But if that is indeed one of the reasons, i believe indexing that table on 'owner' or for extreme scaling sharding on owner may alleviate some of those concerns. (no idea, yet, how either is achieved on mysql)

select * from album where owner=x order by name limit y,z
--the Order By will be the performance challenge even after indexing

select count(*) from album where owner=x
--Once the column is indexed the query plan would merely
--hit the index and not the table

If there are other considerations for replicating the query, i would love to learn ...

jmathai commented 10 years ago

@ashwingj Valid question. The tables should already be optimized for the query. The owner column is indexed on every table as it's always in the WHERE clause.

Originally we were using FOUND_ROWS() / SQL_CALC_FOUND_ROWS [1] which is the proper way to do this. We changed to a COUNT(*) believing it would be more performant [2].

We should revisit reverting to FOUND_ROWS().

The reason is simply to decrease the chances of someone introducing a regression because they make changes to one of the two queries. Unsure making them the same helps but that was/is the theory.

  1. https://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_found-rows
  2. http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/