iftechfoundation / ifdb

The software behind the Interactive Fiction Database (IFDB)
Other
23 stars 18 forks source link

Poll votecount includes sandboxed users' votes #405

Closed Keltena closed 1 month ago

Keltena commented 1 month ago

While poll pages hide votes from sandboxed users, they still count hidden votes towards each game's total. (For an example, compare listed vote count/rank vs. votes actually shown for this poll's top three games.)

Not sure why, but it looks like the count function that produces votecount (count(v2.gameid)) is including the sandboxed votes that the WHERE clause excludes. (Those rows are excluded from the final result; the count just seems to be executing before that somehow.)

dfabulich commented 1 month ago

Fixed in https://github.com/iftechfoundation/ifdb/pull/318

dfabulich commented 1 month ago

oops, similar but different

dfabulich commented 1 month ago

I spent time investigating this tonight, out of sheer curiosity.

On Slack, @Keltena asked what the deal is with the self-join to pollvotes v2 in this code… we were all kinda stumped. I am now unstumped.

MariaDB [ifdb]> select gameid, userid, votedate from pollvotes where pollid='1h1zv1nreilkxmw3';
+------------------+------------------+---------------------+
| gameid           | userid           | votedate            |
+------------------+------------------+---------------------+
| aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
| op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 |
| aearuuxv83plclpl | 0000000000000001 | 2024-08-20 00:40:11 |
+------------------+------------------+---------------------+
3 rows in set (0.001 sec)

We start with a poll with three votes. The pollvotes table doesn't have a primary key (whyyyy??) so we'll tell them apart by date.

  1. The first vote is by ifdbadmin (...000), for Counterfeit Monkey (aearuuxv83plclpl), at 00:39:42
  2. The second vote is also by ifdbadmin, for Anchorhead (op0uw1gn1tjqmjt7), at 00:39:48
  3. The third vote is by test@ifdb.org, for Counterfeit Monkey again, at 00:40:11
MariaDB [ifdb]> select v.gameid, v.userid, v.votedate,
v2.gameid as gameid2, v2.userid as userid2, v2.votedate as votedate2
from pollvotes v
join pollvotes v2 on v2.pollid = v.pollid and v2.gameid = v.gameid
where v.pollid = '1h1zv1nreilkxmw3';
+------------------+------------------+---------------------+------------------+------------------+---------------------+
| gameid           | userid           | votedate            | gameid2          | userid2          | votedate2           |
+------------------+------------------+---------------------+------------------+------------------+---------------------+
| aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
| aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 | aearuuxv83plclpl | 0000000000000001 | 2024-08-20 00:40:11 |
| op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 |
| aearuuxv83plclpl | 0000000000000001 | 2024-08-20 00:40:11 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
| aearuuxv83plclpl | 0000000000000001 | 2024-08-20 00:40:11 | aearuuxv83plclpl | 0000000000000001 | 2024-08-20 00:40:11 |
+------------------+------------------+---------------------+------------------+------------------+---------------------+
5 rows in set (0.000 sec)

In the second query, we query for all of the votes in a "self-join" on v2.gameid = v.gameid, without grouping. There are five results.

The first vote, for Counterfeit Monkey, has two votes with the same gameid: the first and the third vote. The second vote, for Anchorhead, only matches with itself. The third vote, has two votes with the same gameid: the first and the third vote.

So, for each gameid, some number N people have voted for that gameid, and we'll expect N^2 rows with that gameid in the result. There are five rows because two people voted for Counterfeit Monkey (2^2 = 4) and one person voted for Anchorhead (1^2 = 1), 4 + 1 = 5.

MariaDB [ifdb]> select count(v2.gameid) as votecount,
v.gameid, v.userid, v.votedate,
v2.gameid as gameid2, v2.userid as userid2, v2.votedate as votedate2
from pollvotes v
join pollvotes v2 on v2.pollid = v.pollid and v2.gameid = v.gameid
where v.pollid = '1h1zv1nreilkxmw3'
group by v.gameid, v.userid;
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
| votecount | gameid           | userid           | votedate            | gameid2          | userid2          | votedate2           |
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
|         2 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
|         1 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 |
|         2 | aearuuxv83plclpl | 0000000000000001 | 2024-08-20 00:40:11 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
3 rows in set (0.002 sec)

In the third query, we group by v.gameid and v.userid, bringing us back to three rows with three unique v.votedate values. We can see why this is if we manually group the five rows by eye.

  1. The first row of the self-join represents the first vote at :42.
  2. The second row of the self-join gets grouped into the first row, because it has the same v.gameid and v.userid.
  3. The third row of the self-join represents the second vote at :48.
  4. The fourth row of the self-join represents the third vote at :11.
  5. The fifth row of the self-join gets grouped into the fourth row, because it has the same v.gameid and v.userid.

But in the grouped query, we can add a count(v2.gameid) as votecount. It's a count of how many votes were grouped for that v.gameid.

But votecount is a confusing idea here. What's being counted, exactly, and why? It's a little easier to see when you add in the order by clause and see how the code iterates through the results.

MariaDB [ifdb]> select count(v2.gameid) as votecount,
v.gameid, v.userid, v.votedate,
v2.gameid as gameid2, v2.userid as userid2, v2.votedate as votedate2
from pollvotes v
join pollvotes v2 on v2.pollid = v.pollid and v2.gameid = v.gameid
where v.pollid = '1h1zv1nreilkxmw3'
group by v.gameid, v.userid
order by votecount desc, v.gameid, v.votedate;
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
| votecount | gameid           | userid           | votedate            | gameid2          | userid2          | votedate2           |
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
|         2 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
|         2 | aearuuxv83plclpl | 0000000000000001 | 2024-08-20 00:40:11 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
|         1 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 |
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
3 rows in set (0.002 sec)

The code starts with $curGameID = '[start]'; and then says:

// if we're on to a new game, start the new game listing
if ($gameID != $curGameID) {
    // ... blah blah blah ...

    // show the basic game listing
    echo "<td><a href=\"viewgame?id=$gameID\">"
        . "<b>$gameTitle</b></a>, by $gameAuthor<br>"
        . "<i>$voteCount vote" . ($voteCount != 1 ? "s" : "")
        . "</i>";

    // ... blah blah blah ...
    // this is the current game now
    $curGameID = $gameID;
}

So, as we walk through the three rows of results:

  1. On the first row, it's a "new" game (Counterfeit Monkey), so we show the game, show the vote count (2), and then show the first vote.
  2. On the second row, it's the same game as before, so we just show the second vote.
  3. On the third row, it's a new game (Anchorhead), so we show the game, show the vote count (1), and then show its vote.

Now, finally, we can see why the sandbox exclusion isn't working quite right.

In this example, I marked test@ifdb.org as sandboxed. The code query includes this line: and u.Sandbox in (0, $sandbox), so we can run this query to match it:

MariaDB [ifdb]> select v.gameid, v.userid, v.votedate,
v2.gameid as gameid2, v2.userid as userid2, v2.votedate as votedate2
from pollvotes v
join pollvotes v2 on v2.pollid = v.pollid and v2.gameid = v.gameid
join users u on u.id = v.userid
where v.pollid = '1h1zv1nreilkxmw3' and u.Sandbox in (0,0);
+------------------+------------------+---------------------+------------------+------------------+---------------------+
| gameid           | userid           | votedate            | gameid2          | userid2          | votedate2           |
+------------------+------------------+---------------------+------------------+------------------+---------------------+
| aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
| aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 | aearuuxv83plclpl | 0000000000000001 | 2024-08-20 00:40:11 |
| op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 |
+------------------+------------------+---------------------+------------------+------------------+---------------------+
3 rows in set (0.003 sec)

Instead of five rows, we have three rows, dropping the last two rows of the previous ungrouped query, where the Test users appeared on the left-hand side. But we're not excluding the row where the Test user appears on the right-hand side.

So, when it comes time to group these three rows:

MariaDB [ifdb]> select count(v2.gameid) as votecount,
v.gameid, v.userid, v.votedate,
v2.gameid as gameid2, v2.userid as userid2, v2.votedate as votedate2
from pollvotes v
join pollvotes v2 on v2.pollid = v.pollid and v2.gameid = v.gameid
join users u on u.id = v.userid
where v.pollid = '1h1zv1nreilkxmw3' and u.Sandbox in (0,0)
group by v.gameid, v.userid;
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
| votecount | gameid           | userid           | votedate            | gameid2          | userid2          | votedate2           |
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
|         2 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
|         1 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 |
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+

We group the second row into the first row, yielding votecount 2. No sandboxed votes appear on the left-hand side, but they got counted in the votecount.

So, what's the fix? We have to exclude sandboxed votes from the v2 query.

MariaDB [ifdb] > select count(v2.gameid) as votecount,
v.gameid, v.userid, v.votedate,
v2.gameid as gameid2, v2.userid as userid2, v2.votedate as votedate2
from pollvotes v
join pollvotes v2 on v2.pollid = v.pollid and v2.gameid = v.gameid
join users u on u.id = v.userid
join users u2 on u2.id = v2.userid
where v.pollid = '1h1zv1nreilkxmw3'
and u.Sandbox in (0,0)
and u2.Sandbox in (0, 0)
group by v.gameid, v.userid;
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
| votecount | gameid           | userid           | votedate            | gameid2          | userid2          | votedate2           |
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
|         1 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 | aearuuxv83plclpl | 0000000000000000 | 2024-08-20 00:39:42 |
|         1 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 | op0uw1gn1tjqmjt7 | 0000000000000000 | 2024-08-20 00:39:48 |
+-----------+------------------+------------------+---------------------+------------------+------------------+---------------------+
2 rows in set (0.001 sec)

That's better!