progressivetech / net.ourpowerbase.sumfields

The Summary Fields extension creates read-only custom data fields that extend Contact and are automatically populated with up-to-date totals such as total lifetime contributions, last contribution amount, last attended event, etc.
Other
8 stars 29 forks source link

DB Error: division by zero #67

Closed martinh-pw closed 4 years ago

martinh-pw commented 4 years ago

Getting a db error that causes the operation to hang for deleting a participant record. In this case, the record was already set to status "cancelled" and I believe the summary fields were already showing 0 for several fields. Sorry, I can't remember which ones and don't know if I can see that easily again... This was a newly created contact and had no previous participations or contributions.

My assumption is that this was caused by the civicrm_participant_after_delete trigger in our mysql database. The extension currently was at version 4.0.2, however I don't know what version it would have been when originally installed.

In the current trigger, I see 4 division operations, and each one is using IFNULL in the denominator. I'm not an expert in dbs but this seems not ideal to me, since each of these will fail over to 0 if null, and cause this db error. But maybe I'm misunderstanding this, or there's a reason for that design?

I wonder if this is a typo and these functions in are meant to have been NULLIF rather than IFNULL??

jmcclelland commented 4 years ago

I imagine I added the NULLIF statements to avoid having NULL as the denominator. I suspect I was running a version of MySQL that threw an error with NULL in the denomintor but did not return an error if 0 was in the denominator.

At the moment I'm running MariaDB 10.3.17 which shows a non-fatal warning with 0 and has no complaint with NULL, so maybe NULL is a better choice.

MariaDB [ptp]> SELECT 1 / 0;
+-------+
| 1 / 0 |
+-------+
|  NULL |
+-------+
1 row in set, 1 warning (0.001 sec)

MariaDB [ptp]> SELECT 1 / NULL;
+----------+
| 1 / NULL |
+----------+
|     NULL |
+----------+
1 row in set (0.001 sec)

MariaDB [ptp]>

Does your database server throw an error with either of those two statements?

Do you have a copy of the exact DB error that was thrown (perhaps in ConfigAndLog)?

martinh-pw commented 4 years ago

This is the error in the civi log, unfortunately it doesn't really say much:

Nov 05 10:12:32  [info] $Fatal Error Details = Array
(
    [callback] => Array
        (
            [0] => CRM_Core_Error
            [1] => handle
        )
    [code] => -13
    [message] => DB Error: division by zero
    [mode] => 16
    [debug_info] => DELETE FROM civicrm_participant  WHERE (  civicrm_participant.id = 22913 )  [nativecode=1365 ** Division by 0]
    [type] => DB_Error
    [user_info] => DELETE FROM civicrm_participant  WHERE (  civicrm_participant.id = 22913 )  [nativecode=1365 ** Division by 0]
    [to_string] => [db_error: message="DB Error: division by zero" code=-13 mode=callback callback=CRM_Core_Error::handle prefix="" info="DELETE FROM civicrm_participant  WHERE (  civicrm_participant.id = 22913 )  [nativecode=1365 ** Division by 0]"]
)

Running those 2 queries I got a result of null for both, but the 1/0 query also threw this warning, so same as you:

+------+
| 1/0  |
+------+
| NULL |
+------+
1 row in set, 1 warning (0.00 sec)

Warning (Code 1365): Division by 0

It's not entirely clear to me if this extensions was causing the civi error, but this was the only trigger that I saw in the database for this scenario, so I'm not sure what else could be causing it I guess?

jmcclelland commented 4 years ago

Ah - I just replicated it. I see - I think it's possible I've never deleted a participant before. The error message is confusing because MySQL shows you the SQL statement executed, but not the triggers that happened as a result of the SQL statement.

Despite the lack of clear evidence, I think your theory is correct.

Unfortunately, the fix is really complicated. I think we need to addional logic to an already overloaded SQL statement. Right now it's:

(SELECT FORMAT(IFNULL(' . $event_noshow_trigger_sql . ', 0)' . ' / ' .  'IFNULL(' . $event_total_trigger_sql . ', 0), 2) * 100 AS summary_value)'

To this complicated array of INULLS, we need to add a new one. If $event_total_trigger_sql is NULL, then we should not try to divide anything and just return .00.

As an immediate work-around I'd suggest unchecking both the "Events attended as percent" and "Events no-show as percent" fields and at least it should stop throwing the errors.

martinh-pw commented 4 years ago

Thanks for the extra investigation. FYI in this particular case we found that this extension wasn't needed at the current time so we just disabled it. But good to know that workaround. :)

It might not be that bad to change, what about this? Haven't tested but might work. Might still require to add an IFNULL again though (around the whole CASE maybe).

'(SELECT FORMAT(
  CASE WHEN '. $event_total_trigger_sql . ' != 0 THEN
    IFNULL(' . $event_noshow_trigger_sql . ', 0)' . ' / ' . $event_total_trigger_sql
  ELSE 0, 
2) * 100 AS summary_value)'
jmcclelland commented 4 years ago

I found slightly simpler fix - I ensure the total event count returns NULL instead of 0, then coerce NULL into 1 rather than zero. This way we always get 1 as the denomonitor. Since the numerator will always be zero if there are no events, it really doesn't matter what the denominator is - we'll always get 0 percent.

jmcclelland commented 4 years ago

For the record, anyone running mariadb 10.2 will likely see the default behavior when dividing by zero change. I'm not sure why I couldn't repliace in 10.3 though. Maybe debian changed it back?? In any event, this should be fixed now.