MathArrays behave badly with numpy's numeric types

ChristopherChudzicki commented 6 years ago

Behold, my nightmare:

from mitxgraders.helpers.calc import evaluator
variables = {'v': MathArray([3, 4])}
evaluator('v*v + v') # scalar + vector,
                     # but does not throw error

Explanation:

import numpy as np
from mitxgraders.helpers.calc import MathArray

x = MathArray([1, 2, 3])

1.0 + x                 # throws an error, good!
np.float64(1.0) + x     # returns MathArray([2.0, 3.0, 4.0])

Here's what's happening:

1.0 + x:
- tries to call (1.0).__add__(self, x) but fails, because builtin float type does not know how to handle MathArray
- next tries x.__radd__(self, 1.0), which knows that it should through a StudentFacingError.
np.float64(1.0) + x:
- tries to call np.float64(1.0).__add__(self, x) and succeeds: Numpy's float64 numeric type knows how to add ndarrays to itself.
- I.e., this version never calls any of MathArray's methods

I'm guessing this is true for all numeric operators and true for all numpy numeric types. Bummer.

Note: np.float64.__add__ is implemented in C. So even if we wanted to monkey-patch it, I don't think we could.

Goals:

Resolve this issue (insofar as it could ever affect students)
continue to keep most of the type/shape checking outside of calc.py

ChristopherChudzicki commented 6 years ago

Suggested Fix: Every place that FormulaParser does a calculation (i.e., eval_sum, eval_prod, eval_function, etc) convert the result to a Python numeric type.

So for example, eval_sum might look like this:

def standardize_if_numeric(obj):
    """If obj is numpy numeric type, convert to python builtin, otherwise leave alone"""
    pass

def eval_sum(self, parse_result):
      """Add/subtract inputs"""
      data =  parse_result[1:] if parse_results[0] == '+' else parse_results[:]
      while data:
          op = data.pop(0)
          num = standardize_if_numeric(data.pop(0))
          if op == '+':
              result += num
          elif op == '-':
              result -= num
          else:
              raise CalcError("Unexpected symbol {} in eval_sum".format(op))
      return result

@jolyon If you have time, let me know what you think. If the above fix sounds OK to you, I'd like to resolve it before I go to Ireland.

Alternatively, we could just wait 2 years for edX to upgrade Numpy: np.float64.__add__ is a numpy ufunc, and Numpy v1.13 added a way for custom classes to override ufunc behavior (magic __array_ufunc__ method).

jolyonb commented 6 years ago

I think I've a better idea. Put this in handle_node -- then we only need to do it once. Note that because of the vector triple product issue with parentheses, any shape-converting operations must be handed back to handle_node. For good measure, we might want to also put it in the routine that takes row x column matrices and turns that into a scalar too?

ChristopherChudzicki commented 6 years ago

Put this in handle_node

Yes, better idea.

For good measure, we might want to also put it in the routine that takes row x column matrices and turns that into a scalar too?

This already happens at https://github.com/mitodl/mitx-grading-library/blob/master/mitxgraders/helpers/calc/math_array.py#L204 .

mitodl / mitx-grading-library

MathArrays behave badly with numpy's numeric types #124

Explanation:

Goals: