Open bashtage opened 5 hours ago
On main I'm still seeing failures in the code below. Having this work correctly on terms like C(var)[T.value] is practically an important case since C() is common when [T.value] is present. The tests only look at cases like A[T.value].
C(var)[T.value]
C()
[T.value]
A[T.value]
import formulaic constraints = "C(agecat)[T.4] = C(agecat)[T.5]" variable_names = [ "Intercept", "C(agecat)[T.2]", "C(agecat)[T.3]", "C(agecat)[T.4]", "C(agecat)[T.5]", "logpyears", "smokes", ] lc = formulaic.utils.constraints.LinearConstraints.from_spec( constraints, variable_names=variable_names )
Exception is
--------------------------------------------------------------------------- FormulaSyntaxError Traceback (most recent call last) Cell In[2], line 14 3 constraints = "C(agecat)[T.4] = C(agecat)[T.5]" 4 variable_names = [ 5 "Intercept", 6 "C(agecat)[T.2]", (...) 11 "smokes", 12 ] ---> 14 lc = formulaic.utils.constraints.LinearConstraints.from_spec( 15 constraints, variable_names=variable_names 16 ) File C:\git\formulaic\formulaic\utils\constraints.py:102, in LinearConstraints.from_spec(cls, spec, variable_names) 98 spec = ",".join(spec) 99 if isinstance(spec, str): 100 matrix, values = LinearConstraintParser( 101 variable_names=variable_names --> 102 ).get_matrix(spec) 103 return cls(matrix, values, variable_names) 104 matrices, constants = [], [] File C:\git\formulaic\formulaic\utils\constraints.py:277, in LinearConstraintParser.get_matrix(self, formula) 263 def get_matrix( 264 self, formula: str 265 ) -> Tuple["numpy.typing.ArrayLike", "numpy.typing.ArrayLike"]: 266 """ 267 Build the constraint matrix and constraint values vector associated with 268 the parsed string. (...) 275 A tuple of the contraint matrix and constraint values respectively. 276 """ --> 277 constraints = self.get_terms(formula) 278 if not constraints: 279 return numpy.empty((0, len(self.variable_names))), numpy.array([]) File C:\git\formulaic\formulaic\utils\constraints.py:255, in LinearConstraintParser.get_terms(self, formula) 246 def get_terms( 247 self, formula: str 248 ) -> Union[None, List[ScaledFactor], Tuple[List[ScaledFactor], ...]]: 249 """ 250 Build the `ScaledFactor` instances for a constraint formula string. 251 252 Args: 253 formula: The constraint formula for which to build terms. 254 """ --> 255 ast = self.get_ast(formula) 256 if not ast: 257 return None File C:\git\formulaic\formulaic\utils\constraints.py:240, in LinearConstraintParser.get_ast(self, formula) 230 def get_ast(self, formula: str) -> Optional[ASTNode]: 231 """ 232 Assemble an abstract syntax tree for the nominated `formula` string. 233 (...) 236 generated. 237 """ 238 return cast( 239 Optional[ASTNode], --> 240 tokens_to_ast( 241 self.get_tokens(formula), 242 operator_resolver=self.operator_resolver, 243 ), 244 ) File C:\git\formulaic\formulaic\parser\algos\tokens_to_ast.py:137, in tokens_to_ast(tokens, operator_resolver) 135 if output_queue: 136 if len(output_queue) > 1: --> 137 raise exc_for_missing_operator(output_queue[0], output_queue[1]) 138 return output_queue[0] 140 return None FormulaSyntaxError: Missing operator between `C(agecat)` and `T.4`. ⧛C(agecat)[T.4⧚] = C(agecat)[T.5]
Sorry to report that the previous patch didn't quite get to the full set of issues.
On main I'm still seeing failures in the code below. Having this work correctly on terms like
C(var)[T.value]
is practically an important case sinceC()
is common when[T.value]
is present. The tests only look at cases likeA[T.value]
.Exception is