ramonhagenaars / nptyping

💡 Type hints for Numpy and Pandas

Express constraints for the first few channels and then specify ellipses? #83

Closed Erotemic closed 2 years ago

Erotemic commented 2 years ago

Unless I'm missing something, there seems to be no way to express that an array has at least 2 dimensions with some size constraint but may also contain any number of trailing dimensions. For instance, I would have expected to be able to describe an array with at least 2 dimensions like this:

Shape['*,*,...'], but that raises an InvalidShapeError.
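
To make the intent concrete, here is a minimal sketch in plain Python of the check being asked for: the first few dimensions must satisfy their constraints, and any number of trailing dimensions is allowed. The helper name has_leading_dims and the use of None as a wildcard are assumptions made for illustration; this is not nptyping's API or implementation.

```python
from typing import Optional, Sequence


def has_leading_dims(shape: Sequence[int], leading: Sequence[Optional[int]]) -> bool:
    """Return True if `shape` starts with dimensions matching `leading`.

    Hypothetical helper, not part of nptyping. Each entry in `leading` is
    either an exact size or None (a wildcard, like "*"). Any number of
    trailing dimensions after the constrained ones is accepted.
    """
    if len(shape) < len(leading):
        return False  # too few dimensions to satisfy the constraints
    return all(
        want is None or want == got
        for want, got in zip(leading, shape)
    )


# Intended meaning of the requested '*,*,...': at least two dimensions.
print(has_leading_dims((4, 5), (None, None)))        # True
print(has_leading_dims((4, 5, 6, 7), (None, None)))  # True
print(has_leading_dims((4,), (None, None)))          # False
```

Fixed sizes work the same way: has_leading_dims(arr.shape, (3, None)) would require the first dimension to be 3, accept any second dimension, and ignore whatever follows.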

Looking at the grammar, this makes sense: the rule shape_expression : dimensions | dimension "," ellipsis indicates that an ellipsis is only valid after a single dimension.

If we modify the grammar to use shape_expression : dimensions | dimensions "," ellipsis, it should be possible to express what I'm looking for. However, in my prototyping I couldn't parse the expression with an LALR parser; I had to use Earley instead.
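
The modified rule can also be sketched without a parser generator. The hand-written parser below is an illustration only, not nptyping's or Lark's implementation. It accepts one or more comma-separated dimensions, optionally followed by a trailing ellipsis. It deliberately covers only numbers, variables, and wildcards (no labels or dimension breakdowns), its token pattern is a simplified assumption that is looser than the grammar's variable rule, and the function name parse_shape is hypothetical.

```python
import re
from typing import List, Tuple

# Simplified assumption for one dimension: a number, a name starting with
# an uppercase letter, or the wildcard "*". Labels are not supported here.
_DIMENSION = re.compile(r"[0-9]+|[A-Z][A-Za-z0-9_]*|\*")


def parse_shape(expression: str) -> Tuple[List[str], bool]:
    """Parse 'dimensions' or 'dimensions "," ellipsis'.

    Returns (dimensions, has_ellipsis). Raises ValueError on invalid input.
    """
    # Surrounding whitespace is stripped for convenience (an assumption).
    parts = [part.strip() for part in expression.split(",")]
    has_ellipsis = len(parts) > 1 and parts[-1] == "..."
    if has_ellipsis:
        parts = parts[:-1]  # the ellipsis is only allowed in last position
    for part in parts:
        if not _DIMENSION.fullmatch(part):
            raise ValueError(f"invalid dimension {part!r} in {expression!r}")
    return parts, has_ellipsis


print(parse_shape("N,3"))          # (['N', '3'], False)
print(parse_shape("*,..."))        # (['*'], True)
print(parse_shape("1,3,4,5,..."))  # (['1', '3', '4', '5'], True)
print(parse_shape("*,*,..."))      # (['*', '*'], True)
```

Splitting on commas sidesteps the lookahead question entirely. With a right-recursive dimensions rule, an LALR(1) parser that has just read a dimension and sees a comma cannot tell whether another dimension or the ellipsis follows, which is a plausible reason the LALR attempt failed.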

Here is the prototype I made with Lark:

import lark
SHAPE_GRAMMAR = (
    '''
    // https://github.com/ramonhagenaars/nptyping/blob/master/USERDOCS.md#Shape-expressions
    ?start: shape_expression

    shape_expression     :  dimensions | dimensions "," ellipsis
    dimensions           :  dimension | dimension "," dimensions
    dimension            :  unlabeled_dimension | labeled_dimension
    labeled_dimension    :  unlabeled_dimension " " label
    unlabeled_dimension  :  number | variable | wildcard | dimension_breakdown
    wildcard             :  "*"
    dimension_breakdown  :  "[" labels "]"
    labels               :  label | label "," labels
    label                :  lletter | lletter word
    variable             :  uletter | uletter word
    word                 :  letter | word underscore | word number
    letter               :  lletter | uletter
    uletter              :  "A"|"B"|"C"|"D"|"E"|"F"|"G"|"H"|"I"|"J"|"K"|"L"|"M"|"N"|"O"|"P"|"Q"|"R"|"S"|"T"|"U"|"V"|"W"|"X"|"Y"|"Z"
    lletter              :  "a"|"b"|"c"|"d"|"e"|"f"|"g"|"h"|"i"|"j"|"k"|"l"|"m"|"n"|"o"|"p"|"q"|"r"|"s"|"t"|"u"|"v"|"w"|"x"|"y"|"z"
    number               :  digit | number digit
    digit                :  "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
    underscore           :  "_"
    ellipsis             :  "..."
    ''')
# The LALR parser could not handle these expressions with this grammar, so Earley is used instead:
# shape_parser = lark.Lark(SHAPE_GRAMMAR, start='start', parser='lalr')
shape_parser = lark.Lark(SHAPE_GRAMMAR, start='start', parser='earley')
print(shape_parser.parse('3').pretty())
print(shape_parser.parse('N,M').pretty())
print(shape_parser.parse('N,3').pretty())
print(shape_parser.parse('*,*').pretty())
print(shape_parser.parse('2,...').pretty())
print(shape_parser.parse('*,...').pretty())
print(shape_parser.parse('1,3,4,5,...').pretty())
print(shape_parser.parse('*,*,...').pretty())

ramonhagenaars commented 2 years ago

Hi @Erotemic. You are correct that this is not accepted by nptyping 2.1.3. I am in the process of adding it for the next minor release: https://github.com/ramonhagenaars/nptyping/pull/85. This includes an update to the BNF and making it "actually work" with instance checking.