ingolemo / python-lenses

A python lens library for manipulating deeply nested immutable structures
GNU General Public License v3.0

Debugging code involving lenses is hard #40

Open Gurkenglas opened 1 year ago

Gurkenglas commented 1 year ago

The call stacks end up super deep and the order of operations is strange. Have you considered a refactor where lens.....get() is compiled into a function block where .F(getter) corresponds to string2 = getter(int5), .Recur(Foo) corresponds to for foo in recur(Foo, bar):, etc.? If the only reason against it is months of tedious refactoring, say so; that might be the kind of work that AI tools can handle these days.

ingolemo commented 1 year ago

I agree that the call stacks are monstrous. I'm not really sure how that refactor would work; could you give an example?

Gurkenglas commented 1 year ago

Here's roughly the code of mine that I changed since I posted this issue. The change made debugging much easier, so this should happen automatically behind the scenes. (Yes, this is not a lawful use of optics (it's fine if I only get nice debugging for lawful uses), and the two code blocks are not equivalent and both are wrong.)

import re

import lenses
import openai

# Before: every check is chained onto the lens as its own step.
def shrinkAttr(self, attr, regex):
    def asserter(x):
        assert re.search(regex, x.__dict__[attr])
        del x.__dict__[attr]
        # No return value, so the forward half of this Iso yields None.
    return self & lenses.Iso(asserter, lambda x: x)
lenses.ui.BaseUiLens.shrinkAttr = shrinkAttr

def shrink(x):
    return lenses.bind(x).Recur(openai.OpenAIObject
        ).shrinkAttr("api_base_override", "None"
        ).shrinkAttr("api_key", r"sk-\w{48}"
        ).shrinkAttr("api_type", "None"
        ).shrinkAttr("api_version", "None"
        ).shrinkAttr("openai_id", r"chatcmpl-\w{29}"
        ).shrinkAttr("organization", r"user-\w{24}"
        ).shrinkAttr("typed_api_type", ".*"
        ).shrinkAttr("id", r"chatcmpl-\w{29}"
        ).shrinkAttr("object", "chat_completion"
        ).shrinkAttr("created", r"\d*"
        ).shrinkAttr("model", "gpt-4-0314"
        ).get()

# After: the checks live in one plain function that is applied via modify.
def shrinkOne(x):
    def attr(name, regex):
        value = str(getattr(x, name))
        assert re.search(regex, value)
        delattr(x, name)
    attr("api_base_override", "None")
    attr("api_key", r"sk-\w{48}")
    attr("api_type", "None")
    attr("api_version", "None")
    attr("openai_id", r"chatcmpl-\w{29}")
    attr("organization", r"user-\w{24}")
    attr("typed_api_type", ".*")
    attr("id", r"chatcmpl-\w{29}")
    attr("object", "chat_completion")
    attr("created", r"\d*")
    attr("model", "gpt-4-0314")

def shrink(openaiobject):
    d = openaiobject.to_dict()
    return lenses.bind(d).Recur(dict).modify(shrinkOne)
Gurkenglas commented 1 year ago

The OP example fleshed out: lens.Recur(Foo).GetAttr("id").F((str)).collect() could compile to:

def lensRecurFooGetAttrIdFStrCollect(arg: "Bar") -> list[str]:
    collect: list[str] = []
    for recurFooArg in recur(Foo, arg):  # each recurFooArg is a Foo
        getAttrId: int = recurFooArg.id
        fStr: str = str(getAttrId)
        collect.append(fStr)
    return collect
ingolemo commented 1 year ago

Yes, but the problem is that I don't know how to get from here to there. All the functions in the call stack are there to provide the abstraction necessary to allow all the lenses tools to compose together. Maybe some of those layers can be eliminated with clever tricks, but short of dynamically generating python code and then exec-ing it, I don't think the library can get it that clean.

Or is that what you're proposing here: exec? That would definitely be a big refactor…

Gurkenglas commented 1 year ago

Yes, that's what I'm proposing! The fact that your syntax is lens.Recur(Foo).GetAttr("id").F((str)).collect()(arg) can be read as a hint that the lens.Recur(Foo).GetAttr("id").F((str)).collect() part can be precomputed. And it is, in general, silly that function values don't come with the "string that would produce that function" :D

When I look at the type of a lens like lens.Recur(Foo).GetAttr("id").F((str)), a voice in my head tells me that something already does the work required to produce the readable codeblock, in order to compute a type that is an impoverished version of that same codeblock.
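To make the proposal concrete, here is a minimal sketch of what "generate source, then exec it" could look like for the Recur/GetAttr/F/collect example. Everything in it is an assumption for illustration: `Foo`, `recur`, `generate_collect_source` and the name `generated` are hypothetical and are not part of python-lenses, and `recur` is only a rough stand-in for what Recur does.

```python
class Foo:
    """Hypothetical class whose instances we want to find."""
    def __init__(self, id):
        self.id = id


def recur(cls, obj):
    """Hypothetical stand-in for Recur: yield each instance of cls inside obj."""
    if isinstance(obj, cls):
        yield obj
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from recur(cls, item)
    elif isinstance(obj, dict):
        for item in obj.values():
            yield from recur(cls, item)


def generate_collect_source(cls_name, attr, func_name):
    """Build source text for Recur(cls).GetAttr(attr).F(func).collect()."""
    return (
        "def generated(arg):\n"
        "    collect = []\n"
        f"    for item in recur({cls_name}, arg):\n"
        f"        value = item.{attr}\n"
        f"        result = {func_name}(value)\n"
        "        collect.append(result)\n"
        "    return collect\n"
    )


source = generate_collect_source("Foo", "id", "str")
# The generated code looks up recur and Foo in the namespace it is exec'd in.
namespace = {"recur": recur, "Foo": Foo}
exec(source, namespace)
result = namespace["generated"]([Foo(1), {"a": Foo(2)}, "ignored"])
```

The generated function is a flat loop, so a failure inside it would produce a traceback with a single frame for `generated` instead of one frame per lens layer.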

ingolemo commented 1 year ago

I thought lenses were confusing enough without trying to jam dynamic recompilation into the middle of everything.

I will admit that I find such a challenge tempting. I'll see if I can find time to do some experiments. No promises.

ingolemo commented 1 year ago

Fallen at the first hurdle. Python doesn't show the source code when printing tracebacks from dynamically generated code because it looks up the filename and reads the file. For example, if you run exec('1/0') then you'll notice that 1/0 doesn't appear anywhere in the output. I can't find any hooks to make this work. A large traceback that references real code is better than a short traceback with no information.

I'm open to any other ways to make the tracebacks easier to read if anyone has ideas.

Gurkenglas commented 1 year ago
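# Write the generated source to a real file so tracebacks can show its lines.
code = """
def f(x):
    y = x+2
    print(5)
    1/0
    return y*x
"""
with open("/tmp/codef.py", "w") as source_file:
    source_file.write(code)
# Compile against that filename so the traceback points at the file's lines.
co = compile(code, "/tmp/codef.py", "exec")
exec(co)
f(2)  # raises ZeroDivisionError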

produces this in VSCode: [Screenshot from 2023-04-16 18-38-53]

ingolemo commented 1 year ago

I don't really want to write to a temporary file on every invocation of a lens.
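One possible way to avoid the temporary file, sketched here under assumptions: the standard library's `traceback` module reads source lines through `linecache`, so registering the generated source in `linecache.cache` under a synthetic filename should let tracebacks display it without touching the disk. The helper name `exec_with_source` and the `<generated-lens-N>` filename pattern are made up for this example, and writing to `linecache.cache` directly relies on its entry layout rather than on a documented API.

```python
import itertools
import linecache
import traceback

_counter = itertools.count()


def exec_with_source(source, namespace):
    """Exec `source` so that tracebacks can still display its lines."""
    # Synthetic filename; nothing is written to disk.
    filename = f"<generated-lens-{next(_counter)}>"
    code = compile(source, filename, "exec")
    # Cache entries are (size, mtime, lines, fullname). An mtime of None
    # makes linecache.checkcache() leave the entry alone.
    linecache.cache[filename] = (
        len(source),
        None,
        source.splitlines(keepends=True),
        filename,
    )
    exec(code, namespace)
    return namespace


source = """
def f(x):
    y = x + 2
    1/0
    return y * x
"""

namespace = exec_with_source(source, {})
try:
    namespace["f"](2)
except ZeroDivisionError:
    # The formatted traceback includes the generated line "1/0".
    report = traceback.format_exc()
```

Whether an editor or debugger such as VSCode can also step through code registered this way is a separate question that this sketch does not answer.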

Gurkenglas commented 1 year ago

See it as just-in-time compilation! Have you ever written code where you couldn't cache the compiled version per calling code location, with length_of_tmp_code in O(length_of_calling_code)?
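A minimal sketch of the caching idea, assuming the generated source text itself is used as the cache key (a simplification of caching per calling code location). `compile_cached` and the `<generated-lens>` filename are hypothetical names, not part of python-lenses.

```python
import functools


@functools.lru_cache(maxsize=None)
def compile_cached(source, func_name):
    """Compile `source` once and return the function named `func_name`."""
    namespace = {}
    exec(compile(source, "<generated-lens>", "exec"), namespace)
    return namespace[func_name]


source = "def get_id_str(arg):\n    return str(arg['id'])\n"

# The second call is served from the cache, so compile/exec runs only once.
first = compile_cached(source, "get_id_str")
second = compile_cached(source, "get_id_str")
```

With a cache like this, the compile cost is paid once per distinct lens expression rather than on every invocation.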