leanprover-community / lean-client-python

Python talking to the Lean theorem prover
Apache License 2.0
39 stars 6 forks source link

Make the trio example work even if cwd is not examples/. #16

Closed Julian closed 1 year ago

Julian commented 3 years ago

This is only work 'in theory', not in practice, since there are some other things that don't work after this, but will fix those in a separate PR in a moment.

jasonrute commented 3 years ago

I need to look at this closer, but what is the reason for the changes to paths in trio_server.py? What does it do?

Julian commented 3 years ago

They let you pass pathlib.Path objects in rather than needing to convert them yourself to str essentially.

The example file was already dealing in Path objects, so yeah handing those directly into the trio server saves work for the caller. str paths would also continue to work, since os.fspath returns them unchanged, so yeah this basically just saves folks using more "modern" ways to handle paths from having to convert them.

Lemme know if that makes sense.

Julian commented 3 years ago

Uh though if you meant why does the parameter name need to change, it doesn't really, though path seemed to suit it more if it can be any relative/absolute path (which is what this overall change does), but yeah let me know if you like the previous name obviously.

jasonrute commented 3 years ago

As for the path stuff, first the "filename" in the lean interface doesn't have to be a real filepath if content is non-empty, so we can't and shouldn't require that it is. For example, there are plenty of legitimate reasons to put a non-existent dummy file as the filename, e.g. trio_server.sync("dummy.lean", content="#eval 1+1") should work fine even if dummy.lean isn't a real file. There is a lot of interest in programmatically running lean through the lean server for machine learning and other use cases. (For example, see here for some working code. It doesn't yet use the python server, but it should.)

Moreover, my vision (I don't know if @PatrickMassot agrees) for the Trio interface is that it is just a light weight front end for the lean server and nothing more. For what it's worth, I think in the end we should remove the state method entirely and force users to use the send command. The reason is that I don't want to support every little use case of the Lean server. I do think we should keep sync however, since it is a bit tricky to get right (but again, filename should just be a string since that is all Lean wants of a filename.)

On the other hand, I do think that having better (and properly working :smile:) examples would be a great thing. Those examples could have higher level interfaces which properly support files and other things. For example, on this branch I started to build a tool for Johan Commelin which searches through all of mathlib for all strings of some form, say "subtype" and then uses the lean server to check if they are instances of a particular definition, say "submodule.subtype", so that they can be replace with another string. We could make something like that a formal example.

jasonrute commented 3 years ago

As for this particular example, I like how simple was before, but I agree that working directories make path resolution a pain.

Julian commented 3 years ago

As for the path stuff, first the "filename" in the lean interface doesn't have to be a real filepath if content is non-empty, so we can't and shouldn't require that it is. For example, there are plenty of legitimate reasons to put a non-existent dummy file as the filename, e.g. trio_server.sync("dummy.lean", content="#eval 1+1") should work fine even if dummy.lean isn't a real file. There is a lot of interest in programmatically running lean through the lean server for machine learning and other use cases. (For example, see here for some working code. It doesn't yet use the python server, but it should.)

This makes sense certainly (though I don't think anything is changing there in this PR, so possibly I'm missing the point, or didn't explain clearly myself).

Moreover, my vision (I don't know if @PatrickMassot agrees) for the Trio interface is that it is just a light weight front end for the lean server and nothing more. For what it's worth, I think in the end we should remove the state method entirely and force users to use the send command. The reason is that I don't want to support every little use case of the Lean server. I do think we should keep sync however, since it is a bit tricky to get right (but again, filename should just be a string since that is all Lean wants of a filename.)

I don't know enough really about this library yet to know / have an opinion on extra methods in the trio layer, I literally just wanted to run the example and noticed it didn't work the first times I ran it and dug quickly into why. Certainly I agree that a client should in general separate it's IO (trio) from its logic, so I 100% agree with your first sentence in principle.

So yeah I think I agree with everything you said personally :)

jasonrute commented 3 years ago

Sorry, I did a poor job explaining myself. First, the issue you found is definitely going to cause confusion and I agree it should be fixed. Working off your fix, I suggest the following simple fix:

#!/usr/bin/env python
from pathlib import Path

import trio # type: ignore

from lean_client.trio_server import TrioLeanServer

EXAMPLES_DIR = Path(__file__).parent

async def main():
    lean_file = EXAMPLES_DIR / 'test.lean'
    lines = lean_file.read_text().split('\n')

    async with trio.open_nursery() as nursery:
        server = TrioLeanServer(nursery, debug=False)
        await server.start()
        await server.full_sync(str(lean_file))

        for i, line in enumerate(lines):
            before = await server.state(str(lean_file), i+1, 0)
            after = await server.state(str(lean_file), i+1, len(line))
            if before or after:
                print(f'Line {i+1}: {line}')
                print(f'State before:\n{before}\n')
                print(f'State after:\n{after}\n')

        server.kill()
        nursery.cancel_scope.cancel()

if __name__ == '__main__':
    trio.run(main)

Now, besides some minor things, the largest difference between our two approaches is what we input to server.state and server.full_sync. In both cases, I think we should just input a string instead of a path-like object for a few technical reasons I'll explain. (One could use os.fspath in the example above instead of str, but I think they do the same thing and str is easier to understand.)

Now, why am I insisting on string as the input type for full_sync and state? It is because, not only does the file path matter, but the exact string of the path matters as well. Here is a MWE bad example:

import os
from pathlib import Path

import trio # type: ignore

from lean_client.trio_server import TrioLeanServer

EXAMPLES_DIR = Path(__file__).parent

async def main():
    lean_file1 = EXAMPLES_DIR / 'test.lean'  # relative path, e.g. examples/test.lean
    print(os.fspath(lean_file1))
    lean_file2 = lean_file1.resolve() # full path, e.g. /Users/bobsmith/lean-client-python/examples/test.lean
    print(os.fspath(lean_file2))
    lines = lean_file1.read_text().split('\n')

    async with trio.open_nursery() as nursery:
        server = TrioLeanServer(nursery, debug=False)
        await server.start()
        await server.full_sync(os.fspath(lean_file1))

        for i, line in enumerate(lines):
            before = await server.state(os.fspath(lean_file2), i+1, 0)
            after = await server.state(os.fspath(lean_file2), i+1, len(line))
            if before or after:
                print(f'Line {i+1}: {line}')
                print(f'State before:\n{before}\n')
                print(f'State after:\n{after}\n')

        server.kill()
        nursery.cancel_scope.cancel()

if __name__ == '__main__':
    trio.run(main)

Even though file1 and file2 resolve to the same location, lean doesn't treat them the same, so the state methods in this example come up empty. Moreover, we can't easily make the lean trio server work with paths since (1) we have to support dummy files which don't exist, and (2) we have to support all the lean commands many of which take a filename string. Hence I think we should only support filename strings and force the user to do all the path stuff on their end.

Julian commented 3 years ago

Will read more carefully in a bit, and yeah I still don't really claim to have enough knowledge of the lib to have a strong opinion, so I'm OK going with whatever certainly, but:

It is because, not only does the file path matter, but the exact string of the path matters as well.

This to me somewhat says perhaps then that the library should just always deal in absolute paths if it wants to abstract over this being the case / prevent potential user error, or if not, that then users just shouldn't do that, but more importantly just to be sure:

(1) we have to support dummy files which don't exist

There's nothing wrong with a pathlib.Path that doesn't exist, to be clear, right? A path doesn't need to exist, same as any other current notion of an opaque-ID-that-happens-to-look-like-a-file-path-and-is-represented-by-a-string.

But yeah it's probably not very important, so if you're fairly confident I am more than happy to go with your way.