sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.3k stars 449 forks source link

CachedFunction file location seems problematic #15184

Closed nbruin closed 9 years ago

nbruin commented 10 years ago

There is a problem where cached function report problematic file information. Consider

sage: M=IntegerModRing(7)
sage: edit(M.is_finite) #no problem
sage: edit(M.is_field)
IOError: [Errno 2] No such file or directory: '/sage/rings/finite_rings/integer_mod_ring.py'

Indeed sage doesn't find the appropriate source file:

sage: from sage.misc.sageinspect import sage_getfile
sage: sage_getfile(M.is_finite)
'/usr/local/sage/5.10/local/lib/python2.7/site-packages/sage/rings/finite_rings/integer_mod_ring.py'
sage: sage_getfile(M.is_field)
'/sage/rings/finite_rings/integer_mod_ring.py'

The code that makes this happen is sage/misc/cachefunc.pyx, line 621:

            if sourcelines is not None:
                from sage.env import SAGE_SRC, SAGE_LIB
                filename = sage_getfile(f)
                # The following is a heuristics to get
                # the file name of the cached function
                # or method
                if filename.startswith(SAGE_SRC):
                    filename = filename[len(SAGE_SRC):]
                elif filename.startswith(SAGE_LIB):
                    filename = filename[len(SAGE_LIB):]
                file_info = "File: %s (starting at line %d)\n"%(filename,sourcelines[1])
                doc = file_info+(f.func_doc or '')

As you can see, the code actively strips away the part that should really still be there. This change was made on #13432. Would it be very wrong to just do nothing with the filename here and simply report whatever gets reported on the original file?

Proposed doctest:

sage: M=IntegerModRing(7)
sage: from sage.misc.sageinspect import sage_getfile
sage: os.path.exists(sage_getfile(M.is_finite))
True
sage: os.path.exists(sage_getfile(M.is_field)) #this presently fails
True

CC: @simon-king-jena @ohanar

Component: misc

Author: Nils Bruin

Branch/Commit: 2ce7e4d

Reviewer: Simon King

Issue created by migration from https://trac.sagemath.org/ticket/15184

nbruin commented 10 years ago
comment:1

The logic above seems like a SAGE_ROOT-less translation of the code introduced on #11115, but I think something went wrong with a "leading slash" problem:

sage: sage.env.SAGE_LIB
'/usr/local/sage/5.10/local/lib/python2.7/site-packages'
sage: sage_getfile(M.is_field.f)
'/usr/local/sage/5.10/local/lib/python2.7/site-packages/sage/rings/finite_rings/integer_mod_ring.py'
sage: sage_getfile(M.is_field)
'/sage/rings/finite_rings/integer_mod_ring.py'
sage: sage.env.SAGE_LIB=sage.env.SAGE_LIB+'/'
sage: sage_getfile(M.is_field)
'/usr/local/sage/5.10/devel/sage/sage/rings/finite_rings/integer_mod_ring.py'

so it seems that if _sage_doc_ ensures it returns a relative filename, then sage_getfile will prefix the appropriate path, but if it returns an absolute path (which happens if we snap off 'SAGE_LIB' but not its slash separator) then of course it won't.

Are we sure that the connecting slash is never there? Then we can just do filename = filename[len(SAGE_LIB)+1:]. Otherwise, perhaps we should do

                if filename.startswith(SAGE_SRC):
                    filename = filename[len(SAGE_SRC):]
                    if filename.startswith('/'):
                        filename = filename[1:]
                elif filename.startswith(SAGE_LIB):
                    filename = filename[len(SAGE_LIB):]
                    if filename.startswith('/'):
                        filename = filename[1:]

By the way, is

            try:
                sourcelines = sage_getsourcelines(f)
            except IOError:
                sourcelines = None

really the best way of checking that the file reported by sage_getfile(f) exists? We could just do a

try:
    open(sage_getfile(f),'r')
    file_exists=True
except IOError:
    file_exists=False

which should be cheaper than actually rummaging through the entire file. The only thing we use later on is the starting line, and in many cases that would be available from elsewhere (e.g., the docstring), so we wouldn't need to actually read the source file.

simon-king-jena commented 10 years ago
comment:2

I think the current code is fragile, in particular if soft links are used (os.path.realpath might be needed at some point). os.path.exists might also come in handy.

The proposed fix expects the path separator "/", which should be os.path.pathsep.

So, there is enough reason to use os.path, rather than re-inventing its functionality in a non-portable way.

nbruin commented 10 years ago
comment:3

Replying to @simon-king-jena:

I think the current code is fragile, in particular if soft links are used (os.path.realpath might be needed at some point).

Do you think so? Shouldn't we just trust the paths as they are offered to us? Either $SAGE_ROOT has already been normalized, and then at this point any path that's coming from sage is already a "real" path, or $SAGE_ROOT is not normalized (and hence nothing derived from it either) and we shouldn't be normalizing either.

The proposed fix expects the path separator "/", which should be os.path.pathsep.

So, there is enough reason to use os.path, rather than re-inventing its functionality in a non-portable way.

I agree with that. However, I don't really see why we need to fondle the path anyway. We're getting the path that would be generated by sage_getfile on the original function. How can we improve on that?

simon-king-jena commented 10 years ago
comment:4

Replying to @nbruin:

So, there is enough reason to use os.path, rather than re-inventing its functionality in a non-portable way.

I agree with that. However, I don't really see why we need to fondle the path anyway. We're getting the path that would be generated by sage_getfile on the original function. How can we improve on that?

Python does not provide the file information in the doc string, but Cython does. And what would Cython do?

sage: CachedFunction.__doc__
"File: sage/misc/cachefunc.pyx (starting at line 452)\n\n
...

I guess we try to mimmick this behaviour. Note, however, that we mildly fail:

sage: P.<x,y> = QQ[]
sage: I = P*x   
sage: from sage.misc.sageinspect import _sage_getdoc_unformatted
sage: _sage_getdoc_unformatted(I.groebner_basis)
"File: /sage/rings/polynomial/multi_polynomial_ideal.py (starting at line 3507)
...

So, there is the additional "/" in the beginning of the file location. But in any case, Cython does not give the full path, and thus it should be a good idea that Sage doesn't, either.

By the way, something that I'm perhaps not to blame for should be changed, too: sage.misc.cachefunc.FileCache uses '/' as separator.

nbruin commented 10 years ago
comment:5

Something along these lines then? Note that this may be a good motivation to refactor sageinspect a little, so that getting "source file and line" is a little cheaper in many cases. Getting just a docstring should be fairly cheap (and usually should not involve reading source files), and here we're usually triggering a full fetch of the source code, only to throw it away. The reason is that sage_getsourcelines has a lot of logic that also applies to just fetching (file,lineno), but that logic is not available separately.

simon-king-jena commented 10 years ago
comment:6

Replying to @nbruin:

Something along these lines then?

Sounds good! And thank you that you did not attach a git branch. Hence, there is a chance that I will actually review it soon. Hopefully some people at Sage Days in Oxford will explain how to work with git. It currently seems plainly broken to me.

Note that this may be a good motivation to refactor sageinspect a little, so that getting "source file and line" is a little cheaper in many cases.

How?

Getting just a docstring should be fairly cheap (and usually should not involve reading source files),

Yes, the docstring itself would be easy. But in order to comply with Cython, it was decided to put file information into the doc string.

and here we're usually triggering a full fetch of the source code, only to throw it away.

What do you mean by "here"?

The reason is that sage_getsourcelines has a lot of logic that also applies to just fetching (file,lineno), but that logic is not available separately.

Is there code duplication? Or do you just suggest to make the code more modular?

simon-king-jena commented 10 years ago
comment:7

Aha, I see your comments in the patch. So, perhaps you are right, it would be a good idea to have a separate function dedicated to getting file and line number.

simon-king-jena commented 10 years ago
comment:8

With the patch, we get

sage: P.<x,y> = QQ[]
sage: I = P*x 
sage: from sage.misc.sageinspect import _sage_getdoc_unformatted
sage: _sage_getdoc_unformatted(I.groebner_basis)
"File: /mnt/local/king/SAGE/prereleases/sage-5.12.beta5/local/lib/python2.7/site-packages/sage/rings/polynomial/multi_polynomial_ideal.py (starting at line 3533)

and I think this is not what we want. I believe we should be as close to Cython's behaviour as possible, hence, it should be

File: sage/rings/polynomial/multi_polynomial_ideal.py

Currently, it is File: /sage/rings/polynomial/multi_polynomial_ideal.py, and I think according to what Cython does we should get rid of the leading slash.

nbruin commented 10 years ago

Attachment: trac_15184.patch.gz

Updated, with doctest

nbruin commented 10 years ago
comment:9

Some stupid typos in the old version. Sorry about that. New version should be better. Incidentally, to see that presently things are a bit on the slow side:

sage -t cachefunc.pyx
    [596 tests, 28.50 s]

and that's for testing a module that doesn't implement any significant computation! I suspect that it's because of extensive testing of incredibly slow introspection tools.

nbruin commented 10 years ago
comment:10

I guess it doesn't hurt to have the path normalization in, but it might not be necessary for the common prefix checking, at least not at far as trailing slashes are concerned:

sage: P="/a/b/c/d/e/f"
sage: Q="/a/b/c/"
sage: os.path.commonprefix([P,Q])
'/a/b/c/'
sage: Qn=os.path.normpath(Q)
sage: Qn
'/a/b/c'
sage: os.path.commonprefix([P,Qn])
'/a/b/c'
sage: os.path.relpath(P,Q)
'd/e/f'
sage: os.path.relpath(P,Qn)
'd/e/f'
sage: os.path.commonprefix([P,Q]) == Q
True
sage: os.path.commonprefix([P,Qn]) == Qn
True

Of course we do have:

sage: Q="/a/b/../b/c/"
sage: os.path.commonprefix([P,Q])
'/a/b/'
sage: os.path.commonprefix([P,Q]) == Q
False
sage: Qn=os.path.normpath(Q)
sage: Qn
'/a/b/c'
sage: os.path.commonprefix([P,Qn]) == Qn
True

so we should make sure that SAGE_LIB and SAGE_SRC are normalized at some point.

I don't think they need to be made "real path"s. If people have ln -s /usr/local /usr/global and they refer to files via both paths, they deserve that the system doesn't recognize them as identical. Of course, if sage decides to "real path" something, then it should do that, (probably SAGE_ROOT), once, and derive all its paths from that. Certainly at this point we should just trust what we get. The whole "return relative paths" is cosmetic anyway and in most of the cases the next step will be to recognize that the path is relative and put the original prefix back.

nbruin commented 9 years ago

Branch: u/nbruin/ticket/15184

nbruin commented 9 years ago
comment:15

PING I got bitten by this issue once again, so I adapted the patch here as a branch. Can we review and incorporate this change?


New commits:

41318d7trac #15184: fix path insertion in docstring generation for cached functions.
nbruin commented 9 years ago

Commit: 41318d7

simon-king-jena commented 9 years ago
comment:16

You comment

+ #this is a rather expensive way of getting the line number, because
+ #retrieving the source requires reading the source file and in many
+ #cases this is not required (in cython it's embedded in the docstring,
+ #on code objects you'll find it in co_filename and co_firstlineno)
+ #however, this hasn't been factored out yet in sageinspect
+ #and the logic in sage_getsourcelines is rather intricate.

I didn't check the code yet, but I think I remember that this was improved. But if I am mistaken, we could fix it here, right?

simon-king-jena commented 9 years ago
comment:17

Moreover, you will notice that you call sage_getsourclines twice. So, instead of doing

+ _, lineno = sage_getsourcelines(f)
+ file_info = "File: %s (starting at line %d)\n"%(filename,lineno)

you should do

+ file_info = "File: %s (starting at line %d)\n"%(filename,sourcelines[1])
simon-king-jena commented 9 years ago
comment:18

I just pushed an additional commit, hopefully improving the code a little bit. Strange that the commit cannot be seen yet. git trac push told me

Pushing to Trac #15184...
Guessed remote branch: u/SimonKing/ticket/15184
Enter passphrase for key '/home/king/.ssh/id_rsa': 

To git@trac.sagemath.org:sage.git
 * [new branch]      HEAD -> u/SimonKing/ticket/15184

Changing the trac "Branch:" field...

but apparently it did not change the branch field.

simon-king-jena commented 9 years ago

Changed commit from 41318d7 to 3476658

simon-king-jena commented 9 years ago

Changed branch from u/nbruin/ticket/15184 to u/SimonKing/ticket/15184

simon-king-jena commented 9 years ago
comment:19

Doing the change manually...

I think it is a clear improvement. I am waiting for the test suite, to be on the safe side.

Do you agree with my review commit? Do you think the co_filename and co_firstlineno thingy should be taken care of here, or elsewhere?

simon-king-jena commented 9 years ago

Reviewer: Simon King

nbruin commented 9 years ago
comment:20

Replying to @simon-king-jena:

Do you agree with my review commit?

Yes

Do you think the co_filename and co_firstlineno thingy should be taken care of here, or elsewhere?

Yes

(positive review is OK with me)

simon-king-jena commented 9 years ago
comment:21

All tests pass for me, you accept my review patch. So, I'd say it is a positive review.

Please check if I spelled your name correctly. And Volker will hopefully be able to explain why the patchbot reports a failure...

simon-king-jena commented 9 years ago

Author: Nils Bruin

tscrim commented 9 years ago
comment:22

For the buildbot failure, from looking at the log there are tests which fail in

sage -t src/sage/modular/modform/numerical.py  # 2 doctests failed
sage -t src/sage/rings/polynomial/polynomial_element.pyx  # 2 doctests failed
sage -t src/sage/rings/real_double.pyx  # 2 doctests failed

due to the results falling outside of the tolerance (so should not be affected by this ticket, plus I only use the buildbot/patchbot as possible failures, i.e. it can give false negatives).

Also please forgive my nitpicking, but would you be willing to change

        test that #15184 is fixed::

into

        Test that :trac:`15184` is fixed::
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

2ce7e4dUse the :trac: directive
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 3476658 to 2ce7e4d

simon-king-jena commented 9 years ago
comment:24

Replying to @tscrim:

Also please forgive my nitpicking, but would you be willing to change

        test that #15184 is fixed::

into

        Test that :trac:`15184` is fixed::

Done.

tscrim commented 9 years ago
comment:25

Thanks!

vbraun commented 9 years ago

Changed branch from u/SimonKing/ticket/15184 to 2ce7e4d