whonore / Coqtail

Interactive Coq Proofs in Vim
MIT License
272 stars 35 forks source link

Seeing Coqtop error messages #106

Closed j-hui closed 4 years ago

j-hui commented 4 years ago

When I run :CoqStart, sometimes the underlying coqtop instance will crash, and in my :messages buffer it will "Failed to launch Coq." It it possible to read the error message to see why it failed?

j-hui commented 4 years ago

for what it's worth, :CoqToggleDebug doesn't work because it seems to rely on :CoqStart, and thus fails when I invoke that command.. (I'm also not totally sure what it's used for)

whonore commented 4 years ago

I've also observed that but it's relatively uncommon for me and I haven't found a consistent way of reproducing it so it's been difficult to debug. If it were actually an error with coqtop then it should print more information after the "Failed to launch" message, but since it doesn't I think it may actually be an error instead with the channel interface between Vim and Python. Maybe the channel isn't finished initializing or there's some failure that I'm not catching before trying to write to it.

:CoqToggleDebug probably wouldn't help because it just logs the XML that's sent between Coqtail and coqtop, but it shouldn't really depend on :CoqStart so that's worth fixing too.

Could you let me know what version of Vim you're using? And if you find a consistent way of triggering the error that would also be very helpful.

j-hui commented 4 years ago
$ vim --version
VIM - Vi IMproved 8.0 (2016 Sep 12, compiled Mar 18 2020 18:29:15)
Included patches: 1-1453
Modified by pkg-vim-maintainers@lists.alioth.debian.org
Compiled by pkg-vim-maintainers@lists.alioth.debian.org
Huge version without GUI.  Features included (+) or not (-):
+acl               +farsi             +mouse_sgr         -tag_any_white
+arabic            +file_in_path      -mouse_sysmouse    -tcl
+autocmd           +find_in_path      +mouse_urxvt       +termguicolors
-autoservername    +float             +mouse_xterm       +terminal
-balloon_eval      +folding           +multi_byte        +terminfo
+balloon_eval_term -footer            +multi_lang        +termresponse
-browse            +fork()            -mzscheme          +textobjects
++builtin_terms    +gettext           +netbeans_intg     +timers
+byte_offset       -hangul_input      +num64             +title
+channel           +iconv             +packages          -toolbar
+cindent           +insert_expand     +path_extra        +user_commands
-clientserver      +job               -perl              +vertsplit
-clipboard         +jumplist          +persistent_undo   +virtualedit
+cmdline_compl     +keymap            +postscript        +visual
+cmdline_hist      +lambda            +printer           +visualextra
+cmdline_info      +langmap           +profile           +viminfo
+comments          +libcall           -python            +vreplace
+conceal           +linebreak         +python3           +wildignore
+cryptv            +lispindent        +quickfix          +wildmenu
+cscope            +listcmds          +reltime           +windows
+cursorbind        +localmap          +rightleft         +writebackup
+cursorshape       -lua               -ruby              -X11
+dialog_con        +menu              +scrollbind        -xfontset
+diff              +mksession         +signs             -xim
+digraphs          +modify_fname      +smartindent       -xpm
-dnd               +mouse             +startuptime       -xsmp
-ebcdic            -mouseshape        +statusline        -xterm_clipboard
+emacs_tags        +mouse_dec         -sun_workshop      -xterm_save
+eval              +mouse_gpm         +syntax
+ex_extra          -mouse_jsbterm     +tag_binary
+extra_search      +mouse_netterm     +tag_old_static
etc..
j-hui commented 4 years ago

To be honest I have no clue why this breaks. It seems to be an issue with specific pathnames---renaming a problematic file seems to fix the problem, and renaming a file to the problematic name breaks it.

And the wild thing is that this seems to work with Neovim... (though of course I encounter the other issues associated with using Neovim).

$ # faked example output
$ ls
Bad.v Good.v
$ vim Bad.v # doesn't work
$ nvim Bad.v # works
$ mv Bad.v Bad2.v; vim Bad2.v # works
$ vim Good.v # works
$ mv Good.v Bad.v; vim Bad.v # doesn't work
whonore commented 4 years ago

That's odd, are the problematic names consistently problematic (do you reliably encounter the error if you use that name)? And if you try :CoqStart again after it fails without quitting Vim does it continue to fail or does it work the second time? It would also be interesting to see if Neovim continues to work when using the PR that's supposed to add support for it (#103).

j-hui commented 4 years ago

It continues to fail, with pretty much the same behavior.

I've managed to isolate the behavior to here, where the issue appears to be a type mismatch (l:res appears to be of type String---an empty string in particular---while we expect it to be a Dictionary)

I'm still tracing through the Python side to see where the issue is coming from..

whonore commented 4 years ago

The relevant Python function is probably handle in coqtail.py. What is the string that’s returned?

j-hui commented 4 years ago

Invoking the handler from the handle() method returns None. However, this is consistent with what happens for .v files that work too.

For whatever reason, that turns into an empty string on in coqtail.vim for the problematic .v files, but {'buf': 1, 'ret': v:null} for the ones that work.

By the way, is there a good way to :echom dictionaries in Vimscript? I can only do this with strings (I'm using a combination of :sleep and :echo for dictionaries...)

whonore commented 4 years ago

The handler (start in this case) returns None when there are no errors. This gets bundled in a dictionary, converted to JSON and written to the socket with wfile.write, which then gets received by Vim and decoded back into a dictionary. So the only way I can see the Vim side getting a string instead is if something goes wrong with ch_evalexpr and it returns an empty string by default or something like that.

You can use string() to convert dictionaries into strings that can be :echomed.

whonore commented 4 years ago

Yea, I just checked :h ch_evalexpr and it says "When there is an error or timeout it returns an empty string". There shouldn't be a timeout set so there must be something else going wrong with the channel.

whonore commented 4 years ago

Vim stores the last error message in the special v:errmsg variable so you could try checking that after it fails.

j-hui commented 4 years ago

v:errmsg tells me: E475: Invalid argument:. But weirdly, I seem to get this on the subsequent, apparently successful call that appears to be telling Coqtop to stop. And I also get this for the files for which Coqtail works.

I realized I should have mentioned this earlier---it does actually feel like something is timing out somewhere. Vim blocks for a second or two before telling me that it failed to launch Coq. I'll investigate more tomorrow where this might be coming from.

whonore commented 4 years ago

v:errmsg isn't automatically reset so you're probably just seeing the same message from the initial failure. You can clear it with let v:errmsg = "". You may also be right about the timeout, according to :h channel-timeout the default is 2 seconds. If you change this line to

let s:chanopts = {'mode': 'json', 'timeout': 10000}

does that fix the problem?

If not, try ch_logfile('channel.log', 'w') before calling :CoqStart to get a better view of what's going on.

j-hui commented 4 years ago

So it certainly still is timing out, even with the 10s timeout. Perhaps the error message reported by Coqtail can be modified to reflect this? (If so I can issue a PR for this.)

The logs show something interesting (I've obfuscated the path names because project I'm working on is still closed source for now and under a double-blind review). There is another job running, from vim-signify, that is sending a diff somewhere, before spamming some callback:

==== start log session ====
  0.101875 : Starting job: sh  -c  git diff --no-color --no-ext-diff -U0 -- 'Proof.v'
  0.101944 on 6: Created channel
  0.103184 : looking for messages on channels
  0.112255 RECV on 6: 'diff --git a/Proof.v b/Proof.v
index 2682e63..f2d9676 100644

(a git diff)

  0.112377 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112439 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112459 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112474 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112487 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112504 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112517 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112529 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112542 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112554 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112570 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112584 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112600 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112614 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112630 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112650 on 6: Invoking channel callback <SNR>82_callback_vim_stdout
  0.112664 on 6: Invoking channel callback <SNR>82_callback_vim_stdout

(keeps on spamming this)

  0.117207 on 6: Job ended
  0.205176 : looking for messages on channels
  0.206410 on 6: Freeing job
  0.206430 on 6: Closing channel
  0.206437 on 6: Clearing channel
  0.206444 on 6: Freeing channel
  0.206476 : looking for messages on channels
  1.606914 : Starting job: sh  -c  git diff --no-color --no-ext-diff -U0 -- 'Proof.v'
  1.606959 on 7: Created channel
  1.608229 : looking for messages on channels
  1.617211 RECV on 7: 'diff --git a/Proof.v b/Proof.v

(the cycle continues)

And this keeps going, with incrementing channel numbers (which I think is 4 in this case?).

Later, we see a channel being created from me running :CoqStart:

 33.386488 on 15: Created channel
 33.386603 on 15: Connecting to localhost port 41103
 33.386720 on 15: Waiting for connection (waiting 1 msec)...
 33.386738 on 15: Connection made

(another block of Signify's bs)

 33.446299 SEND on 15: '[1,[1,"start",{"version":"8.4pl6","args":["-R","/some/path","SomeName"],"coq_path":"","opts":{"timeout":0,"filename":"/some/path/Proof.v","encoding":"utf-8"}}]]
'
 33.446341 on 15: Reading JSON

(more Signify spam)

 33.453144 RECV on 15: '[1, {"buf": 1, "ret": null}]'
 33.465738 on 15: Waiting for up to 10000 msec
 43.475272 on 15: Timed out
 43.477214 SEND on 15: '[2,[1,"stop",{"opts":{"timeout":0,"filename":"/some/path/Proof.v","encoding":"utf-8"}}]]
'
 43.477289 on 15: Reading JSON
 43.477298 : looking for messages on channels
 43.477306 on 15: Getting JSON message 1
 43.477313 on 15: Dropping message 1 without callback
 43.477322 : looking for messages on channels
 43.477328 on 15: Waiting for up to 10000 msec
 43.480933 RECV on 15: '[2, {"buf": 1, "ret": null}]'
 43.480955 on 15: Getting JSON message 2
 43.491920 : ERROR: E216: No such group or event: coqtail#Autocmds * <buffer>
 43.491956 on 15: Closing channel
 43.492009 on 15: Clearing channel
 43.511043 : looking for messages on channels
 43.511052 on 15: Closing channel because all readable fds are closed
 43.511054 on 15: Closing channel
 43.511057 on 15: Closing channel
 43.511059 on 15: Clearing channel
 43.511062 on 15: Freeing channel

The log doesn't seem to indicate Signify doing anything with Coqtail's channel (15)---it just skips from 14 to 16. But I'm wondering if Coqtail's output is still somehow being sent to one of Signify's channels (or vice versa)? Not sure whose side the problem is on. In any case, disabling Signify seems to fix my issue..

whonore commented 4 years ago

That's interesting, thanks for looking into this. I'll have to try installing Signify and see if I can reproduce it. Have you noticed any pattern in the "bad" file names? Or if they have some common git status (all with unstaged changes, no changes, something like that)?

j-hui commented 4 years ago

Yeah, they all seem to have unstaged changes! But that alone doesn't seem to be sufficient to reproduce the issue: simple additions, deletions, and modifications don't appear to cause issues. But if I make some really big changes (e.g. adding a bunch of newlines near the top of the file, then indenting the bottom half of the file), the problem appears. And the problem is fixed when I stage the changes. I haven't been able to narrow it down to any specific kinds of changes but I think it has something to do with the size of the diff.

j-hui commented 4 years ago

Ok I've narrowed it down by simply spamming empty lines at the top of my proof. Adding up to 117 blank lines is ok, but adding the 118th causes :CoqStart to fail. If I add content to those blank lines (e.g. (*.*)), my quota appears to decrease (to 114), proportional to the amount of content (i.e. (*..*) lowers my blank line threshold more than (*.*)). My guess is that the issue is somehow related the size of the diff being sent along the channel..

whonore commented 4 years ago

Cool, I'm able to reproduce it now. :CoqStart fails on a tracked file with a huge diff but succeeds on an untracked file with the same content. I'll try to dig into the log some more and see what's going on.

whonore commented 4 years ago

I haven't completely confirmed this but it seems like it's maybe a Vim bug where it gets overwhelmed with messages and never parses the response from ch_evalexpr even though it receives the message. I made a branch that works around it by reading the pending messages with ch_read immediately that after seems to work, but it's kind of hacky. I'm not sure this is how I ultimately want to fix this but could you check if it does indeed work for you?

j-hui commented 4 years ago

Thanks whorore! I can confirm that your workaround works, but I also agree that it's probably not the best solution. In particular, there's still a noticeable lag at :CoqStart, versus nearly instantaneous init time for non-problematic files. Then again, at the very least, this workaround doesn't seem to affect working cases and salvages broken cases.

For now, I'll probably stick to just disabling Signify since I can just see my changes using Fugitive. Let me know if there's anything I can help test or investigate.

whonore commented 4 years ago

Since this seems to be a pretty uncommon edge case and possibly due to a Vim bug, I'll close it as wontfix for now. If more people run into this problem then I'll revisit it.