Profiling support and coverage support

Do you know roughly how much test coverage ble.sh has? That is, is it closer to 10%, 50%, or 80%?

There are (almost) no tests for ble.sh. My feeling is that only 5-10% is covered, and that is by the tests for the vim mode, but I'm not actually sure.

I imagine you spend a lot of time debugging so I'd be interested in feedback (or patches) on devtools features.

The most time-consuming part is finding a way to reproduce the buggy behavior. When I encounter strange behavior in an interactive session, it sometimes cannot be easily reproduced, and I don't know a good way to deal with such problems.

The second most time-consuming part is identifying the bottleneck. Shell scripts are generally slow, so finding the bottleneck is very important when optimizing them. I want a profiling feature, i.e., one that measures the time spent in each function. I sometimes try to measure the time by embedding some code in the shell scripts, but the measuring logic itself takes time, so it is hard to measure the actual time. We need some native mechanism (i.e., not shell scripts) for profiling.
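To illustrate the "embedding some code" approach and why its overhead matters, here is a minimal sketch. It assumes bash 5.0 or later (for `EPOCHREALTIME`); `profile_call` and `PROFILE_USEC` are hypothetical names, not part of ble.sh or Oil.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 5.0+ for EPOCHREALTIME.
# profile_call and PROFILE_USEC are hypothetical names.

declare -A PROFILE_USEC=()   # accumulated microseconds per function name

profile_call() {
  local name=$1 start end status
  # EPOCHREALTIME is "seconds.microseconds" (the separator depends on the
  # locale); removing the separator yields integer microseconds.
  start=${EPOCHREALTIME/[.,]/}
  "$@"
  status=$?
  end=${EPOCHREALTIME/[.,]/}
  PROFILE_USEC[$name]=$(( ${PROFILE_USEC[$name]:-0} + end - start ))
  return "$status"
}

slow_function() { local i; for ((i = 0; i < 1000; i++)); do :; done; }

profile_call slow_function
profile_call slow_function
printf '%s: %d usec\n' slow_function "${PROFILE_USEC[slow_function]}"
```

The expansions and arithmetic in `profile_call` are themselves counted in the result, so for short functions the wrapper can dominate the measurement. That is the reason a native mechanism would be preferable.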

Another annoying thing when debugging interactive sessions is that the error messages from Bash and the output of ble.sh are both written to the terminal, so they overwrite each other. So I need to redirect the error messages or debugging messages to a file or a different TTY.
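A minimal sketch of such a redirection, assuming bash 4.1 or later (for `{varname}` file descriptor allocation and `BASH_XTRACEFD`). The names `debug_log`, `debug_fd`, and `noisy` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: debug_log, debug_fd and noisy are illustrative names.
debug_log=$(mktemp)

# Open a dedicated file descriptor for debugging output. The target could
# also be another terminal device instead of a file.
exec {debug_fd}>>"$debug_log"

# bash 4.1+: send `set -x` traces to that descriptor instead of stderr.
BASH_XTRACEFD=$debug_fd

noisy() {
  echo "normal output for the terminal"
  echo "an error message" >&2
}

set -x
noisy 2>&"$debug_fd"    # stderr of this command goes to the log as well
set +x

echo "debug output was written to $debug_log"
```

The terminal then only receives the normal output, while both the traces and the error message end up in the log file.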

Also, if one could set breakpoints, it would be useful for debugging the scripts.
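Something like a breakpoint can be approximated in Bash today with the `DEBUG` trap. The following is only a rough sketch; `BREAK_AT`, `on_break`, and `hits` are hypothetical names, and instead of pausing it just records each command that is about to run inside the chosen function.

```shell
#!/usr/bin/env bash
# Sketch only: BREAK_AT, on_break and hits are hypothetical names.
set -o functrace          # let the DEBUG trap fire inside functions too

BREAK_AT=target_function  # "break" whenever this function is executing
hits=()

on_break() {
  # FUNCNAME[1] is the function that was running when the trap fired.
  if [[ ${FUNCNAME[1]} == "$BREAK_AT" ]]; then
    hits+=("line $1: $2")
    # An interactive debugger could pause here instead, for example by
    # reading a command from /dev/tty.
  fi
}
trap 'on_break "$LINENO" "$BASH_COMMAND"' DEBUG

target_function() { local x=1; x=$((x + 1)); }
other_function() { :; }

other_function
target_function
trap - DEBUG

printf '%s\n' "${hits[@]}"
```

The trap runs before every single command, so this slows everything down, not just the function of interest.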

Edit: Also, native support for coverage measurement would be useful. For example, coverage analysis using the output of set -x is tricky and not so reliable.
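For reference, the `set -x` approach looks roughly like the sketch below. It assumes bash 4.1 or later; the `COV:` marker in `PS4` and the sample script are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the COV: marker and the sample script are illustrative.
trace=$(mktemp)
script=$(mktemp)

cat > "$script" <<'EOF'
greet() {
  if [[ $1 == hi ]]; then
    echo hello
  else
    echo bye
  fi
}
greet hi
EOF

exec {trace_fd}>"$trace"
BASH_XTRACEFD=$trace_fd
# Tag every trace line with file and line number. Bash repeats the first
# character of PS4 to show nesting, hence the leading "+".
PS4='+COV:${BASH_SOURCE}:${LINENO}: '

set -x
source "$script" >/dev/null
set +x

# Collect the distinct line numbers of the sample script seen in the trace.
covered=$(sed -n "s|^+*COV:$script:\([0-9]*\):.*|\1|p" "$trace" | sort -nu | tr '\n' ' ')
echo "covered lines: $covered"
```

Line 5 (`echo bye`) never shows up because that branch is not taken. The trace has only line granularity, so it cannot tell which part of a line ran, and a traced command whose expanded text contains a newline breaks the line-based parsing. This is part of why the approach is unreliable.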

Originally posted by @akinomyoga in https://github.com/oilshell/oil/issues/653#issuecomment-609461869

- Profiling makes sense once we have C++
- Coverage is related to #583. I haven't thought in detail about how to do it, but it should definitely be possible since we have good line number information, etc. I guess we need a fast data structure that supports marking all the ASTs?
  - We can probably even do it on a token/column basis? i.e. coverage for echo ${x:-$(echo executed?)}

The most time-consuming part is finding a way to reproduce the buggy behavior. When I encounter strange behavior in an interactive session, it sometimes cannot be easily reproduced, and I don't know a good way to deal with such problems.

@akinomyoga I have heard good things about

https://rr-project.org/

for C++. (I am using a mix of gdb and Eclipse now and not that happy with it, so I might experiment with different tools. It will become more important as more of Oil is in C++)

In theory, recording the execution history for reversible debugging is possible with Oil since the state space of a shell is very small (e.g. everything is in core/state.py and core/runtime.asdl). And also the metaprogramming aspect of Oil may let us create multiple interpreters, e.g. with different tracing or GC policies, similar to PyPy. (Or at least we can experiment with that technique.)

But probably won't happen for a long time... we have to do many more basic things first :)

But I suppose the idea is that you could run with it on all the time if it's cheap enough.

I think the more conventional way of catching those errors is through unit tests. That is, I try not to create too many code paths in OSH that can't be reached by automated tests...

That reminds me of #439, I want to use some kind of terminal testing, not just stdin byte streams. We probably can use similar techniques for both ble.sh and Oil, so if you have any ideas let me know.

Is it possible to "tee" the terminal and then replay it into ble.sh? I played with ttyrec a little but not much.

@akinomyoga I have heard good things about

https://rr-project.org/

for C++. (I am using a mix of gdb and Eclipse now and not that happy with it, so I might experiment with different tools. It will become more important as more of Oil is in C++)

Thank you for the information! I think it will be very useful if a similar thing is available for shells.

In theory, recording the execution history for reversible debugging is possible with Oil since the state space of a shell is very small (e.g. everything is in core/state.py and core/runtime.asdl).

Unfortunately, ble.sh defines a great many shell variables and arrays, which are not so compact. In particular, the command history is a large amount of data. To manipulate the history entries, ble.sh loads the entire history into shell arrays. Some strange behavior of ble.sh has occurred when moving between the history entries. I think one can record the change history of the shell variables (along with the call history of functions) rather than recording the entire state every time.
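As a rough illustration of recording only the changes, the sketch below watches a fixed list of variables from a `DEBUG` trap and logs a variable only when its value differs from the last one seen. `WATCHED`, `record_changes`, `last_seen`, and `change_log` are hypothetical names, not anything ble.sh or Oil provides.

```shell
#!/usr/bin/env bash
# Sketch only: WATCHED, record_changes, last_seen and change_log are
# hypothetical names.
WATCHED=(counter mode)     # variables whose changes should be recorded
declare -A last_seen=()
change_log=()

record_changes() {
  local var current
  for var in "${WATCHED[@]}"; do
    # declare -p serializes the value; it fails if the variable is unset.
    current=$(declare -p "$var" 2>/dev/null) || current='<unset>'
    if [[ ${last_seen[$var]-<unset>} != "$current" ]]; then
      change_log+=("seen before line $1: $current")
      last_seen[$var]=$current
    fi
  done
}
trap 'record_changes "$LINENO"' DEBUG

counter=0
mode=insert
counter=$((counter + 1))
mode=normal
trap - DEBUG

printf '%s\n' "${change_log[@]}"
```

Each step costs one command substitution per watched variable, so this is far too slow to leave enabled all the time. A cheap version would need support inside the shell itself.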

But I suppose the idea is that you could run with it on all the time if it's cheap enough.

Yes. I agree. Strange behavior that cannot be easily reproduced occurs only rarely, something like once or twice per year in the daily use of shells. So it is important that it's cheap, because I don't want to keep it enabled all the time if it is slow.

I think the more conventional way of catching those errors is through unit tests.

Yes. I agree. The reason that ble.sh doesn't have many tests is partly that it is difficult to test the interactive behavior, including the rendering of the terminal, but the major reason is just that I'm lazy. Even if testing the whole interactive behavior (the integration test) is difficult, it is still possible to do unit tests on each function that is not directly tied to interactivity.

But I think this is a good chance to supplement the unit tests of ble.sh! I added unit tests in lib/test-util.sh for the functions defined in src/util.sh. Now I have 435 tests in lib/test-util.sh, but they still cover only about 16% of the functions in src/util.sh.

That reminds me of #439, I want to use some kind of terminal testing, not just stdin byte streams. We probably can use similar techniques for both ble.sh and Oil, so if you have any ideas let me know.

Yes, that is also a difficult point. For the test, the user input stream can be provided, but the output terminal sequences can vary depending on TERM and can also be completely different between ble.sh versions. Terminal sequences are ambiguous: different sequences can produce the same behavior. Also, the "color" of the highlighted text can change between versions. If we really want to test the terminal contents, we need to run a terminal emulator and inspect its buffer after it has processed the terminal sequences.
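Short of a real terminal emulator, a test can at least normalize away the color changes by removing SGR sequences before comparing. This is a different and much weaker technique than inspecting an emulator buffer, shown only as a sketch; `strip_sgr` and the two sample outputs are made up.

```shell
#!/usr/bin/env bash
# Sketch only: strip_sgr and the sample outputs are illustrative.

# Remove SGR (color/attribute) sequences of the form ESC [ params m.
strip_sgr() { sed $'s/\e\\[[0-9;]*m//g'; }

# Two renderings of the same text that differ only in how it is colored.
old_output=$'\e[1;32mls\e[0m -l'
new_output=$'\e[38;5;2mls\e[m -l'

if [[ $(strip_sgr <<<"$old_output") == "$(strip_sgr <<<"$new_output")" ]]; then
  echo "same text after stripping colors"
fi
```

This does nothing for cursor movement, line erasure, or redrawing, which is where an emulator buffer becomes necessary.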

Is it possible to "tee" the terminal and then replay it into ble.sh?

Hm, what does "replay into ble.sh" mean? ble.sh has a very simple terminal sequence parser (which is used to analyze PS1), but I think it is not so useful for testing the terminal sequences of interactive UIs. I think I can implement (a part of) a simple terminal emulator needed for tests by updating the terminal sequence analyzer of ble.sh, but I'm not sure if it is really a good idea to implement such a partial terminal emulator in shell scripts.

I played with ttyrec a little but not much.

ttyrec just records the terminal sequences and their timing. Actually, I created the animated GIF of ble-0.1 on the ble.sh Wiki page by using ttyrec and seq2gif.

I'm not sure why, but out of the blue just now I thought native coverage support might be an interesting lever for both OSH and Oil, especially since the mechanisms for backfilling it are so clumsy/hackish.

If project-x has a few bits of bash that OSH chokes on, native support for things like coverage and profiling can provide the clear value proposition for why it's worth their time to update them for OSH-compatibility.

Even if their main target is still bash, making a few small changes and adding OSH to the shells their test suite runs in sounds like a nice trade for something like high-quality granular coverage reports.


Profiling support and coverage support #687