jerryscript-project / jerryscript

Ultra-lightweight JavaScript engine for the Internet of Things.
https://jerryscript.net
Apache License 2.0

Debugger support for snapshots #1660

Open martijnthe opened 7 years ago

martijnthe commented 7 years ago

I would like to be able to use the debugger with code loaded from snapshots. Because of device limitations, it may not be possible to parse the source on the device itself. The current debugging approach assumes that the device itself is capable of parsing the JS source.

To support debugging of snapshots, I think we need:

  1. The "offline" JerryScript parser should be able to serialize the collected debug info to a separate file, which I'll refer to as the "sourcemap" (would the commonly used sourcemap format work for this?)
  2. The debugger should be able to load debug info "out of band" by deserializing the sourcemap file (this would also pave the way for on-the-fly attachment of the debugger, I think)
  3. The "disabled breakpoint" instructions should be emitted into the snapshot too (not sure if this is the case already)
  4. Optional: "tag" the snapshot data with the filename of the sourcemap matching the snapshot. This way, the debugger can attempt to find the sourcemap file automatically (similar to the sourcemap comment that browsers use).

zherczeg commented 7 years ago

Yes, these features are definitely needed. To make things simple probably the debugger itself should serialize the data rather than adding new code paths. Perhaps we could create a serializer debugger client.

I think the most difficult aspect will be losing the byte code execution from ROM since enabling a breakpoint requires writable memory.

jiangzidong commented 7 years ago

@zherczeg

To make things simple probably the debugger itself should serialize the data rather than adding new code paths

Sorry, I don't understand what "adding new code paths" means here.

zherczeg commented 7 years ago

The current debugger design sends everything to the client through a connection, and the client organizes the data. If we wanted to serialize data into a file, we would need a separate implementation, since the file obviously needs some kind of structure (format). Simply dumping data would probably cause issues. If we would have a serializer client (a python script) it could organize the data, and write it into a file.

zherczeg commented 7 years ago

Consider the following source code:

function f()
{
  function g() { return 1; }
  function g() { return 2; }
  function g() { return 3; }
  function g() { return 4; }
  return g(); // result is 4
}

Currently the debugger (parser) sends all g functions to the client, and also sends byte-code-free for the first three g functions, since they are unused and their memory can be freed. The parser is not optimized for such badly written code, nor for detecting unused functions early.

In the case of file output, we would probably need to construct a tree of nodes in memory and serialize this tree after parsing is done. This would require a lot of new code (managing the tree, inserting/deleting nodes, etc.).

jiangzidong commented 7 years ago

@zherczeg I see: we don't change the debugger code, but let a specific client generate the serialized debug_info. Am I right?

martijnthe commented 7 years ago

If we would have a serializer client (a python script) it could organize the data, and write it into a file.

I don't really like the idea of having a separate client/Python script for this. Separating the parsing from the gathering of debug info makes things complicated to use and build upon. Also, I don't like the idea of adding a dependency on Python just to be able to generate debug info for snapshots.

For context, a story from the past: at Pebble, to generate snapshots, we used Emscripten to cross-compile a version of the JerryScript CLI to a stand-alone JavaScript program. This was very useful because it made this "snapshot compiler" self-contained and more or less platform-agnostic. Therefore it was very easy to run the "snapshot compiler" in a variety of environments (Node.js, browser, iOS JavaScriptCore, etc.).

Adding a separate python script only to generate the debugging info would block the use case I just described.

Idea: a serialization "client" could also be written in C and have alternative implementations of the jerry-debugger-ws.h interfaces. This alternative implementation would do the gathering, organizing of the data and finally the serialization. (Renaming the interfaces and perhaps adding a thin abstraction layer there would probably make sense at that point.)
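To make the idea a bit more concrete, here is a minimal sketch of such a thin abstraction layer. It is not the jerry-debugger-ws.h interface: every name in it (debug_sink_t, file_sink_send, file_sink_create, debug_emit) and the one-byte length prefix are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical abstraction: the engine hands every debug message to a sink.
 * None of these names exist in JerryScript. */
typedef struct debug_sink
{
  /* Returns 1 on success, 0 on failure. */
  int (*send) (struct debug_sink *self_p, const uint8_t *msg_p, size_t msg_len);
  void *user_p; /* sink-specific state, e.g. a FILE * or a socket handle */
} debug_sink_t;

/* File-backed sink: stores each message as [1-byte length][payload]. */
static int
file_sink_send (debug_sink_t *self_p, const uint8_t *msg_p, size_t msg_len)
{
  FILE *file_p = (FILE *) self_p->user_p;
  assert (file_p != NULL);

  if (msg_len > UINT8_MAX)
  {
    return 0; /* does not fit the assumed one-byte length prefix */
  }

  uint8_t len = (uint8_t) msg_len;
  return fwrite (&len, 1, 1, file_p) == 1
         && fwrite (msg_p, 1, msg_len, file_p) == msg_len;
}

static debug_sink_t
file_sink_create (FILE *file_p)
{
  debug_sink_t sink = { file_sink_send, file_p };
  return sink;
}

/* What the parser side would call; it does not care where the bytes go. */
static int
debug_emit (debug_sink_t *sink_p, const uint8_t *msg_p, size_t msg_len)
{
  return sink_p->send (sink_p, msg_p, msg_len);
}
```

A network-backed sink would fill in the same send pointer, so the parser-side code would stay identical whether the messages go to a live client or into a file.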

Re. redefining functions: nice edge case indeed.

zherczeg commented 7 years ago

@martijnthe yes, that is a good idea. A special debugger server port could do this, processing the data rather than transmitting it.

martijnthe commented 7 years ago

I think the most difficult aspect will be losing the byte code execution from ROM since enabling a breakpoint requires writable memory.

I understand that the current implementation requires this, because enabling a breakpoint is implemented by flipping a "disabled breakpoint" opcode to an "enabled breakpoint" opcode.

That said, this could be implemented differently, no? I can imagine an implementation where the bytecode is read-only, a list of breakpoint info is kept in RAM, and the VM compares the program counter against that list.
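A minimal sketch of what such a RAM-side list could look like, assuming breakpoints are identified by their byte code offset. The names (bp_table_t, bp_enable, bp_disable, bp_is_enabled) and the fixed capacity BP_MAX are hypothetical and not part of JerryScript; a fixed array is used here instead of a dynamic list only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BP_MAX 8 /* assumed small, fixed capacity for a constrained device */

/* Hypothetical RAM-side table of enabled breakpoints, keyed by byte code
 * offset so that the byte code itself can stay read-only. */
typedef struct
{
  uint32_t offsets[BP_MAX];
  size_t count;
} bp_table_t;

/* Called by the VM when it reaches a breakpoint instruction. */
static int
bp_is_enabled (const bp_table_t *table_p, uint32_t offset)
{
  for (size_t i = 0; i < table_p->count; i++)
  {
    if (table_p->offsets[i] == offset)
    {
      return 1;
    }
  }
  return 0;
}

/* Returns 1 if the breakpoint is enabled afterwards, 0 if the table is full. */
static int
bp_enable (bp_table_t *table_p, uint32_t offset)
{
  assert (table_p != NULL);

  if (bp_is_enabled (table_p, offset))
  {
    return 1;
  }
  if (table_p->count == BP_MAX)
  {
    return 0;
  }
  table_p->offsets[table_p->count++] = offset;
  return 1;
}

/* Removes the entry by moving the last entry into its slot. */
static void
bp_disable (bp_table_t *table_p, uint32_t offset)
{
  for (size_t i = 0; i < table_p->count; i++)
  {
    if (table_p->offsets[i] == offset)
    {
      table_p->offsets[i] = table_p->offsets[table_p->count - 1];
      table_p->count--;
      return;
    }
  }
}
```

The lookup only runs when the VM reaches a breakpoint instruction, so ordinary instructions pay nothing extra. The cost per lookup is linear in the number of enabled breakpoints.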

Stepping line-by-line is probably a bit more involved, but not impossible to do. For comparison, I know that gdb has an implementation for "step line" where, under the hood, the gdb client will repeatedly send "step instruction" to the target, until the program counter matches what the client thinks is the program counter at the beginning of the next line.
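The "step line" loop described above can be sketched roughly as follows. This is only an illustration of the general technique, not gdb's or JerryScript's code: the line table layout, the fake_target_t stand-in for the device, and the assumption that every instruction advances the program counter by one are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical line table entry; the table is sorted by ascending offset. */
typedef struct
{
  uint32_t offset; /* first byte code offset belonging to the line */
  uint32_t line;   /* source line number */
} line_entry_t;

/* Returns the line of the last entry whose offset is <= the given offset. */
static uint32_t
line_for_offset (const line_entry_t *table_p, size_t count, uint32_t offset)
{
  assert (count > 0);

  uint32_t line = table_p[0].line;
  for (size_t i = 0; i < count && table_p[i].offset <= offset; i++)
  {
    line = table_p[i].line;
  }
  return line;
}

/* Stand-in for the device: a program counter and an end-of-code offset. */
typedef struct
{
  uint32_t pc;
  uint32_t end;
} fake_target_t;

/* Stand-in for one "step instruction" round trip; returns 0 at the end. */
static int
step_instruction (fake_target_t *target_p)
{
  if (target_p->pc >= target_p->end)
  {
    return 0;
  }
  target_p->pc++;
  return 1;
}

/* Client-side "step line": single-step until the line changes.
 * Returns the number of instruction steps that were needed. */
static uint32_t
step_line (fake_target_t *target_p, const line_entry_t *table_p, size_t count)
{
  uint32_t start_line = line_for_offset (table_p, count, target_p->pc);
  uint32_t steps = 0;

  while (step_instruction (target_p))
  {
    steps++;
    if (line_for_offset (table_p, count, target_p->pc) != start_line)
    {
      break;
    }
  }
  return steps;
}
```

The returned step count is the number of round trips a real client would pay for one "step line" command.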

zherczeg commented 7 years ago

Hm, yes, maintaining a list of virtually enabled breakpoints could be possible. The number of such enabled breakpoints would obviously be limited.

Waiting for a network round trip after each byte code is very slow. Also debugger should not slow down the execution when it is part of the binary but not enabled at runtime.

martijnthe commented 7 years ago

Waiting for a network round trip after each byte code is very slow.

This would only happen for "step line". But sure, I mentioned it just to illustrate there are existing solutions that have proven to be usable in practice.

Also debugger should not slow down the execution when it is part of the binary but not enabled at runtime.

Agree. Disabling the VM's debugger capabilities can be done in the same way that it is done right now. I don't think the idea of a list of breakpoints implies a change to that.

zherczeg commented 7 years ago

One more thing: when debugging is enabled, certain optimizations are disabled (memory consumption is bigger). Hence ideally there should be two byte code instances: one with debugging support and one without it.

martijnthe commented 7 years ago

@zherczeg what are your plans w.r.t. these requests? Are you working on any of this? If not, @jiangzidong and I will start working on it soon.

zherczeg commented 7 years ago

You can work on this.

martijnthe commented 7 years ago

cc @HBehrens can you share your thoughts about pros/cons using the sourcemap format?

jiangzidong commented 7 years ago

@zherczeg

I think the most difficult aspect will be losing the byte code execution from ROM since enabling a breakpoint requires writable memory.

The current implementation of snapshot_load_compiled_code (called by jerry_exec_snapshot) copies the snapshot from ROM into the jerry heap, so the bytecode is still writable in the VM. Am I right?

zherczeg commented 7 years ago

No. If you don't pass the copy flag, it only copies the header and creates a special byte code that points to the start of the actual byte code in ROM. The key feature of snapshots is that the byte code itself runs from ROM.
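As a simplified illustration of that indirection (not the actual snapshot loader), the sketch below builds a small writable stub that holds a marker byte followed by the address of the real byte code, and resolves it on the VM side. OP_EXT_POINTER, STUB_SIZE, stub_init and stub_resolve are hypothetical names. memcpy is used to store and load the pointer because the slot after the marker byte is generally not aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker opcode meaning "the real byte code lives elsewhere".
 * This is not an actual JerryScript opcode value. */
#define OP_EXT_POINTER 0xFFu

/* One marker byte followed by a raw pointer to the real byte code. */
#define STUB_SIZE (1 + sizeof (const uint8_t *))

/* Loader side: fill a small RAM stub that refers to byte code in ROM. */
static void
stub_init (uint8_t *stub_p, const uint8_t *rom_code_p)
{
  stub_p[0] = OP_EXT_POINTER;
  /* The pointer slot is unaligned, so copy it byte-wise. */
  memcpy (stub_p + 1, &rom_code_p, sizeof (const uint8_t *));
}

/* VM side: follow the stub if present, otherwise execute in place. */
static const uint8_t *
stub_resolve (const uint8_t *byte_code_p)
{
  assert (byte_code_p != NULL);

  if (byte_code_p[0] == OP_EXT_POINTER)
  {
    const uint8_t *real_p;
    memcpy (&real_p, byte_code_p + 1, sizeof (real_p));
    return real_p;
  }
  return byte_code_p;
}
```

Only the stub lives in RAM. The instructions it points to are never written, which is why flipping a breakpoint opcode in place does not work in this mode.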

jiangzidong commented 7 years ago

Oh, sorry that I missed that.

jerry-snapshot.c L394: memcpy (instructions_p + code_size + 1, &real_bytecode_p, sizeof (uint8_t *));

vm.c L678: memcpy (&byte_code_p, byte_code_p + 1, sizeof (uint8_t *));

Thanks for your explanation.

martijnthe commented 7 years ago

@jiangzidong and I just had a quick chat to identify pieces of work to enable debugging snapshots (most likely not the complete list...;) ):

  1. Dump to file: implement a new jerry_debugger_send in jerry-debugger-dump. This alternative implementation will write the messages to a local file instead of sending them to a client. The format of this file will closely follow the protocol messages that are sent during parsing. As a work-around for bytecode needing to be in writable memory in order for it to be debugged, we'll pass copy_bytecode=true to jerry_exec_snapshot() before running. See point 5 for the actual fix.

  2. Add a way to load the debug info from a dump file into the debugger client. Q: should this file loading capability be added to the client(s)? I'm leaning to "yes", but having 2 clients is 2x the work... Thoughts on reducing this duplication? @zherczeg @polaroi8d? I wonder if it makes sense to remove the Python one, extract the .js out of the HTML into a separate file and add a Node.js based CLI client (instead of the Python one) that uses the same .js that the HTML uses.

  3. Add a version to the debugging protocol (and file format), so that we can at least detect a version mismatch and tell the user of the debugger to use a debugger client that supports version XYZ instead. Later we'll worry about backward compatibility of debugger clients for older JerryScript debug protocol versions. I think it's an important thing to tackle, but it's probably better to address this later, when the direction of the protocol is more clear.

  4. Mapping: need to move away from using physical addresses of bytecode in the debugging information. This does not work when the physical addresses are not known when the debug info is generated. Solution direction: TBD.

  5. Breakpoints requiring bytecode in writable memory. Solution: maintain a linked list of enabled breakpoints. When the VM encounters a disabled breakpoint instruction, check whether it is in the list of enabled breakpoints.

  6. Longer term: bridge to Chrome Debugger Protocol. Existing IDEs and tooling support connecting to debug servers that talk this protocol, so I think it makes sense to support it too.
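
For points 1 and 3, a version check on the dump file could look roughly like the sketch below. The "JDBG" magic, the one-byte version field, and all function and enum names are assumptions made for illustration; the actual file format is still to be decided.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DUMP_MAGIC "JDBG" /* hypothetical 4-byte file magic */

typedef enum
{
  DUMP_OK,
  DUMP_IO_ERROR,
  DUMP_BAD_MAGIC,
  DUMP_VERSION_MISMATCH
} dump_status_t;

/* Writer side: emit [4-byte magic][1-byte version] before any messages. */
static dump_status_t
dump_write_header (FILE *file_p, uint8_t version)
{
  assert (file_p != NULL);

  if (fwrite (DUMP_MAGIC, 1, 4, file_p) != 4
      || fwrite (&version, 1, 1, file_p) != 1)
  {
    return DUMP_IO_ERROR;
  }
  return DUMP_OK;
}

/* Client side: refuse files written for a different protocol version. */
static dump_status_t
dump_check_header (FILE *file_p, uint8_t supported_version)
{
  uint8_t header[5];

  if (fread (header, 1, sizeof (header), file_p) != sizeof (header))
  {
    return DUMP_IO_ERROR;
  }
  if (memcmp (header, DUMP_MAGIC, 4) != 0)
  {
    return DUMP_BAD_MAGIC;
  }
  if (header[4] != supported_version)
  {
    return DUMP_VERSION_MISMATCH;
  }
  return DUMP_OK;
}
```

On DUMP_VERSION_MISMATCH the client can report which version it expected and which one it found, which is the "use a client that supports version XYZ" message from point 3.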