Open sparr opened 1 year ago
Currently, the minification steps modify both the node tree and tokens array. I'm not too happy about this either, and will probably improve this as part of a branch I'm (potentially) working on.
Either way, you can traverse over the tokens in a node tree via traverse_tokens and add them to your own array, then count that.
I'm not sure how easy it is to split the code and minify each function separately - especially if you want to do renaming, as that's an inherently global operation. It might be better to minify the whole thing at once.
Currently, minify_code converts a node tree back to source at the end - using two approaches depending on whether minify_wspace was requested. Splitting this logic into a separate function would definitely be nice. (I may also do this in my branch)
However, it sounds like you don't want to output minified code, just lightly modified code. The logic to do so would likely be somewhat similar (in structure) to the bottom else case of minify_code (minus the complications) - traverse over tokens, append their contents and the wspace between them, except with changes to the comments as needed. (The comments are part of the wspace between tokens and aren't captured explicitly in the node tree or the token list)
Below are a simple cart I'm testing with and a jsonpickle
d representation of root
. I can see the token that represents the function
keyword, but I can't see any obvious way to capture or manipulate the whole function definition (from function
to end
). It feels like I'm missing some layer of the puzzle here.
pico-8 cartridge // http://www.pico-8.com
version 18
__lua__
-- comment
function funcname()
return true
end
{
"children": [ {
"body": { "py/id": 25 },
"children": [
{
"children": { "py/tuple": [] },
"endidx": 19,
"idx": 11,
"modified": false,
"parent": { "py/id": 2 },
"py/object": "pico_tokenize.Token",
"source": {
"cart": {
"code": "-- comment\nfunction funcname()\n return true\nend",
"code_map": [ {
"py/newargs": { "py/tuple": [ { "py/tuple": [ 0, "test.p8", "-- comment\nfunction funcname()\n return true\nend", 0, 3 ] } ] },
"py/object": "pico_cart.CodeMapping",
"py/seq": [ 0, "test.p8", "-- comment\nfunction funcname()\n return true\nend", 0, 3 ]
} ],
"code_rom": null,
"meta": { "default_factory": { "py/type": "builtins.list" }, "py/object": "collections.defaultdict" },
"name": "test.p8",
"platform": "l",
"py/object": "pico_cart.Cart",
"rom": { "py/object": "pico_defs.Memory" },
"screenshot": null,
"version_id": 18,
"version_tuple": { "py/tuple": [ 0, 1, 23, 0 ] }
},
"mappings": { "py/id": 9 },
"py/object": "pico_process.CartSource"
},
"type": { "py/object": "pico_tokenize.TokenType", "value": 3 },
"value": "function",
"vline": 1
}, {
"children": [ {
"children": { "py/tuple": [] },
"endidx": 28,
"idx": 20,
"modified": false,
"parent": { "py/id": 12 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/object": "pico_tokenize.TokenType", "value": 2 },
"value": "funcname",
"vline": 1
} ],
"endidx": 28,
"idx": 20,
"kind": { "py/object": "pico_parse.VarKind", "value": 1 },
"name": "funcname",
"new": false,
"parent": { "py/id": 2 },
"parent_scope": {
"depth": -1,
"funcdepth": 0,
"items": { "_ENV": { "implicit": true, "keys_kind": null, "name": "_ENV", "py/object": "pico_parse.Local", "scope": { "py/id": 19 } } },
"parent": null,
"py/object": "pico_parse.Scope"
},
"py/object": "pico_parse.Node",
"scopespec": null,
"source": { "py/id": 6 },
"type": { "py/object": "pico_parse.NodeType", "value": 0 },
"value": null,
"var": { "keys_kind": null, "name": "funcname", "py/object": "pico_parse.Global" },
"var_kind": null
}, {
"children": { "py/tuple": [] },
"endidx": 29,
"idx": 28,
"modified": false,
"parent": { "py/id": 2 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/object": "pico_tokenize.TokenType", "value": 4 },
"value": "(",
"vline": 1
}, {
"children": { "py/tuple": [] },
"endidx": 30,
"idx": 29,
"modified": false,
"parent": { "py/id": 2 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/id": 23 },
"value": ")",
"vline": 1
}, {
"children": [ {
"children": [
{
"children": { "py/tuple": [] },
"endidx": 38,
"idx": 32,
"modified": false,
"parent": { "py/id": 27 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/id": 5 },
"value": "return",
"vline": 2
}, {
"children": [ {
"children": { "py/tuple": [] },
"endidx": 43,
"idx": 39,
"modified": false,
"parent": { "py/id": 30 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/id": 5 },
"value": "true",
"vline": 2
} ],
"endidx": 43,
"idx": 39,
"parent": { "py/id": 27 },
"py/object": "pico_parse.Node",
"scopespec": null,
"source": { "py/id": 6 },
"token": { "py/id": 32 },
"type": { "py/object": "pico_parse.NodeType", "value": 3 },
"value": null
}
],
"endidx": 43,
"idx": 32,
"items": [ { "py/id": 30 } ],
"parent": { "py/id": 25 },
"py/object": "pico_parse.Node",
"scopespec": null,
"source": { "py/id": 6 },
"type": { "py/object": "pico_parse.NodeType", "value": 24 },
"value": null
} ],
"endidx": 43,
"idx": 32,
"parent": { "py/id": 2 },
"py/object": "pico_parse.Node",
"scopespec": { "py/tuple": [ false, [] ] },
"source": { "py/id": 6 },
"stmts": [ { "py/id": 27 } ],
"type": { "py/object": "pico_parse.NodeType", "value": 29 },
"value": null
}, {
"children": { "py/tuple": [] },
"endidx": 47,
"idx": 44,
"modified": false,
"parent": { "py/id": 2 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/id": 5 },
"value": "end",
"vline": 3
}
],
"endidx": 47,
"idx": 11,
"kind": null,
"name": "funcname",
"params": [],
"parent": { "py/id": 0 },
"py/object": "pico_parse.Node",
"scopespec": { "depth": 1, "funcdepth": 1, "items": {}, "parent": { "py/id": 19 }, "py/object": "pico_parse.Scope" },
"source": { "py/id": 6 },
"target": { "py/id": 12 },
"type": { "py/object": "pico_parse.NodeType", "value": 15 },
"value": null
} ],
"endidx": 47,
"globals": { "__dict__": {}, "funcname": { "py/id": 18 }, "py/object": "utils.LazyDict" },
"idx": 11,
"parent": null,
"py/object": "pico_parse.Node",
"scopespec": { "py/tuple": [ false, [] ] },
"source": { "py/id": 6 },
"stmts": [ { "py/id": 2 } ],
"type": { "py/id": 36 },
"value": null
}
Assuming I understood what you're asking for correctly, the idx and endidx attributes specify the range of the function's tokens within the source (e.g. node.source.text[node.idx : node.endidx])
OK, so, the Token with "value":"function"
is the literal function
in the source, spanning idx 11-19.
Its parent
is this Node:
{
"body": { "py/id": 25 },
"children": [ "CUT" ],
"endidx": 47,
"idx": 11,
"kind": null,
"name": "funcname",
"params": [],
"parent": { "py/id": 0 },
"py/object": "pico_parse.Node",
"scopespec": { "depth": 1, "funcdepth": 1, "items": {}, "parent": { "py/id": 19 }, "py/object": "pico_parse.Scope" },
"source": { "py/id": 6 },
"target": { "py/id": 12 },
"type": { "py/object": "pico_parse.NodeType", "value": 15 },
"value": null
}
which spans idx 11-47. That seems like everything from function
to end
. That Node has "kind": null
and "name": "funcname"
(because it's a function with that name, not to be confused with the sibling of that earlier token that has "value": "funcname"
and is the literal funcname
from the source). It also has "scopespec": { "depth": 1, "funcdepth": 1, ...
which seems to describe some stuff about the function. Is there any way to directly identify that this Node represents a function, or do I need to duck type it based on the presence of .scopespec.funcdepth
or similar?
The parent of that Node is another Node which has the same idx/endidx but different properties. I think it might represent the whole source, minus any leading or trailing comments, although I'll need to experiment more to figure that out.
Aha!
"type": { "py/object": "pico_parse.NodeType", "value": 15 },
pico_parse.NodeType.values[15]
is "function"
!
Yeah, node.type == NodeType.function
Yeah, the parent of the function is the root node, it indeed spans the whole source minus any leading/trailing whitespace/comments
I want to add per-function token and character counts, pre and post minification, to https://github.com/sparr/pico8lib. My initial idea was to use a script to parse the functions out of a source file, then feed them separately to
shrinko8 -c
, parse the output, prepend it as a comment to the function, then reassemble the source file. However, it occurs to me that a lot of the necessary functionality to do this better already exists in this project.My proposal is a mode for shrinko8 that does all of this directly. Parse a cart, identify all the functions, count their tokens and characters, minify them, count those tokens and characters, insert a comment into the original cart above the function (possibly before existing comments, and overwriting a previous comment from this process) and emit the annotated original cart.
I'm willing to do most of the work to make this happen, but I find the code here a bit daunting. I've made it to
process_code
where it callstokenize
thenparse
, and I can look throughNode
s underroot
, but I am not sure how to get back from aNode
to tokens thatcount_tokens
wants, or how to turn aNode
back into a cart after I've modified some of its grandchildren. Generally moving back and forth between the source, tokens, and nodes, in a way that keeps them mapped to each other for modifying one based on the other, is a mystery to me so far. I would appreciate any pointers you can share before I get started on this. Even if this isn't a welcome feature for the main repo, I'd still like to pursue it as a fork or patch so I can use it myself.