Open mr-word opened 3 years ago
Since writing this issue I'm understanding it better, I think it's quite simple, I'm going to try something like this and report back,
isRLP = (str) => {
try {
rlp.decode(str, skipRemainderCheck=false)
return true
} catch(..) {
return false
}
}
rlpMatcher = (x) => isRLP(x)
In particular I don't use skipRemainderCheck.
I'm writing two parsers for a simple forth-like, one for the 'assembly' and the other for the bytecode. Some ops have a variable-length argument in-code, encoded as varints or RLP.
Here is what a generic op that has an RLP-encoded argument looks like in the assembler:
RLP_OP [1, 2, [10, 20, 30]] NOP NOP NOP
This is straightforward to parse.The RLP encoding of
[1,2,[10,20,30]]
isc60102c30a141e
. SupposeRLP_OP
encodes as07
andNOP
as00
. The javascript RLP lib gives the helpful option to parse only a prefix of a given bytestring if it is valid RLP. In other words we could userlp.decode(encoded, skipRemainderCheck=true)
to extractc60102c30a141e
fromc60102c30a141e00000000
.So basically the challenge is to parse
07c60102c30a141e000000
into the same AST asRLP_OP [1,2,[10,20,30]] NOP NOP NOP
.I see nearley's custom token matcher functionality. How could I combine this with
rlp.decode(encoded, skipRemainderCheck=true)
to tokenize RLP-encoded chunks? To be clear, I just need the lexer to give me an RLP chunk, I don't need to parse the recursive lists themselves into the AST (this can be done in a later pass).The same technique could be used for varints and whatever else.