ktye / i

interpret
100 stars 16 forks source link

rewrite token/parse in k #48

Closed ktye closed 11 months ago

ktye commented 1 year ago

token&parse costs roughly 4300 bytes. can we rewrite it shorter in k?

token/parse cannot be written in k because it loops. but we could parse the parser and embed the instructions.

instruction embedding could be any of:

whatever, the result must be shorter than 4k of spacious wasm bytes.

ktye commented 1 year ago

e.g. translate this to ktye/go:

package main

import (
    "fmt"
    "os"
)

func main() {
    b, e := os.ReadFile("apl.lz4")
    fatal(e)
    fmt.Println("size: ", len(b))

    // at the start assume short-header(7) and one block(4:size header)
    // at the end assume data-block-checksum(4) + end-mark(4)
    lz(b[7+4 : len(b)-8])
    println(b)
}
func fatal(e error) {
    if e != nil {
        panic(e)
    }
}

func lz(b []byte) []byte {
    var r []byte

    i := 0

    for {

        t := b[i]        //block token
        l := int(t >> 4) //literal length

        i++
        if l == 15 {
            for {
                l += int(b[i])
                i++
                if b[i-1] != 255 {
                    break
                }
            }
        }

        fmt.Printf("out-lit: %v %q\n", b[i:i+l], string(b[i:i+l]))
        r = append(r, b[i:i+l]...)
        i += l

        if i >= len(b) { //termination condition
            return r
        }

        o := int16(b[i]) | (int16(b[1+i]) << 8) //offset(from tail) 16bit
        i += 2

        l = 4 + int(t&15) //match-length: lower token bits
        if l == 19 {
            for {
                l += int(b[i])
                i++
                if b[i-1] != 255 {
                    break
                }
            }
        }
        for j := 0; j < l; j++ {
            r = append(r, r[len(r)-int(o)])
        }
        fmt.Printf("out-mat: %v %q\n", r[len(r)-l:], string(r[len(r)-l:]))

    }
    return r
}
kelas commented 11 months ago

token/parse cannot be written in k because it loops

true, but not entirely true. by choosing reasonable practical limits on the depth of the parse tree, tok/prs can use global state. why not.

nsl (and to some degree atw) were looking into implementing k parsers in k some 15+ years ago, which should still be available at nsl.com, only why?