google / wuffs

Wrangling Untrusted File Formats Safely

creating code for other languages than C #38

Open benibela opened 3 years ago

benibela commented 3 years ago

How much effort would it take to extend the transpiler to create code for other languages, e.g. Pascal?

nigeltao commented 3 years ago

I'd expect the effort required to be relatively straightforward. A design goal from day one was to accommodate other target languages. Long term, doc/roadmap.md already lists:

and the directory structure has a test/c directory in anticipation of others such as test/pascal.

However, in the short term, Wuffs-the-language is still changing relatively rapidly, and such changes are harder to make the more target languages (C, Go, Pascal, Rust, etc) there are.

You are obviously welcome to write your own experimental Wuffs-to-Pascal transpiler, using the github.com/google/wuffs/lang/... Go packages (start with github.com/google/wuffs/lang/generate based on how github.com/google/wuffs/internal/cgen uses it), but I'd rather not merge any such pull requests until Wuffs-the-language has stabilized.

adsharma commented 3 years ago

I've recently added support for the following languages in py2many on top of cpp/rust:

However, it doesn't have the same level of support as wuffs for parsing files in a secure way. It does however do a few things in this general direction (checking overflows when you add u8 + u8 for example).

adsharma commented 3 years ago

https://github.com/adsharma/py2many

nigeltao commented 3 years ago

checking overflows when you add u8 + u8 for example

Where is this done? I don't see any overflow checking in https://github.com/adsharma/py2many/blob/main/tests/expected/fib.go

adsharma commented 3 years ago

Here

https://github.com/adsharma/py2many/blob/main/common/inference.py#L132

The test case that exercises this code path is called infer-ops.py

benibela commented 3 years ago

I've recently added support for the following languages in py2many on top of cpp/rust:

But no Pascal? That does not help me

It does however do a few things in this general direction (checking overflows when you add u8 + u8 for example).

I am not using Pascal for fun, but because I thought it was the safest language 15 years ago. In particular, Pascal has integer overflow checking, and range checking on strings/arrays. That way it prevents almost all overflows (although in practice, people disable overflow checking in release builds to make it run faster).

I asked because the floating point parsing in FreePascal is both very slow and incorrectly rounded. I wanted to use Eisel-Lemire parsing in Pascal. But I had no time to implement it myself. I do not want to use the C library, because I made an open-source project, and if that combines different languages, people complain they cannot compile it (although the most common complaint is that they cannot compile Pascal). So if the Wuffs parsing could be ported to Pascal, it would be perfect.

nigeltao commented 3 years ago

I wanted to use Eisel-Lemire parsing in Pascal. But I had no time to implement it myself.

It shouldn't be hard to port. It's only 80 lines of code (and 700 lines of data tables): https://github.com/golang/go/blob/release-branch.go1.16/src/strconv/eisel_lemire.go

nigeltao commented 3 years ago

https://github.com/adsharma/py2many/blob/main/common/inference.py#L132

The test case that exercises this code path is called infer-ops.py

https://github.com/adsharma/py2many/blob/main/tests/expected/infer-ops.go

says:

func add8(x uint64, y uint64) uint64 {
    return (x + y)
}

and that can still overflow. Similarly for any size_t like type on 64-bit systems, typically used in any pointer-length bounds checks, right?

Conversely, how do you write an overflow-checked fib function, when recursion means that you can't just keep widening the types?
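To make the two questions above concrete, here is a minimal Go sketch. It is not py2many or Wuffs output. The names wrappingAdd, checkedAdd and checkedFib are hypothetical, chosen only for illustration. It shows that a uint64 addition like the quoted add8 wraps silently, and that one way to get an overflow-checked fib is a runtime check on each addition rather than a wider type.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// wrappingAdd has the same shape as the quoted add8: in Go, uint64
// addition wraps around modulo 2^64 without any error.
func wrappingAdd(x, y uint64) uint64 {
	return x + y
}

// checkedAdd (hypothetical helper) returns ok == false when x + y does
// not fit in a uint64. bits.Add64 reports the carry out of bit 63.
func checkedAdd(x, y uint64) (uint64, bool) {
	sum, carry := bits.Add64(x, y, 0)
	return sum, carry == 0
}

// checkedFib (hypothetical helper) computes the n-th Fibonacci number
// iteratively, checking every addition at run time instead of widening
// the result type.
func checkedFib(n int) (uint64, bool) {
	if n <= 0 {
		return 0, true
	}
	a, b := uint64(0), uint64(1)
	for i := 1; i < n; i++ {
		next, ok := checkedAdd(a, b)
		if !ok {
			return 0, false
		}
		a, b = b, next
	}
	return b, true
}

func main() {
	fmt.Println(wrappingAdd(math.MaxUint64, 1)) // wraps to 0
	fmt.Println(checkedAdd(math.MaxUint64, 1))  // overflow is reported
	fmt.Println(checkedFib(93))                 // still fits in a uint64
	fmt.Println(checkedFib(94))                 // overflow is reported
}
```

Note that this only detects overflow at run time. It does not prove its absence at compile time, which is a different kind of guarantee.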

adsharma commented 3 years ago

What do you suggest for those cases?

size_t + size_t = usize_t?

Or an explicit check after the uint64 op that it didn't overflow?

adsharma commented 3 years ago

Pascal was my first language in school. So I have a soft spot for it. But in 2021 I have to go with languages with a larger ecosystem and type safety.

Have you looked at Rust or Kotlin? They have these capabilities as well.

nigeltao commented 3 years ago

size_t + size_t = usize_t?

size_t is already an unsigned type. There is no usize_t in C, only size_t and ssize_t.

What do you suggest for those cases?

I'm sorry, but I don't have a good suggestion, because I don't think the approach can fundamentally work. At some point you have a widest integer type, and you can't widen further when you add two of them. I'm also skeptical about any imperative programming language that doesn't allow x = x + 1, where the left hand side's type is obviously the type of x, but the right hand side's type has to be wider.
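The x = x + 1 objection can be sketched in Go as follows. This is only an illustration of the widening idea as described in this thread, not how py2many is implemented. widenAdd8 and narrow8 are hypothetical names. The wider result is exact, but storing it back into x needs a narrowing step, and that step is where the overflow check reappears.

```go
package main

import (
	"fmt"
	"math"
)

// widenAdd8 (hypothetical) returns the exact sum of two uint8 values by
// widening the result type to uint16, so the addition itself cannot overflow.
func widenAdd8(x, y uint8) uint16 {
	return uint16(x) + uint16(y)
}

// narrow8 (hypothetical) converts a widened value back to uint8. It
// reports ok == false when the value does not fit.
func narrow8(v uint16) (uint8, bool) {
	if v > math.MaxUint8 {
		return 0, false
	}
	return uint8(v), true
}

func main() {
	x := uint8(255)

	// "x = x + 1" under widening: the right hand side is a uint16, so it
	// cannot be assigned to x without an explicit, checked narrowing.
	wide := widenAdd8(x, 1)
	if narrowed, ok := narrow8(wide); ok {
		x = narrowed
	} else {
		fmt.Println("x + 1 does not fit in a uint8:", wide)
	}
	fmt.Println("x is still", x)
}
```

The same pattern stops working at the widest built-in type: a widened add of two uint64 values has no wider built-in type to return.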

benibela commented 3 years ago

I wanted to use Eisel-Lemire parsing in Pascal. But I had no time to implement it myself.

It shouldn't be hard to port. It's only 80 lines of code (and 700 lines of data tables): https://github.com/golang/go/blob/release-branch.go1.16/src/strconv/eisel_lemire.go

That code looks easy. I was looking at the Wuffs-generated C code last year rather than the Go code, which was harder. Even harder when I tried to understand the blog posts first. But it might not be future proof to port anything to Pascal anymore.

Pascal was my first language in school. So I have a soft spot for it. But in 2021 I have to go with languages with a larger ecosystem and type safety.

Have you looked at Rust or Kotlin? They have these capabilities as well.

I have ported some parts to Kotlin, but their multiplatform support is not mature yet and it does not support 32-bit linux. Rust is focused on safety, but it panics all the time. It is less "panic-safe" than Pascal. A language that never panics could quickly turn Rust into a legacy language

Anyways, I do not have time to port my entire project at once. But step-by-step would have worked. Port one function to a popular, memory-safe, panic-safe language (like Wuffs? Is it popular?) that has a Pascal code generator, and then the next function. Keep distributing the Pascal code, until each function is written in the new language a few years later, and then only distribute it in the new language...

At some point you have a widest integer type, and you can't widen further when you add two of them. I'm also skeptical about any imperative programming language that doesn't allow x = x + 1, where the left hand side's type is obviously the type of x, but the right hand side's type has to be wider.

I made my own language, too. If there is a looming overflow, it switches to an arbitrary precision decimal type. That is the best way for a scripting language, but not appropriate for a system language
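A rough Go sketch of that "promote on overflow" idea, under some loud assumptions: it uses math/big integers rather than the arbitrary precision decimal type mentioned above, and the Number type and Add function are hypothetical names, not taken from any of the projects in this thread.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
)

// Number (hypothetical) holds either a machine integer or, after an
// overflow, an arbitrary precision value.
type Number struct {
	small int64
	wide  *big.Int // non-nil only after promotion
}

// Add (hypothetical) adds two int64 values and promotes the result to
// arbitrary precision when the machine addition would overflow.
func Add(x, y int64) Number {
	sum := x + y // signed addition wraps in Go
	// Overflow happened exactly when both operands have the same sign
	// and the wrapped sum has the opposite sign.
	if (x >= 0) == (y >= 0) && (sum >= 0) != (x >= 0) {
		return Number{wide: new(big.Int).Add(big.NewInt(x), big.NewInt(y))}
	}
	return Number{small: sum}
}

// String formats whichever representation is in use.
func (n Number) String() string {
	if n.wide != nil {
		return n.wide.String()
	}
	return strconv.FormatInt(n.small, 10)
}

func main() {
	fmt.Println(Add(1, 2))
	fmt.Println(Add(9223372036854775807, 1)) // promoted instead of wrapping
}
```

The promotion allocates memory and makes the cost of an addition depend on its operands, which is one reason this approach suits scripting languages better than systems languages.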

adsharma commented 3 years ago

ssize_t is exactly what I was thinking, but got the signs wrong. Thanks for correcting me.

My thinking is that 64 bit overflow is extremely rare in practice, so probably not as important to defend against as 32 bit overflow (which is also relatively rare. The Java bug was around for 20 years).

adsharma commented 3 years ago

Yeah - I like Kotlin the language, but not how it works as an alternative to rust or C on 64 bit linux.

https://discuss.kotlinlang.org/t/kotlinc-as-a-native-binary/20702

Even the multi-platform story on linux64 is weak (support was dropped recently).

32 bit pascal doesn't sound like a popular platform/language combo. Hope you consider one of the languages supported by py2many. Python's popularity has been rising because of data science and educational use (sort of like Pascal in the 80s and 90s).

nigeltao commented 3 years ago

My thinking is that 64 bit overflow is extremely rare in practice,

Rare still means exploitable. https://blog.chromium.org/2012/05/tale-of-two-pwnies-part-1.html discusses remote code execution due in part to a size_t overflow (and size_t is often 64 bits).

If py2many isn't overflow-proof, that's fine, but then it's not really playing the same game that Wuffs is, so the Wuffs issue tracker probably isn't the best place to discuss it.

adsharma commented 3 years ago

The committed fix for that chromium bug was to add an explicit check for overflow/underflow.
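As a generic illustration of what such an explicit check can look like (this is not the Chromium patch, and the function names are hypothetical), here is a Go sketch of a pointer-length style bounds check written in a wrapping and a non-wrapping form:

```go
package main

import "fmt"

// inBoundsNaive (hypothetical) checks that [offset, offset+length) lies
// within a buffer of bufLen bytes. The sum offset+length can wrap around,
// so a huge offset can pass the check.
func inBoundsNaive(offset, length, bufLen uint64) bool {
	return offset+length <= bufLen
}

// inBounds (hypothetical) performs the same check without any addition
// that could wrap: bufLen-offset is only evaluated once offset <= bufLen.
func inBounds(offset, length, bufLen uint64) bool {
	return offset <= bufLen && length <= bufLen-offset
}

func main() {
	huge := ^uint64(0) // largest uint64 value

	fmt.Println(inBoundsNaive(huge, 2, 16)) // true: the sum wrapped around to 1
	fmt.Println(inBounds(huge, 2, 16))      // false
	fmt.Println(inBounds(4, 12, 16))        // true
}
```

A check like this is written once at the point where untrusted lengths enter the program, rather than on every arithmetic operation.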

It is my intention to handle overflows better in py2many. But I'm not sure introducing a runtime check on every arithmetic op is the answer.

And I agree that this tracker isn't a great place to discuss that topic.

eliasnaur commented 1 year ago

I'd rather not merge any such pull requests until Wuffs-the-language has stabilized.

What's the status here, in particular for Go output? I'd like to port a library to Go, but Wuffs would give me portability and additional safety guarantees.

nigeltao commented 1 year ago

I don't think that Wuffs-the-language is stable enough yet. Sorry.

For example, commits 2f18dc61 and 2865c5bf just landed a week ago, each adding new methods to the "slice of T" types.