xlab / c-for-go

Automatic C-Go Bindings Generator for Go Programming Language
https://c.for-go.com
MIT License
1.5k stars 120 forks source link

Passing null terminated c-strings in #19

Closed pwaller closed 7 years ago

pwaller commented 7 years ago

Hi!

I've had some success automatically generating bindings to the CPython API. A common mistake I'm making though in using the generated bindings is forgetting to append a null character to strings that I feed in. Stuff breaks in interesting and non-obvious ways when you do that.

What do you think about having a PtrTips which allows specifying that the generated bindings should append a null to the passed string before passing it onwards as a null pointer?

I have contemplated writing more go-idiomatic wrappers which append the null, but this really is against the spirit of automatically generating bindings... :)

pwaller commented 7 years ago

I wrote this issue on the assumption that c-for-go expects you to null terminate the strings you pass to the generated API. I actually can't find any advice on that front, but observe that all sorts of things don't work unless you null-terminate the strings you feed into the API. So maybe it's supposed to work and I'm encountering a bug, instead.

xlab commented 7 years ago

Hi!

Yes that's a common mistake. I usually address that when creating a wrapper for the generated code, when I want to simplify the API and introduce objects. As an example see String https://github.com/xlab/pocketsphinx-go/blob/master/sphinx/helpers.go#L42 which is used to make params safe https://github.com/xlab/pocketsphinx-go/blob/master/sphinx/ngram.go#L316 so user will working with sphinx.String and know that it's safe.

I wrote this issue on the assumption that c-for-go expects you to null terminate the strings you pass to the generated API.

Yes, this is so by design. The point there is that c-for-go generates its binding on the API interface level, not the data level. I other worlds, it does rely on types and method signatures but doesn't go into data context. It doesn't modify or copy data except when unavoidable. Passing raw things like string or []byte is a 0-allocation operation, memory directly referenced from Go to C.

This is mainly because I wanted to avoid any overhead and hidden effects. Many C functions out there are waiting for non-null terminated strings, knowing their length. And if they require the null-termination, that's up to user to decide how to properly do that. Otherwise you will introduce an extra allocation in the generated code, what I tried to avoid.

It's about safety VS performance, I chose the latter to be the base layer, and the safety can be enforced later as an additional layer written by humans, for humans.

pwaller commented 7 years ago

Yes that's a common mistake.

It might be worth documenting it clearly somewhere. I scoured the docs a few times and didn't find anything.

The problem though is that even knowing what is correct I still make the mistake.

It's about safety VS performance, I chose the latter to be the base layer, and the safety can be enforced later as an additional layer written by humans, for humans.

I understand your motivation. However, for strings specifically, this means that the "additional layer" would have to be almost as big as the first layer. Which seems to me to defeat the idea of doing code generation in the first place. See for instance the python C-API. It's huge, and very many functions take strings, all of which must be 0-terminated. https://docs.python.org/3/c-api/index.html

I'm not asking you to change the default, I'm asking if there is room in the generator to automate this with a new PtrType. Any thoughts on that? It would eliminate a lot of manual and error-prone work.

xlab commented 7 years ago

I agree. I will add this as a global preference of generator.

pwaller commented 7 years ago

Great. Though I hope it is still possible to override it in specific cases where performance is a concern, or for specific cases which require null terminated strings.

xlab commented 7 years ago

@pwaller the override can be achieved by taking out the code into a separate file and building the hand-made optimisations upon generated code.

I'm not making this feature into the tip system, because it's very easy to create a mess of strings that are null-terminated automatically and other cases that require manual null-termination. So in order to use API safely, the user will be forced to look into guts of binding code (not the interface, but cgo_helpers.go) and study whether the strings are terminated in some particular functions. This is wrong.

pwaller commented 7 years ago

How about introducing a new type:

type CString string

Then it would be obvious at the point of use that you're doing something strange:

cpkg.Function(cpkg.CString(s))

The type cast should be "free" but annotates what is happening in a fairly obvious manner.

pwaller commented 7 years ago

(and is impossible to accidentally miss because the type-checker will complain)

xlab commented 7 years ago

Type-cast won't guarantee safeness, this doesn't contribute to solution but simply makes the resulting API harder to use. I want to use string, it's a part of the language.

We don't need annotations there, its either safe strings across all package, or not.

xlab commented 7 years ago

@pwaller your wish is granted. I also re-generated my Nuklear package for UI which also benefits a lot from the automatic null termination approach.

GENERATOR:
  Options:
    SafeStrings: true

happy hacking! 🍻

pwaller commented 7 years ago

Looks great, thanks :)

laser commented 5 years ago

your wish is granted.

Thanks for adding this. Took a long time to find this issue - but I'm pleased to see that I don't have to NUL-terminate my Go strings!