vfx-rs / organization

The repository for the group policies, style guides, and easy access to the projects statuses
Creative Commons Zero v1.0 Universal
54 stars 6 forks source link

Add Style Guides #2

Closed scott-wilson closed 3 years ago

scott-wilson commented 3 years ago

This is some initial thoughts on a style guide. Everything is up for debate on these based on work that has already happened on OpenTimelineIO, and OpenImageIO.

anderslanglands commented 3 years ago

I'd love to just get this committed and fill this out a bit myself as I'm knee-deep in this stuff right now so getting it all down while it's in my head would be very useful.

Only thought it that this will become much more than just a style guide so maybe want to call it "binding guide" or "conversion guide" rather than "style guide"?

scott-wilson commented 3 years ago

I'd love to just get this committed and fill this out a bit myself as I'm knee-deep in this stuff right now so getting it all down while it's in my head would be very useful.

Only thought it that this will become much more than just a style guide so maybe want to call it "binding guide" or "conversion guide" rather than "style guide"?

Sounds like a good idea. I'll try to flesh this out more tonight as well.

luke-titley commented 3 years ago

This is great! Some quick things with not much thought:

Is it worth having a rustfmt configuration for this? I always use the default, but with 80 chars line limit (for easy comparing/merging)

What are your thoughts on warnings with rust ?

anderslanglands commented 3 years ago

With regards asserts, just to be extra-safe you should also check the offset of each member as well (you can see this here: https://github.com/OpenImageIO/oiio/blob/26c6e600453c8e81ff831409bc04ef4e49f4d497/src/libOpenImageIO-C/c-imageio.cpp#L39 ) - just in case fields get reordered. I'm using macros defined by OIIO for the static asserts, but we probably want to make a small header library of our own for this kind of thing, along with bit_cast<>().

What do you mean by "warnings with rust"?

On Fri, 30 Oct 2020 at 09:21, Luke Titley notifications@github.com wrote:

This is great! Some quick things with not much thought:

  • rust code should use rustfmt (cargo fmt) and clippy maybe ?
  • for bindings it's good to have a static assert in the implementation code that compares the size and alignment of the original c++ object and its equivalent c object (in the cases where you want to be able to cast from one to the other).

What are your thoughts on warnings with rust ?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vfx-rs/organization/pull/2#issuecomment-718998696, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOYQXK4H2XNONCHIKKVBDTSNHFDJANCNFSM4TD4S2AQ .

luke-titley commented 3 years ago

What do you mean by "warnings with rust"? Like enable warnings as errors, but deal with unnecessarily pedantic errors, or just trust developers to make sure they have minimum warnings.

Good point about public fields on a class. I was just thinking about classes with entirely private data where you can lump it into a blob of bytes, and not think about the offsets.

luke-titley commented 3 years ago

Also, what are the rules for constructor naming? I'm curious both in c and rust.

@anderslanglands with your work with binding libs to rust, did you fall back to the From trait a lot to expose constructor overloads?

anderslanglands commented 3 years ago

Anything that just forwards to new and delete should be “_new” and “_delete”.

I like “_new_with_XXX” for constructor overloads or explicit “_from_string” where it makes sense - I find the From trait is not very discoverable.

On Fri, 30 Oct 2020 at 10:19, Luke Titley notifications@github.com wrote:

Also, what are the rules for constructor naming? I'm curious both in c and rust.

@anderslanglands https://github.com/anderslanglands with your work with binding libs to rust, did you fall back to the From trait a lot to expose constructor overloads?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/vfx-rs/organization/pull/2#issuecomment-719028481, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOYQXNPGGPTGK5KNDVVGNTSNHL5BANCNFSM4TD4S2AQ .

luke-titley commented 3 years ago

Arg. I'm a bit confused. I can see in the oiio binding example you gave, you're doing heap allocations for the construction/destruction of imagespec. You'd normally do that in a c api if you wanted have more flexibility at the abi level, but if you're doing a one for one wrapper in C, I'd expect direct calls to the destructor/constructor. So you give the user in C the control over whether to use the heap or not.

Something like this:

Foo_new(Foo * this) {
    new (this) cpp::Foo();
}
Foo_delete(Foo * this) {
    cast<cpp::Foo*>(this)->~Foo();
}

I wouldn't use new/delete for the imath basic types for example.

I'm guessing what you mean is, if you're calling constructor/destructor directly, or if you're using new/delete then use _new and _delete?

anderslanglands commented 3 years ago

ImageSpec has to be constructed on the heap in C++ and then returned because it has STL types as data members. So for objects like that I would use:

OIIO_ImageSpec_new(...) {
    return cast<OIIO_ImageSpec*>(new OIIO::ImageSpec);
}

For POD types that can be bit_cast() I would probably just leave them on the stack (e.g. OIIO::TypeDesc, OIIO::ROI) unless they were gargantuan.

luke-titley commented 3 years ago

ImageSpec has to be constructed on the heap in C++ and then returned because it has STL types as data members.

Why? You mean because the stl types can change in size and alignment from one stl implementation to another? I guess other wise you'd have to have one set of c bindings for each stl implementation that the lib might be compiled in. Or you'd have to generate the bindings during build, and for windows that would involve running clang with microsoft's STL. I guess that's pretty weird.

It's not just POD types you can allocate on the stack, it's anything that has a stable size and alignment, from compiler to compiler.

anderslanglands commented 3 years ago

I mean because C has no idea what the size and alignment of a std::vector is. How would you define this for the C compiler?

class SomeClass {
    int a;
    std::vector<int> b;
    std::string c;
    float d;
};

Unless you mean doing:

typedef struct {
    int a;
    char pad[4];
    char b[24];
    char c[24];
    float d;
} SomeClass;

?

luke-titley commented 3 years ago

Yes, I was thinking of.


typedef struct {
   char data[ size of SomeClass in c++];
} __attribute__((alignment of SomeClass in c++)) SomeClass;
anderslanglands commented 3 years ago

Yes, I was thinking of.

typedef struct {
   char data[ size of SomeClass in c++];
} __attribute__((alignment of SomeClass in c++)) SomeClass;

Heh, yeah I've always just heap-allocated anything that I can't poke at directly from C. Just seemed simpler.

anderslanglands commented 3 years ago

Yes, I was thinking of.

typedef struct {
   char data[ size of SomeClass in c++];
} __attribute__((alignment of SomeClass in c++)) SomeClass;

Heh, yeah I've always just heap-allocated anything that I can't poke at directly from C. Just seemed simpler.

Thinking about it more, it makes me a bit uneasy - imagine copying one of these structs in C, then passing it twice back to C++.

luke-titley commented 3 years ago

Yes, I was thinking of.

typedef struct {
   char data[ size of SomeClass in c++];
} __attribute__((alignment of SomeClass in c++)) SomeClass;

Heh, yeah I've always just heap-allocated anything that I can't poke at directly from C. Just seemed simpler.

Thinking about it more, it makes me a bit uneasy - imagine copying one of these structs in C, then passing it twice back to C++.

Yeah I guess you have to make a distinction between c-bindings that are intended to be used by a human and those that are just going to be underneath bindings from another language.

This would most parallel what c++ is doing, so cost wise it would be minimal.

SomeClass a;
SomeClass_new(&a);

// Copy Assign
SomeClass b;
SomeClass_new(&b);
SomeClass_assign(&b, &a);

// Or copy construct
SomeClass c;
SomeClass_copy(&c, &a);

pretty straight forward to wrap these in python and rust, but it's not very human friendly.

As I think about things more, for stl types not sure the data[size of class] idea is ideal, because if codegen (clang) has access to different stl implementation to the compiler your using for the c++ lib, you'll end up with some knarly bugs. But it's still a useful pattern for things like classes that have homebrew shared_ptr implementations, you save on unnecessary mallocs.

alexxbb commented 3 years ago

Looks like you guys jumped straight into C API details, which should be discussed separately IMO. 🙂 Style wise I'm 100% for rustfmt, clippy and docs.rs.

I think we all agree that using raw bindgen generated code is no fun at all, therefore I think the end goal is to have something like this (oiio as example):

luke-titley commented 3 years ago

I think we all agree that using raw bindgen generated code is no fun at all, therefore I think the end goal is to have something like this (oiio as example):

  • oiio // original c++
  • oiio-c // c bindings
  • oiio-sys // raw rust bindings crate
  • oiio-rs // ideomatic rust ontop of oiio-sys

Yes that seems to be the consensus.

anderslanglands commented 3 years ago

Yeah I guess you have to make a distinction between c-bindings that are intended to be used by a human and those that are just going to be underneath bindings from another language.

This would most parallel what c++ is doing, so cost wise it would be minimal.

SomeClass a;
SomeClass_new(&a);

// Copy Assign
SomeClass b;
SomeClass_new(&b);
SomeClass_assign(&b, &a);

// Or copy construct
SomeClass c;
SomeClass_copy(&c, &a);

pretty straight forward to wrap these in python and rust, but it's not very human friendly.

As I think about things more, for stl types not sure the data[size of class] idea is ideal, because if codegen (clang) has access to different stl implementation to the compiler your using for the c++ lib, you'll end up with some knarly bugs. But it's still a useful pattern for things like classes that have homebrew shared_ptr implementations, you save on unnecessary mallocs.

Not sure anout homebrew shared_ptr but std::shared_ptr you're still going to have to heap-allocate the shared_ptr bytes that you return to C aren't you? Otherwise you'll be left with a dangling reference when the wrapper function returns. Or would you placement new into the stack memory there as well? Still feels dangerous because of the copying issue. In Rust at least you could force move-only on the resulting object, but any other languages that tried to bind to the C layer might have more difficulty? I don't know.

With regards stl compatibility, https://libcxx.llvm.org/ says:

ABI compatibility with gcc's libstdc++ for some low-level features such as exception objects, rtti and memory allocation.

which doesn't exactly fill me with confidence. I can't find a clear answer via google either.

So, I like the idea of giving the caller the freedom to choose whether a non-trivial object lives on the stack rather than the heap, but the vaguery surrounding the layout and the double-free/dangling reference issues with copying on the C side concern me. On the flip side, hiding anything that can't be directly declared in C behind an opaque pointer is guaranteed to work in all situations. There will be some overhead due to the extra heap allocations, but I'm not convinced they will be significant or even measurable in real code. EDIT: especially since any time you actually want to do anything with that object you're going to be passing a pointer to it back to C++, so as far as the C++ library is concerned it's at some arbitrary location in memory anyway.

scott-wilson commented 3 years ago

Looks like you guys jumped straight into C API details, which should be discussed separately IMO. slightly_smiling_face Style wise I'm 100% for rustfmt, clippy and docs.rs.

I think we all agree that using raw bindgen generated code is no fun at all, therefore I think the end goal is to have something like this (oiio as example):

* oiio // original c++

* oiio-c // c bindings

* oiio-sys // raw rust bindings crate

* oiio-rs // ideomatic rust ontop of oiio-sys

Right! the (for example) oiio vs oiio-sys. I know that there's two trains of thought for that. One is having one crate that is just a think wrapper on top, and the other that is the "opinionated" wrapper. I personally have no real opinion, but if we go the *-sys crates, then I have some more names to snatch up/ ask the current crate owners if they're cool with giving up their crate names.

Also, this could be too far into a C deep dive. So, we should probably figure out what are the things left to do before we can call this PR merged.

scott-wilson commented 3 years ago

So, I like the idea of giving the caller the freedom to choose whether a non-trivial object lives on the stack rather than the heap, but the vaguery surrounding the layout and the double-free/dangling reference issues with copying on the C side concern me. On the flip side, hiding anything that can't be directly declared in C behind an opaque pointer is guaranteed to work in all situations. There will be some overhead due to the extra heap allocations, but I'm not convinced they will be significant or even measurable in real code. EDIT: especially since any time you actually want to do anything with that object you're going to be passing a pointer to it back to C++, so as far as the C++ library is concerned it's at some arbitrary location in memory anyway.

Sounds like our rule should be as follows:

luke-titley commented 3 years ago

So, I like the idea of giving the caller the freedom to choose whether a non-trivial object lives on the stack rather than the heap, but the vaguery surrounding the layout and the double-free/dangling reference issues with copying on the C side concern me. On the flip side, hiding anything that can't be directly declared in C behind an opaque pointer is guaranteed to work in all situations. There will be some overhead due to the extra heap allocations, but I'm not convinced they will be significant or even measurable in real code. EDIT: especially since any time you actually want to do anything with that object you're going to be passing a pointer to it back to C++, so as far as the C++ library is concerned it's at some arbitrary location in memory anyway.

Sounds like our rule should be as follows:

  • If the class can be completely represented in C, then it can be implemented in C, or go behind an opaque pointer.
  • If the class cannot be completely represented in C, then it has to be on the heap.
  • If speed is absolutely critical, then the type may be reimplemented in C/Rust.

ok, I think it's too deep a dive, those rules can be reworked a bit later I guess. Need another more thorough thread to properly lay out the rules for memory layout of wrapping classes. With more code examples, and pros/cons.

One thing that I feel should be clear, is that the c bindings are intended as an intermediate layer for language bindings primarily. So they should be low overhead at maybe the cost of verbosity where necessary.

scott-wilson commented 3 years ago

Alright, error handling is updated to be more C like, and I've added some examples of class attributes. @luke-titley I hope that my understanding of what you said is correct.

Also, where does everyone think we are on this PR? I think that we should probably add sections on how data lifetimes are handled (if it is created on one side of the bindings, then it must be destroyed on that side). But asides from that, we probably want to do some deep dive discussions in another issue or PR?

anderslanglands commented 3 years ago

I think we should just merge it then create separate PRs for specific issues. Or we could create an “issues” repository and do something similar to Rust’s RFC process? I like that personally.

On Sun, 1 Nov 2020 at 21:42, Scott Wilson notifications@github.com wrote:

Alright, error handling is updated to be more C like, and I've added some examples of class attributes. @luke-titley https://github.com/luke-titley I hope that my understanding of what you said is correct.

Also, where does everyone think we are on this PR? I think that we should probably add sections on how data lifetimes are handled (if it is created on one side of the bindings, then it must be destroyed on that side). But asides from that, we probably want to do some deep dive discussions in another issue or PR?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/vfx-rs/organization/pull/2#issuecomment-720054605, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOYQXKZM2VVEERQ2IC5T4TSNUNQRANCNFSM4TD4S2AQ .

scott-wilson commented 3 years ago

I think we should just merge it then create separate PRs for specific issues. Or we could create an “issues” repository and do something similar to Rust’s RFC process? I like that personally.

Yeah, I was thinking about the Rust RFC process (I don't know the details, but right now I'm experimenting with Rust RFC process light). The only thing that is holding me back from going for an RFC like process is if this group becomes officially owned by the ASWF, how would our organization style mesh with the ASWF organization style. I guess a bigger question is what is our relationship with the ASWF, and how do we want our relationship with the foundation to look in the future?

So, I'll do the following:

luke-titley commented 3 years ago

I think we should just merge it then create separate PRs for specific issues. Or we could create an “issues” repository and do something similar to Rust’s RFC process? I like that personally.

Yeah, I was thinking about the Rust RFC process (I don't know the details, but right now I'm experimenting with Rust RFC process light). The only thing that is holding me back from going for an RFC like process is if this group becomes officially owned by the ASWF, how would our organization style mesh with the ASWF organization style. I guess a bigger question is what is our relationship with the ASWF, and how do we want our relationship with the foundation to look in the future?

So, I'll do the following:

  • Wait a few days before merging this branch
  • Create an issue for how we plan to organize things in the future

I pretty impatient to get going. I think it's not a big deal if the process evolves a little, or even if it is reorganised when the AWSF take ownership.

RFC sounds great. It's simple and we'll be able to start fleshing out/justifying things right away. At this point, I think we need to get agreement on the general approach relatively quickly, while there is momentum.

Worse case, we have a ton of markdown files justifying our approach.

scott-wilson commented 3 years ago

Yeah, I want to get moving quickly too. I've created issue #3 to start to discuss organizing the project. Also, I want to get the ball rolling for the automatic c bindings, so we can start to get a first pass on those.

anderslanglands commented 3 years ago

Yeah I agree with Luke here - let’s just get going with an RFC process and if we need to change later we can do so. I would suggest creating RFCs for each of the big-ticket items, which to my mind are:

On Mon, 2 Nov 2020 at 09:58, Luke Titley notifications@github.com wrote:

I think we should just merge it then create separate PRs for specific issues. Or we could create an “issues” repository and do something similar to Rust’s RFC process? I like that personally.

Yeah, I was thinking about the Rust RFC process (I don't know the details, but right now I'm experimenting with Rust RFC process light). The only thing that is holding me back from going for an RFC like process is if this group becomes officially owned by the ASWF, how would our organization style mesh with the ASWF organization style. I guess a bigger question is what is our relationship with the ASWF, and how do we want our relationship with the foundation to look in the future?

So, I'll do the following:

  • Wait a few days before merging this branch
  • Create an issue for how we plan to organize things in the future

I pretty impatient to get going. I think it's not a big deal if the process evolves a little, or even if it is reorganised when the AWSF take ownership.

RFC sounds great. It's simple and we'll be able to start fleshing/justifying things right away. At this point, I think we need to get agreement on the general approach relatively quickly, while there is momentum.

Worse case, we have a ton of markdown files justifying our approach.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/vfx-rs/organization/pull/2#issuecomment-720150536, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOYQXKUM457IIWW4J43V33SNXDXXANCNFSM4TD4S2AQ .

luke-titley commented 3 years ago
scott-wilson commented 3 years ago

I'm going to merge this and we can add RFCs to here for more discussions: https://github.com/vfx-rs/rfcs