pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/
Other
15.1k stars 2.05k forks source link

Fix MSVC MT/MD incompatibility in PYBIND11_BUILD_ABI #4953

Open isuruf opened 7 months ago

isuruf commented 7 months ago

Description

The ABI is compatible between different versions of MSVC 14.* (2015 - 14.0, 2017 - 14.1, 2019 - 14.2, 2022 - 14.3). However, the internal structures are different between these versions. For eg: the internal details of how C++ structures like maps are implemented might be different. (I'm not saying they are, but how maps are implemented is an implementation detail)

With /MT you are statically linking in the C runtime and when allocating one structure in a DLL created with a different MSVC version, runtime linked statically, and trying to use it in another DLL linked to a different MSVC runtime (static or dynamic) results in crashes. That's why when you link with /MT and is crossing DLL boundaries you need them to have the same version.

However, if you are using /MD you are using only one C runtime and the internal structures are consistent. Therefore you don't need to check MSVC version.

See https://learn.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=msvc-170

Suggested changelog entry:

rwgk commented 7 months ago

Could you please explain the rationale in the PR description?

IIUC, currently this PR undoes #4779 for /MD, is that the right thing to do?

Because ultimately we're sharing C++ pointers between Python extension modules. The C++ ABI must be compatible, independently of everything else. Falling back to define PYBIND11_BUILD_ABI "" (IIUC) seems very suspicious.

I don't know a lot about Windows linking and the various options. I'd appreciate if you could take the time to explain carefully in the PR description so that people like myself can clearly see what is correct and why. (So that we don't have to come back here again.)

isuruf commented 7 months ago

IIUC, currently this PR undoes https://github.com/pybind/pybind11/pull/4779 for /MD, is that the right thing to do?

Yes, the ABI is compatible between different versions of MSVC 14.* (2015 - 14.0, 2017 - 14.1, 2019 - 14.2, 2022 - 14.3). However, the internal structures are different between these versions.

With /MT you are statically linking in the C runtime and when allocating one structure in a DLL created with a different MSVC version, runtime linked statically, and trying to use it in another DLL linked to a different MSVC runtime (static or dynamic) results in crashes. That's why when you link with /MT and is crossing DLL boundaries you need them to have the same version.

However, if you are using /MD you are using only one C runtime and the internal structures are consistent. Therefore you don't need to check MSVC version.

rwgk commented 7 months ago

Yes, the ABI is compatible between different versions of MSVC 14.* (2015 - 14.0, 2017 - 14.1, 2019 - 14.2, 2022 - 14.3).

Amazing.

However, the internal structures are different between these versions.

Could you please clarify what the "internal structures" are?

If you move your comment with that clarification to the PR description it'll look good to me.

Please also point to this PR from the comment in internals.h (See (#4779) -> See (#4953)).

@wjakob this looks like something we want to keep in sync between pybind11 and nanobind.

wjakob commented 7 months ago

I am not so sure about this one @rwgk @isuruf . Consider the following scenario:

This is where we are leaving the world of C ABI compatibility world and entering the world of C++ ABI compatibility. The opener of the PR says:

However, the internal structures are different between these versions. For eg: the internal details of how C++ structures like maps are implemented might be different.

This is precisely the issue. And it is also the issue in the other related PR (https://github.com/pybind/pybind11/issues/3793) and the CondaForge issue (https://github.com/conda-forge/pybind11-feedstock/pull/79), where I am surprised about the nonchalance in introducing an (in my opinion) dangerous change.

After following the above scenario, each of the two pybind11 extensions will now want to read from and write to a shared internal data structure containing lots of nested STL types. It is a minefield -- even the slightest change in how those types are implemented will cause undefined behavior. Crashes in the simplest case, and silent data corruption on the worse side of the scale.

This is just the pybind11 internals -- perhaps one could go through through the entire MSVC C++ library and confirm that types used to implement pybind11 itself (std::type_index, std::string, std::pair, std::tuple, std::unordered_set, std::unordered_multimap, std::vector, std::forward_list, etc.) are indeed 100% ABI compatible between all relevant MSVC versions since they are in any case pretty old and hopefully stabilized.

But it does not stop there in my opinion: pybind11 has not just the job of protecting its own internals. It should also protect user data structures from ABI incompatibilities.The point of multiple extensions sharing a pybind11 "domain" is so that functions in these extensions can call each other with instances across libraries.

Perhaps both users A and B created an extension that includes the header

struct Foo {
   std::fancy_cpp20_type<...> foo;
};

Well, if user A's extension creates an instance of the type Foo and then passes it to a function in user B's extension, then this can still crash if std::fancy_cpp20_type isn't ABI compatible between the MSC versions that built these libraries.

(case in point, C++20 features were exposed in MSVC but were not API stable yet https://github.com/microsoft/STL/issues/1814#issuecomment-845572895)

I will say what I said earlier in the two other commits: most pybind11 extensions don't need to need type-level access to the internals of other pybind11 extensions. They can be isolated and everything works fine.

It is just a small subset of extensions that have such an "intimate" relationship. I think it is reasonable to expect that such extensions are built with C++ libraries that declare themselves as being ABI-compatible.

I know that this can sometimes be inconvenient. But weakening the pybind11 protection mechanism is not the right solution to this problem.

isuruf commented 7 months ago

Are there any pybind developer calls where we can discuss these issues? I feel like it would be easier to discuss these on a call and summarize the findings.

rwgk commented 7 months ago

After following the above scenario, each of the two pybind11 extensions will now want to read from and write to a shared internal data structure containing lots of nested STL types.

I didn't think that through before. I'm trying to make that concrete for myself by looking at the sources. I think this is the door through which the potential for UB slips in:

Here extension module B is extracting a pointer to an object of this struct created in extension module A:

If the <vector> etc. headers seen by B are different from those seen by A, we have UB if B is operating on the internals object.

Sounds convincing to me.

rwgk commented 7 months ago

Are there any pybind developer calls where we can discuss these issues?

We're in vastly different timezones. I think sorting this out via comments here will work best. (We didn't have a meeting in a long time.)

isuruf commented 7 months ago

@rwgk, that wouldn't happen as the allocation of internal structures happen in the same C runtime. If the external view of the object changes, then the stdlib developers would indicate this so that things wouldn't link at all. For eg: the std::string ABI changed from g++ 4 to g++ 5. That's the external view of the object. The internal view of the object does not matter unless you are statically linking in the C runtime.

isuruf commented 7 months ago

(case in point, C++20 features were exposed in MSVC but were not API stable yet https://github.com/microsoft/STL/issues/1814#issuecomment-845572895)

This is a valid point. API stability in the case of experimental features is an issue in all compilers.

rwgk commented 7 months ago

then the stdlib developers would indicate this so that things wouldn't link at all.

That's a cool feature (I didn't know about it), but I don't think we have that protection:

Here the internals** is cast to a void* (stored in a Python capsule):

https://github.com/pybind/pybind11/blob/4bb6163b4fc1299e6dc85f9436bd18a523dd099f/include/pybind11/detail/internals.h#L535

And here the void* is cast back to internals**:

https://github.com/pybind/pybind11/blob/4bb6163b4fc1299e6dc85f9436bd18a523dd099f/include/pybind11/detail/internals.h#L469

That by-passes any link checks. The extension modules are "connected" only at runtime.

Is it possible to have a runtime check for e.g. std::vector compatibility between MSVC versions? (To substitute for the link-time check?) (And is is simple enough, so that gain / cost is reasonable?)

isuruf commented 7 months ago

You can use typeid(func).name() before the type erasure to get the mangled name.

rwgk commented 7 months ago

You can use typeid(func).name() before the type erasure to get the mangled name.

Is that documented to be unique to a given "external view"?

(The mangled names I happened to see over the years didn't appear to encode any versioning information.)

rwgk commented 7 months ago

After turning this over in the back of my mind a little bit:

I think it's clear that falling back to define PYBIND11_BUILD_ABI "" is unsafe.

But it also sounds like using define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER) is more restrictive than it needs to be.

Could we find a middle ground? Maybe, @isuruf do you think you could add preprocessor-level functionality to map _MSC_VER to something like a PYBIND11_MSVC_ABI_COMPATIBILITY_GROUP, which we then use for PYBIND11_BUILD_ABI?

wjakob commented 7 months ago

Hi @isuruf and @rwgk,

I had a bit of time look a bit into C++ ABI stability in the last few days. I think that I am too paranoid after having been repeatedly bitten by this some years ago. It seems that basically all the three main compiler tools (GCC, Clang, MSVC) have started taking ABI stability seriously at some point in the last years and there haven't been major breaks since then.

So I'm open to making a change here that allows interoperability between more extensions built with different compilers. But let's get it right and make something that we hopefully won't have to tweak retroactively because the criterion turns out to be too loose and, e.g., and ancient MSVC version turns out to not be ABI-compatible after all.

One important think to keep in mind (@rwgk) is that a change here is itself an ABI break, of pybind11. Modifying the PYBIND11_BUILD_ABI string means that the extension is now isolated from all other extensions previously built with a different PYBIND11_BUILD_ABI string. So we should put this change into a larger release that bundles accumulated ABI-breaking commits.

-Wenzel

rwgk commented 2 days ago

@isuruf: this PR was mentioned under https://github.com/conda-forge/pybind11-feedstock/issues/95

This PR slipped my attention. We missed the pybind11 v2.13.0 release. @isuruf do you have an opinion when this is best merged? Would it make sense to tweak this PR, to avoid further ABI breaks? E.g. currently the diff under this PR is:

-#        define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER)
+#    elif defined(_MSC_VER) && defined(_MT)
+#        define PYBIND11_BUILD_ABI "_mt_mscver" PYBIND11_TOSTRING(_MSC_VER)
+#    elif defined(_MSC_VER) && defined(_MD) && (_MSC_VER >= 1900) && (_MSC_VER < 2000)
+#        define PYBIND11_BUILD_ABI "_md_mscver14"
+#    elif defined(_MSC_VER) && defined(_MD)
+#        define PYBIND11_BUILD_ABI "_md_mscver" PYBIND11_TOSTRING(_MSC_VER)

could this (note the <<<< LOOK HERE)

-#        define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER)
+#    elif defined(_MSC_VER) && defined(_MT)
+#        define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER)  <<<< LOOK HERE
+#    elif defined(_MSC_VER) && defined(_MD) && (_MSC_VER >= 1900) && (_MSC_VER < 2000)
+#        define PYBIND11_BUILD_ABI "_md_mscver14"
+#    elif defined(_MSC_VER) && defined(_MD)
+#        define PYBIND11_BUILD_ABI "_md_mscver" PYBIND11_TOSTRING(_MSC_VER)

or this

-#        define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER)
+#    elif defined(_MSC_VER) && defined(_MT)
+#        define PYBIND11_BUILD_ABI "_mt_mscver" PYBIND11_TOSTRING(_MSC_VER)
+#    elif defined(_MSC_VER) && defined(_MD) && (_MSC_VER >= 1900) && (_MSC_VER < 2000)
+#        define PYBIND11_BUILD_ABI "_md_mscver14"
+#    elif defined(_MSC_VER) && defined(_MD)
+#        define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER)  <<<< LOOK HERE

be better?

rwgk commented 2 days ago

@henryiii @Skylion007 for visitility (see my other comment posted a minute ago)