marshmallow-code / webargs

A friendly library for parsing HTTP request arguments, with built-in support for popular web frameworks, including Flask, Django, Bottle, Tornado, Pyramid, webapp2, Falcon, and aiohttp.
https://webargs.readthedocs.io/
MIT License
1.38k stars 158 forks source link

webargs==5.5.3 is NOT a minor patch update, it broke our app! #469

Closed captainkw closed 4 years ago

captainkw commented 4 years ago

Our Python server is powered by the webargs library, which my company has been happily using so far. All except TWO backwards incompatible incidents - one of which happened last year, one of which happened today. The 5.5.3 update on 2020/1/28 broke our client apps' ability to make JSON POST requests to our backend, breaking key analytics data:

Screen Shot 2020-01-29 at 8 49 58 PM

https://webargs.readthedocs.io/en/latest/changelog.html

I am outraged. I understand that the 5.5.3 update is to address a vulnerability (CVE-2020-7965: Don’t attempt to parse JSON if request’s content type is mismatched), but this is a potentially backwards compatibility breaking change, and is NOT a minor patch number update. This at the very least should have been a 5.6.0 update instead, and would have alerted our team to do additional QA.

Today's incident took me more than 2 hours to troubleshoot as I initially thought the the problem was in our Python application code, then we ruled that out, and started to look at build issues. We had to scrutinize every Python library that got updated since the last build until we noticed this webargs 5.5.2 to 5.5.3 change. Again, this is NOT a minor patch version update. Today's incident gave me pause on continuing to use webargs in a production, revenue collecting operation.

Please NEVER issue a backwards compatibility breaking change as a minor patch update again, there are production servers that depend on your library for their livelihood.

@sloria @lafrech

sloria commented 4 years ago

I feel your frustration--it's unfortunate you had to go on a dependency dumpster dive for the latest webargs patch. While I can't get you your 2 hours back, I can at least offer my rationale.

We do take backwards compatibility seriously, and we follow semver as closely as is practical. The reality is, however, that "backwards compatibility" is not always as clear cut as we'd like, and this latest patch was a tough judgment call.

My rule of thumb is: if a patch fixes undesirable-but-expected behavior, we increment the middle version number, i.e. it's an enhancement. If the behavior is undesirable and unexpected, the patch should be considered a bug fix. In this case, the behavior was undesirable, unexpected, and insecure. Some of the parsers (TornadoParser and BottleParser IIRC) had the correct behavior. The other parsers were fixed to be consistent with those, which is why I bumped to 5.5.3.

captainkw commented 4 years ago

Thanks for the rapid response @sloria. It was indeed a dependency dumpster dive session last night, that's well put. I think we should discuss future policies and processes on these code pushes so we don't have this happen again. For example, when we upgrade to Webargs 6.x.x, we WILL expect major changes, and thus thoroughly QA everything, make sure client side is sending the right headers, and our backend is receiving all the expected data without errors. However, in this case, since it was a minor bump from 5.5.2, to 5.5.3, we expected everything to work consistently without major breakage, and that a hotfix is safe. Turned out, it was not.

In this case, we believe this update doesn’t constitute as a minor revision. It should have been documented well ahead of time in a more major update to webargs (perhaps only merged to 6.0.0). If you are ever concerned of backwards compatibility of a change it probably shouldn’t be in a minor update. At the very least, we think some sort of deprecation warning should be triggered in the server log for this kind of JSON parser usage (with mismatched content type), some number of weeks before a breaking changes is implemented in the library.

Borrowing an analogy from the GPU gaming world, graphic drivers have a "validation layer", where you can put GPU instructions through a "validation mode", which runs much slower but will spit out warnings for behavior that is unexpected or suboptimal, but that is not causing any serious issues (so not really like a debug build with asserts, more a special mode that does much deeper introspection at runtime). For example, here is D3D12's validation layer: https://docs.microsoft.com/en-us/windows/win32/direct3d12/using-d3d12-debug-layer-gpu-based-validation

Having a similar "validation layer" for the webargs, even a basic implementation, could be tremendously helpful during development on our dev environments, we'd have this mode turned on and see if there are any missing/wrong headers in parsing the API calls from our client apps, and we can catch it ahead of time. In this case, our apps has been working just fine forever, so it went undetected that it was not compliant to CVE-2020-7965. Had we known, it would have been fixed a long time ago.

sloria commented 4 years ago

When addressing undesirable-but-expected behavior that will change in the next major release, issuing deprecation warnings makes sense.

In this case, though, we did not document the change ahead of the next major release because this was a security-related bug fix. Per our security process, the patch was released in coordination with the release of the CVE. The fix was released in 5.x because it is the latest stable line, i.e. we are still supporting it.

I'm sorry you had to deal with a breakage. But keep in mind that other teams were affected by the bug--particularly, those building applications that rely on the fact that JSON requests cannot be sent cross-site. The previous behavior was unexpected and undesirable.

sirosen commented 4 years ago

@sloria, can we close this? It's not a well-formed issue in that there's no way to meaningfully "address it". It's therefore just clutter when looking at what issues there are in webargs which can be worked on for v6.

I would argue that webargs has followed semver correctly in this case, as far as is possible. Semver says to update the patch version for a backwards-compatible bugfix, and that

A bug fix is defined as an internal change that fixes incorrect behavior.

Was this that? Yes, from the perspective of the maintainer. And clearly "No", from the perspective of the issue reporter and those who 👍ed it. That disagreement is unfortunate, but often maintainers have to make a judgement call about whether or not a fix meets this definition.

Any bugfix can be a breaking change for someone. In many cases, fixing bugs which people have worked around will break the applications which have those workarounds. Semver is about communication of intent, it can't be about compatibility with every application which uses the library.

This is well beyond the scope of webargs, but I would recommend looking at poetry or pip-tools (or pipenv, but I strongly dislike it...) to control the exact dependency versions you're rolling out. Or pull packages into your images (containers, disk images, etc) with or without these tools. Either approach will let you more or less trivially roll back changes to your dependency versions.

You should not be assuming that because a library has updated only its patch version that everything will work. You might not like this, but it's the reality of deploying production python apps. The python ecosystem and semver don't offer the kind of guarantees for which this issue (implicitly) asks.

sloria commented 4 years ago

Yes, thanks for that synthesis. Since there are no further actions to do here, I'm closing.

lafrech commented 4 years ago

I just got caught by this myself... I should have known.

I second what is written above. There is no absolute distinction between a breaking and a non-breaking change.

Workflow

The proper way to address this is to pin dependencies and test before deploying.

sloria commented 4 years ago

Yeah. The unfortunate reality is that in rare cases, a fix for some people will be a breaking change for other people, as was the case here.