microsoft / STL

MSVC's implementation of the C++ Standard Library.
Other
10.19k stars 1.5k forks source link

First STL Contributors Meetup #921

Closed MahmoudGSaleh closed 4 years ago

MahmoudGSaleh commented 4 years ago

Announcement: 9 months ago, Microsoft open sourced MSVC's implementation of the C++ Standard Library (STL). Since then, with the help of a growing community of contributors, we have implemented various C++20 features, fixed bugs, added testing, code integration infrastructure, and much more. While we have collaborated through issues and PR reviews, we would like to assess the current state of affairs, understand what went well, and what we can do better to streamline our workflow. To that end, MSVC team will be holding a series of meetups that are open to all contributors to share feedback and ideas for improvement.

The first meetup will take place on Wednesday, July 22nd, 2020 from 9am to 10am Pacific Time (PT). We will conduct the discussion via Microsoft Teams. To participate, we ask that you register for the event using this link: https://aka.ms/stl-meetup. During the meeting, we will focus on the high-level workflow of the repo, but we won’t be discussing technical issues such as C++ 20 features, bugs, or specific PRs unless their main purpose is to improve or change the STL repo workflow.

Please feel free to comment on this issue to ask any questions and to propose agenda items you’d like to discuss during the meetup.

miscco commented 4 years ago

Awesome. Looking forward to it!

AlexGuteniev commented 4 years ago

Interesting.

I want to raise the concern about pull requests, where some amount of work has been made, but ultimately they stuck. stop_token and jthread. Optimize random.

What happened, how should it be prevented.

miscco commented 4 years ago

Interesting.

I want to raise the concern about pull requests, where some amount of work has been made, but ultimately they stuck. stop_token and jthread. Optimize random.

What happened, how should it be prevented.

Looks at my libc++ pull requests with month of silence o.O

That said yes, there might be a better way to handle that. What I would really like is the possibility to move a review from "Work in Progress" to "Ready for initial review". I do not think that any more fine grained approach like introducing a "In review" group would really help as it is mostly about priorization.

Is there a way to prioritize the elements in a group?

StephanTLavavej commented 4 years ago

Yep, this is the definitely the sort of thing we'd like to talk about in the meetup. We can comment here too.

Yeah, the pull request backlog weighs on our minds - we've been trying to clear it out, while prioritizing C++20 features. Right now, it isn't really obvious at any given moment what the capacity of the STL maintainer team is for code reviews - this depends on factors like:

For one of the examples mentioned, I've been meaning to review #256 for a while (may as well cross-reference it), but as a performance optimization, it's been lower priority than C++20 work. I managed to clear out all of my planned reviews this week except #703. Hopefully I can review the latter and help it get merged next week - but that still leaves the former, where I need to think for a bit about how to extract code that <charconv> needs. Repeat for months and that's how a PR ends up waiting for review for ages. :crying_cat_face:

Thoughts on how to make our workflow more transparent, and how to more effectively clear out the backlog and keep up with the rate of incoming PRs, are definitely appreciated. I can say for certain that we absolutely love it when contributors thoughtfully review each others' PRs - this allows the maintainers to focus on the most subtle issues (e.g. the ones that involve ABI concerns, unusual compiler interactions, arcane language details, etc.). Getting the initial test suites online was also an important step (thanks @cbezault!); before that, our review capacity was much smaller because maintainers had to write all of the test coverage (remember span tests? :smile_cat:) - during that time, much of our initial backlog built up.

What I would really like is the possibility to move a review from "Work in Progress" to "Ready for initial review".

We're always happy to respond to a comment "I think this is ready for initial review" by clicking the appropriate menu item. Alternatively, it should be possible to craft a GitHub Action that responds to a specific comment (e.g. "/review ready" or something), from only the submitter of a PR, by moving the corresponding card. I looked briefly at the Marketplace but didn't see any actions already doing that. I tend to believe that maintainers should be responsible for moving cards to Final Review (this is saying "the PR looks pretty solid, no obvious major issues") and to Ready To Merge (this is vouching for the absolute correctness of the PR). However, it seems entirely reasonable and desirable for PR authors to be able to move their own PRs to Initial Review, or even back to WIP.

Is there a way to prioritize the elements in a group?

Cards in the Project columns are ordered, but we haven't really been treating their order as significant - initially we tried to sort them in descending order, but that doesn't really persist since moving cards between columns breaks that order. The linear nature of a Milestone would make it easier to prioritize; I don't know if updating both the Code Reviews Project (which I personally find extremely useful) and a PR Milestone (which we don't have yet) would be too much effort, but we could look into it. We could theoretically use Milestones to mark which reviews we intend to get through in a week/month/release/something.

We've been trying to be more rigorous about using assignments to indicate when a maintainer is responsible for the next step (typically, having signed up for a review, or for porting a PR to MSVC). This is no guarantee that it'll be addressed soon, though.

Thanks for your thoughts, and your patience. Now it's 3 AM so I should probably stop writing. :joy_cat: :zzz:

AlexGuteniev commented 4 years ago

Maybe if you are suffering from priority inversion, you should discourage contribution to non-priority. Like, comment on #671 or #889 that "it will be reviewed after C++20 features" instead of processing them right away.

But I think that generally, if you are willing to accept contributions, you should be somehow ready for even more PRs. From myself, there are likely be more C++20 synchronization after atomic wait is merged. Even more with vNext.

Unfortunately I have no ideas of significant improvement. I see can see one micro-optimization here Porting VSO Bugs - this could be short cut, especially for DevCom issues, where some context is available. "A good test case" or "permalinks" could be skipped. Contributors should be able to recover context, if they are indeed able to handle issues. Or ask themselves if something is really missing.

I also would note that some contributions may require your help before the PR is ready for review. "Compare-And-Exchange With Padding Bits" is an extreme example. The PR isn't even there, but it already needs your time. (It was once blocked by compiler support, compiler team added an intrinsic, but I believe that the intrinsic is not a good one, and a better intrinsic should be requested (a set of them actually, if you are targeting for a good ARM support)).

miscco commented 4 years ago
* Whether people are on vacation (e.g. I just took 2 weeks of vacation, but I didn't mention that anywhere).

I do not believe you owe anyone something along those lines. Vacation etc is private business and should stay that way

* Whether people are working on non-STL projects (e.g. @BillyONeal was "on loan" to the vcpkg team for months, reducing our capacity to review multithreading PRs - obviously we need to adapt now that he's permanently switching). We haven't really drawn attention to the fact that some, but not all, of the team spends 100% of their time on the STL.

* Whether people are working on MS-internal tasks (e.g. porting PRs to the MSVC-internal repo consumes time, and so does fixing build breaks in our Real World Code suite; recently I spent several hours fixing warnings that the `std::pow` conformance fix triggered in CRYENGINE).

I think that these two points are essentially impossible to alleviate insofar as no one can effectively predict your productivity regarding what you have on your plate, or how long it takes for you to review etc.

* Whether people are working on C++20 features themselves (e.g. finishing Ranges soon is a priority, as you can see from the flurry of PRs from @CaseyCarter and @ahanamuk - and those PRs need to be reviewed before other PRs).

I think this would be something that could be valuable. Essentially the "Plan of the month" would be something that helps guessing your workload. I personally also have no idea about the VisualStudio release plan so putting that in some form of document would be helpfull as close to the release you will be quite swamped with work (Surprise i guess?)

* Whether people are working on other fixes - e.g. I spent a fair amount of time working on infrastructure like #862 and #906 instead of reviewing PRs - it felt like those were important for the project as a whole, but at the cost of delaying reviews. (I've also created a few cleanup PRs that probably weren't strictly necessary; that's how I fill time between meetings, waiting for test passes, and other times when my brain doesn't feel up to an intensive review.)

I do not know whether this is really something that concerns outside contributors. None of us knows how many bugs there are orhow complex they are, The amount of work required to create some realistic estimation of those work items seems quite large and essentially wasted.

I can say for certain that we absolutely love it when contributors thoughtfully review each others' PRs

I try (mostly because I am often currious about the code changes itself). That said I am still amazed about the level of scrutiny you manage to give every review.

However, it seems entirely reasonable and desirable for PR authors to be able to move their own PRs to Initial Review, or even back to WIP.

I agree that anything beyond that would be unwise. I for my part wouldnt even mark them as reviewed because I do not feel that I have the required expertise.

Generally speaking I believe there are three types of work I consider appropriate for me (or other contributors of my skill level) and which I try to tackle whenever I have some spare time:

  1. The good old "We should clean that up when we have time" issues
  2. General modernization of the infrastructure
  3. Small / medium sized bug fixes/features

What I generally would not see as worthwile are features that are both too large (e.g. std::format) or require changes to the compiler. Those are generally better left for your team.

I would also appreciate if the roadmap would give some coarse timeline for the next half year. Togeher with the release schedule that would greatly help to estimate your workload.

StephanTLavavej commented 4 years ago

@AlexGuteniev

Maybe if you are suffering from priority inversion, you should discourage contribution to non-priority.

We briefly considered discouraging PRs, but felt that it wouldn't be a good way to interact with the community. Part of our migration to GitHub is learning how to work more efficiently. I think it's better for us to stretch to review lots of activity, than the alternative.

But I think that generally, if you are willing to accept contributions, you should be somehow ready for even more PRs.

Yep, we're trying to increase our review capacity.

Unfortunately I have no ideas of significant improvement.

That's okay, just hearing about your concerns is important!

I see can see one micro-optimization here Porting VSO Bugs - this could be short cut, especially for DevCom issues, where some context is available. "A good test case" or "permalinks" could be skipped. Contributors should be able to recover context, if they are indeed able to handle issues. Or ask themselves if something is really missing.

Yeah, that might be reasonable. The list is finite and fairly small ("just" 150 or so), it just hasn't made progress because it's a lower priority task that we've devoted zero cycles to recently. I filed #939 to request community assistance; if this continues to make little progress, taking some shortcuts when porting might be worth it.

I also would note that some contributions may require your help before the PR is ready for review.

Especially because of the latency involved in requesting compiler features, yes.

@miscco

Essentially the "Plan of the month" would be something that helps guessing your workload.

We initially envisioned monthly Iteration Plans, much like VSCode publishes. Perhaps we could update the Roadmap monthly with a short summary of what we plan to do (the Roadmap was drafted for the initial repo release but we haven't used it for our own purposes since then).

I personally also have no idea about the VisualStudio release plan so putting that in some form of document would be helpfull

Understood. We usually can't talk about specific VS release dates, but maybe more information about the VS release process would be useful (I've mentioned it in various issues/PRs, I think, but nothing easily findable).

as close to the release you will be quite swamped with work (Surprise i guess?)

This is, fortunately, not usually the case. Because we maintain an always-production-ready level of quality in the STL, the cutoffs for releases are generally not work-intensive (we don't have to fix a bunch of bugs to get the product into a shippable state). There is a bit of extra work if we want a feature to land in a release, but not massively so.

I try (mostly because I am often currious about the code changes itself). That said I am still amazed about the level of scrutiny you manage to give every review.

:smile_cat: @mnatsuhara has recorded Advice for Reviewing PRs on the wiki, which we'll be expanding over time.

I for my part wouldnt even mark them as reviewed because I do not feel that I have the required expertise.

Note that "Ready For Initial Review", to us, just means that you're done with making any changes, and you'd like a maintainer to look at it. Perhaps the states should be rephrased (too many start with "Ready"), to something like "Initial Review Requested", "Final Review Requested", "Ready To Merge".

I would also appreciate if the roadmap would give some coarse timeline for the next half year.

I added a Tentative Timeline to the Roadmap for H2 2020, and H1 2021. We can probably add more details about what we plan to ship in 16.8.

AlexGuteniev commented 4 years ago

Anyone received anything after filling in the form?

miscco commented 4 years ago

Nope, I was wondering too

cbezault commented 4 years ago

We have both of your names, we'll be sending out invites closer to the meeting date. Maybe we'll change the workflow to send a notification email next time.

SunnyWar commented 4 years ago

I want to add my vote that the pull request backlog is, IMHO, the biggest issue that needs to be addressed. I'm currently not taking part in STL largely because of this issue and I bet I'm not the only one.

MahmoudGSaleh commented 4 years ago

Thanks to everyone who registered. I have sent out the meeting invite.

StephanTLavavej commented 4 years ago

@SunnyWar Since the PR backlog concerns all of us, I'll begin tracking active PRs in the Status Chart. Hopefully this will make it easier to see when we're making progress over time, treading water, or falling behind. (I ported the chart to JavaScript over the weekend, making it much easier to modify and present to the world.)

miscco commented 4 years ago

So what I would also be interested in is your (vclibs) view on the open source journey so far.

On the one hand what are the pain points / successes for the maintainers. For example until @StephanTLavavej's comment regarding reviews I was always feeling like my 3 year old telling me how to speak properly. What are the the things you would like to see more / less of from the community.

On the other hand it would be quite interesting to hear from the "boss-like entities" -end of citation-. Often open-sourcing something is seen as a waste of time with no benefit. What is the organizational perspective on the project?

AlexGuteniev commented 4 years ago

Hopefully this will make it easier to see when we're making progress over time, treading water, or falling behind

Apparently the chart expectedly show that the number of open PR grows. But I would trust more a chart which shows the time PRs are in open state. Not sure if it is max, average, or sum. But for example, if four half a year PRs are closed, and eight opened the same month, it is still a progress.

MahmoudGSaleh commented 4 years ago

So what I would also be interested in is your (vclibs) view on the open source journey so far.

On the one hand what are the pain points / successes for the maintainers. For example until @StephanTLavavej's comment regarding reviews I was always feeling like my 3 year old telling me how to speak properly. What are the the things you would like to see more / less of from the community.

On the other hand it would be quite interesting to hear from the "boss-like entities" -end of citation-. Often open-sourcing something is seen as a waste of time with no benefit. What is the organizational perspective on the project?

@miscco I agree, this is a good topic to include in the meetup. Also, I'd like to add some context. As you mentioned, we needed a "business justification" to convince stakeholders to invest in moving development to open source. I guess a few years ago, that would have been a difficult decision to make, but last year, the stars aligned. We've already been noticing how successful other large C++ open source libraries out there have been and how development in the open gets us closer to the community, which has always been a primary goal for us. Futhermore, as many have noticed, Microsoft has already been focusing on open source for some time; GitHub acquisition is one clear example. Finally, thanks to P.J. Plauger, we cleared some more hurdles that allowed us to move to open source development.

There's also a blogpost that @StephanTLavavej wrote last year talking about the decision to move to open source. And, I talked about the same in my progress update on STL development during the latest Microsoft Virtual C++ Conference.

AlexGuteniev commented 4 years ago

When vNext development starts, there would be more room for design decisions. It means more room for conflicting opinions of maintainers and contributors. Looks like I already have long standing different opinion on one thing #680 , and another is apparently added today https://github.com/microsoft/STL/pull/1029#discussion_r455447143 .

On one hand it is not useful to take opinion of anyone who can create GitHub issue seriously. On the other hand, not being open to other people's views may put limit on serious contributions. So, what would be your opinion on opinions?

StephanTLavavej commented 4 years ago

@AlexGuteniev - For both changes to existing code (correctness, performance, throughput) and designing new code, when there's an implementation decision to be made, we have to evaluate any arguments/evidence in favor of benefits, versus the cost of developing/testing/maintaining the code, and the risk of disrupting users (whether changing something in existing code, or doing something surprising in new code). If the cost and risk seem low, then minimal arguments/evidence are necessary - sometimes just a theoretical justification of why one choice is better. If the cost and risk seem higher, then we'll want to see stronger evidence, like benchmarks. When such evidence isn't available, we generally choose the least costly and least risky option: keeping the status quo for existing code, or doing the simplest/Standard-depicted thing in new code.

We hope that this evidence-guided decision process makes sense; it's not "maintainer opinions win". For the specific examples you've mentioned, I'd want to see benchmarks showing a significant improvement - and if such benchmarks can't be easily crafted, that suggests that any potential improvements are hard to observe, making them not worth the cost and risk. For another example, https://github.com/microsoft/STL/pull/653#issuecomment-610178173 was the data that persuaded us to merge that attempted optimization. I still think that decision was made correctly, although we should have spent more time validating the math. (That is, we took a risk and it ended up needing to be reverted, but I still think we took the risk for valid reasons, even though review missed the bug.)

BillyONeal commented 4 years ago

Looks like I already have long standing different opinion on one thing #680

To clarify: when we last interacted with that item we didn't have any data in either direction, and had arguments both ways on whether to make the change, and had no reasonable way to measure which was correct. The existing product didn't have the pauses, and so as STL indicated above, the status quo wins. Since then you posted a benchmark which probably indicates a change should be made but we didn't notice :/

apparently added today #1029 (comment)

This is a new feature so the 'status quo' might not seem to apply because we don't have code checked in, but we already substantially designed the feature's function; that's why we requested the zeroing intrinsic. The simplicity of ensuring the correctness of the resulting atomic behavior is important and the claims of zeroing on store being ruinously expensive don't appear justified for 99.999% of users of atomics. Even for atomic_ref the only perf example that might matter is in the constructor where most of the time, even for types which have padding to set to zero, the cost is 1 relaxed load and 1 branch in most cases (because in most cases the padding will be already zeroed). And there are very few such types one would want to use in atomic to begin with.

AlexGuteniev commented 4 years ago

The simplicity of ensuring the correctness of the resulting atomic behavior is important

I don't see that the initial design better approaches simplicity, it is rather reverse. It has more places to apply, and atomic_ref constructor would be even more complex than the modified compare_exchange_strong.

Also about correctness. The constructor solutions don't tolerate cases when atomic_ref underlying value is accessed directly. Whereas it is non-Standard, I expect that some users are likely to do this, so it might be worth not breaking.

Ok, I'm taking the 99.99% part (thanks for not having 10-bytes floating point type), and the complexity part (I see CAS loop solution simpler, but if you see it other way around, that's fine, you are maintainers)

I will update the PR

BillyONeal commented 4 years ago

I don't see that the initial design better approaches simplicity, it is rather reverse. It has more places to apply, and atomic_ref constructor would be even more complex than the modified compare_exchange_strong.

It is editing more places but I'm paranoid about the resulting memory ordering guarantee implications of repeating the comparison outside of the intrinsic; it seems like we would need to upgrade everything to at least acquire in such situations? Perhaps we should move this discussion back to the PR.

AlexGuteniev commented 4 years ago

Regading PRs waiting for too long. Do you know how to determine whether a PR is abandoned by its author, and how the work is continued by someone else or discarded? I'd like to hear about it.

BillyONeal commented 4 years ago

I don't think that has happened completely yet. If it's really old we've just been leaving a comment asking if the author is still working on it; if they don't reply in, say, 30 days, we should probably close them.

StephanTLavavej commented 4 years ago

In general, for an abandoned PR (which will surely happen; people get busy, or change jobs, etc.), if the change is desirable and finishing the work is more a matter of "polish some rough edges, add some missing pieces" and not "discard literally everything and rewrite it completely differently", then I think it's better to salvage the PR than to close it unmerged. #804 is an example of a PR that was apparently abandoned, despite fairly prompt maintainer comments requesting testing, and repeated pings. (Again, I totally understand how people get busy and can't reply.) @BillyONeal and I were busy for a while (still busy!) but we've managed to fix it up into a shippable state, and this is a highly desirable performance improvement (as Billy's original vectorization of reverse was a 10x win, IIRC), so I'm glad we didn't discard it.

MahmoudGSaleh commented 4 years ago

Thanks everyone who participated in the meeting. It was great meeting you all virtually, and the discussions were really helpful. Based on the discussions, there were several actions taken and decisions made.

mnatsuhara commented 4 years ago

As an FYI, I've posted my personal notes from the STL Contributors Meetup on 7/22/2020 on the wiki, and they can be found here. I expect to do the same for future meetups!

StephanTLavavej commented 4 years ago

Closing this issue - but if anyone has other topics they'd like to talk about, in addition to the Discord that @MahmoudGSaleh set up and the issues that @mnatsuhara filed, @cbezault has activated the GitHub Discussions beta for our repo - see the new tab above.