Level of Review Necessary for PRs

SteveBronder commented 5 years ago

In a few PRs the level of testing and and review has been set sort of heuristically. I think it would be nice to have some levels established in a wiki somewhere for what level of testing and review is necessary for new additions to the math library. I'll try making a list below which we can update if people want to discuss and formalize this here.

Comments and Docs
- Standard testing, no new tests
- Anyone can review and approve
Refactors
- Standard testing, no new tests
- Anyone, but preferably someone available via the git-blame of the original code.
- Precision tests (?)
New methods
- Tests that use the AD testing framework for unary functions (what otherwise?)
- Anyone (?)
Performance
- Micro benchmark with google benchmark that can be run from a linux box. ala perf-math
- Precision tests
- Anyone, but preferably someone available via the git-blame of the original code.
Memory Management
- Gold benchmarks for each of the supported compilers
- Dan, Bob, Sean (?)

SteveBronder commented 5 years ago

Maybe this is not necessary, but I think it would be nice to formalize somewhere while leaving some reasonable room for things that happen on a case by case basis

bob-carpenter commented 5 years ago

Thanks for writing this up. I have a few comments, but don't think I can approve such a process.

There's current a wiki on writing new functions in Stan. But it doesn't really link into our [developer process documents](). I think the main hurdle here is that the docs are all over the place physically and it's not clear when you've read them all. I just added a link to the developer process overview, which itself points to a developer process wiki, a coding style wiki, etc. It's hard for all of us, not just beginners, to know when we've collected a complete set of process docs.

Tests that use the AD testing framework for unary functions (what otherwise?)

The new AD testing framework awaiting approval in stan-math also handles binary functions. There's a separate testing framework for probability functions. Other functions have example direct tests. I can add ternary tests and can also add tests for variadic reduction functions when we add those (like sum, log_sum_exp, etc., which should be variadic).

Memory Management

I haven't been keeping up with threading or MPI memory management except at a very high level. When the dust settles, I think it'd be nice to have another arXiv paper that explains what's going on for me and others.

Precision tests

I'm not clear what the process should be for this. I just added a bunch of primitive (int and double input) tests that the Stan math functions mirroring standard library functions like stan::math::exp have to be with 1e-8. I could set that tolerance to anything, but just took the default in the testing framework I set up for values.

SteveBronder commented 5 years ago

There's current a wiki on writing new functions in Stan. But it doesn't really link into our developer process documents. I think the main hurdle here is that the docs are all over the place physically and it's not clear when you've read them all. I just added a link to the developer process overview, which itself points to a developer process wiki, a coding style wiki, etc. It's hard for all of us, not just beginners, to know when we've collected a complete set of process docs.

I think we should try to co-locate all those things together into a contributing.md like tensorflow has

https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md

I'm not clear what the process should be for this. I just added a bunch of primitive (int and double input) tests that the Stan math functions mirroring standard library functions like stan::math::exp have to be with 1e-8. I could set that tolerance to anything, but just took the default in the testing framework I set up for values.

For the OpenCL stuff this would be nice to have when comparing the stan math vs. OpenCL methods

wds15 commented 5 years ago

One point to consider when it comes to performance testing is about generalisation. By that I mean if performance tests always have to be run on all platforms and compilers to make any sense or is posting a performance test which was run on the developer machine sufficient as a default. Given that we do not have good facilities to run these tests in automation on all relevant platforms, the only sensible default here - to me - is to assume that things generalise unless the developer of the PR thinks different or the reviewer is extremely concerned about this matter.

EDIT: In the past a good example where the different setups needed testing was the threading refactor we did this year - but that's really the only case which comes to my mind, but there are agreeably these cases is what I want to say.

rok-cesnovar commented 5 years ago

@seantalts @serban-nicusor-toptal would it be an option to have some "easy" way of manually triggering the performance test suite on Win/Linux and Mac on a PR? Or potentially an extended performance test suite with some additional test models and end-to-end tests with threading on the warfarin model etc? Something in the lines of what Stan math already had with "Jenkins, performance test please". It wouldnt be so common tho.

I think we were discussing something like this at Stancon right @wds15 @SteveBronder ?

This would decrease the burden on developers/reviewers for PRs like the TLS one. It would not help with the lgamma PR unless we add a performance test on a model that is heavily dependent on lgamma.

For function level performance PRs that show no performance change in any of the test suite models I think tests from the developers machine are sufficient if there is no reasonable doubt that things behave differently on different OSes. Reasonable doubt is off course open for interpretation. For lgamma it was pretty clear that things are weird across the board.

Bottom line a good performance test suite that could be run semi-automatically would be a huge win.

serban-nicusor-toptal commented 5 years ago

@rok-cesnovar I think we already handled that in here: https://discourse.mc-stan.org/t/performance-tests-reports/9287/8 I have created a separate job in Jenkins ( here ) that lets you run the performance tests on a specific pr, with a custom make_local. I think this is what you're looking for. Tho at the moment something went wrong with the job and I'll have to fix it after I'm done with the investigation of why performance master failed in the first place.

rok-cesnovar commented 5 years ago

Oh, I missed this, sorry. This is perfect!!

serban-nicusor-toptal commented 5 years ago

Cool! I'll keep you in the loop and when I figure out what's wrong with it will notify you.

SteveBronder commented 5 years ago

This is excellent @serban-nicusor-toptal thanks you!

With @bob-carpenter s comments as well about coalescing contributor docs I think we should try to cleanup the readme and add a contributor.md doc detailing these sorts of things

seantalts commented 5 years ago

Agreed Steve :)

Traditionally stuff like this has been set repo-by-repo by the tech lead for each repo (even if just by convention rather than explicitly spelled out in some cases). But I think one hope for the TWG Director position was a distillation and unification of code review guidelines across repos. I could take a crack at some minimums there, but I think Math has had a much higher bar than that in the past year or two, so I don't really think that doc would be helpful here.

@syclik as Math lead created this wiki page some time ago to answer exactly the question in this issue, but I think it needs to be updated as it mentions nothing about the escalating benchmarking requirements that have been asked of some recent pull requests. I agree that formalizing those requirements is best and that for any formalized testing requirements like that we need to have infrastructure in place to automate it for our devs before we can reasonably expect PRs to pass it.

syclik commented 5 years ago

Thanks, @SteveBronder. What you wrote is good. And any effort to consolidate this is appreciated.

There are a few other dimensions that weren’t captured in the first list:

design changes
side effects
maintainability

These have come up recently.

The doc that @seantalts linked to still stands. I tried to write it from the opposite view: what are the conditions that need to hold (rather than what bar does a PR have to cross and who do I need to check with). The invariant that must hold is that the code base is stable on a merge. “Stable” can be clarified and debated, but I think that’s what we’re looking for. Since we have so many people working and depending on this, the working definition of stable has been pretty conservative. A PR that doesn’t affect other parts of the code should be really easy to get in. This is stuff like new features (when there isn’t a design change), doc, refractors. Things that should be very difficult is anything where it can disrupt other developers and users. That includes memory management (this affects other devs), design changes (devs), performance degradations (users).

I think the biggest thing we don’t have is a principled way to decide on trade offs. I’m not sure that we’ll be able to capture a set of comprehensive rules, but we could try to put up guidelines. The problem is that there are no real metrics to measure the impact of software architecture or side-effects to precision. That doesn’t mean we shouldn’t try to specify things.

Re: performance. We still haven’t spent a lot of time in performance engineering relative to building new features. We should work on it a bit. Yes, the burden is high, but we’ve only really had less than a handful of PRs that we’ve had to really track OS configurations separately. They’ve been big issues, but the number has been small. I don’t think we could have prepared a test suite that would have worked for our use cases there prior to having done the work by hand. Now that we have, I think we can start building tools to make it easier.

syclik commented 5 years ago

A quick clarification: that doc was not put up as me in the role of “Math lead.” What is written there is a living document that could change as we see fit.

bob-carpenter commented 5 years ago

If you're going to enforce the standards as written as math lead, then I don't think it matters what hat you were wearing when you wrote them.

syclik commented 5 years ago

If you're going to enforce the standards as written as math lead, then I don't think it matters what hat you were wearing when you wrote them.

I wanted to make it clear that these standards aren't top-down; if they were created that way, I'm explicitly saying that at the moment they are not and are meant to be discussed and changed as we, the community of Math developers, see fit.

For historical context, most of the policies we have were reactions to problems and we've added them piece by piece. None of this was done by force and they were discussed with the community prior to them becoming policy.

Unit tests were there at the very start (back when we were using SVN on Google Code), but we didn't enforce testing until it started affecting other developers.
We didn't have distribution tests until we ran into problems where changes to other parts of math functions tanked computations for the distributions (we should rethink that now).
We added cpplint for style and didn't fully adopt it for a year after we implemented it so that we could have a general policy for style
We added clang-format to cut down on style discussion even more so we could focus on substantive changes.

We're trying to come up with a standard so it's less confusing for developers to understand what's required. The reason we've adopted "every function gets a test" is because that's an easy policy to check. If "some functions get tests" then the reviewer has to pick and choose what functions are test-worthy. (Truthfully, we've been burned by internal functions that don't have tests more than a handful of times, so it's one of those policies that's probably good anyway.)

The difficult judgment calls are in:

side-effects. When I'm reviewing, I check for these; these tie together the project in ways that are really difficult to detangle later. If there are 0 side-effects, I'm really happy to let it in the code base.
performance tradeoffs. To me, this falls under side-effects. We don't have a policy here and I'd like us to discuss it.

And then there's a question of whether we maintain common standards across Stan. @seantalts: thoughts?

bob-carpenter commented 5 years ago

Here's some bottom-up feedback: I'm not in favor of the test-every-function policy and would prefer to roll it back. I just want to test the external functions that are part of the API. The result of this policy is that I no longer break out helper functions because it's too much work to write boundary condition unit tests and doc for them.

Where did the test-every-function policy come from?

How do you propose resolving bottom-up requests for policy changes?

syclik commented 5 years ago

Thanks for the feedback.

Where did the test-every-function policy come from?

It came from at least a few things

coming up with a policy that could be enforced consistently. Prior to this policy, the reviewer had to actively decide what needed to be tested and not, what constituted as the public / private boundary, and we had contributors insisting that their code was tested enough.
bugs. Prior to testing non-external functions, we had a lot of problems with bugs in internal functions. Some of these even made it out to Stan users and caused seg-faults due to code never being instantiated (even once) by the developer or the reviewer and the first time it's tested was out in the wild.
maintenance and technical debt. Having untested code that expects certain behaviors to hold is adding technical deb and a large maintenance burden. We've dealt with this time and time again. Without tests for these functions, we can't really refactor them sensibly or reuse them for a different purpose and expand on their existing use because it's unclear what the expected boundaries are for these functions. When we have these tests in place, we can move forward in replacing implementations. Also, knowing how to instantiate functions cuts down the time for future developers to debug.

I understand that it's work to test any function. But it's also work to maintain a function and it's actually a lot more work to try to maintain a function that doesn't have any test scaffolding around it. (It's a lot easier for the dev writing code to instantiate code at that time than it is for the same dev to figure out how to do it months from when it's written.)

Do you remember us having these discussions? If not, I can try to dig up where we talked about it on the dev list from Google Groups.

How do you propose resolving bottom-up requests for policy changes?

Good question.

I think the first thing to do is to clarify what your proposed policy change is. In broad strokes it's "I just want to test the external functions that are part of the API," but we need a little more clarity:

how does a developer or reviewer determine what is included in the external functions that are part of the API? (Especially since the API boundary can change as more things are exposed in Stan.)
if that's all is tested, what level of testing is expected of the developer and what is expected from the reviewer to check? (Right now, it's instantiation of the function, at least one good test case, and edge conditions for every function. Do we expect the developer to white-box test this and hit all the internal boundaries? Do we expect the reviewer to make sure the boundaries of the internal functions it relies on are also tested?)
if a developer submits a PR with almost all the complexity in non-external functions and doesn't test enough, how do we write a policy that makes it acceptable for a reviewer to request more tests?

I think there might be a little more nuance to think through, but once we have something concrete, either generated by one person or by the group, we can take a vote from the Math developers and change the policy. We can do that over discourse, announce the results over discourse, and update the wiki to reflect the change.

bob-carpenter commented 5 years ago

Yes, I remember a lot of discussions around this. The resulting policy isn't where I'd want to draw the line. More inline if you want to start rehashing these discussions.

On Sep 5, 2019, at 4:43 PM, Daniel Lee notifications@github.com wrote:

Thanks for the feedback.

Where did the test-every-function policy come from?

It came from at least a few things

• coming up with a policy that could be enforced consistently. Prior to this policy, the reviewer had to actively decide what needed to be tested and not, what constituted as the public / private boundary, and we had contributors insisting that their code was tested enough.

That's why I'm suggesting ::internal vs. not, where the idea is that internal functions need to truly be internal and not called elsewhere.

• bugs. Prior to testing non-external functions, we had a lot of problems with bugs in internal functions. Some of these even made it out to Stan users and caused seg-faults due to code never being instantiated (even once) by the developer or the reviewer and the first time it's tested was out in the wild.

Doesn't that imply the non-internal function was improperly tested? So much of our code is tested very weakly at the interfaces that I'd rather concentrate on that with our limited effort.

• maintenance and technical debt. Having untested code that expects certain behaviors to hold is adding technical deb and a large maintenance burden. We've dealt with this time and time again. Without tests for these functions, we can't really refactor them sensibly or reuse them for a different purpose and expand on their existing use because it's unclear what the expected boundaries are for these functions.

At the point other functions start using these functions, they're no longer what I'd consider internal.

Do you remember us having these discussions? If not, I can try to dig up where we talked about it on the dev list from Google Groups.

Of course---they were contentious and went on forever and were one of the first places where a tech lead had to assert authority to make a decision unpopular with at least one dev (I don't want to speak for others here).

As I said, I'm OK going along with this decision. I'll continue to use big function bodies and avoid external functions. That neatly solves the problem of users calling internal functions, because there won't be any. Not my favorite way to write code, but I'm OK going along with tech lead decisions. The counter-response to my behavior would be to require smaller code blocks.

How do you propose resolving bottom-up requests for policy changes?

Good question.

I think the first thing to do is to clarify what your proposed policy change is. In broad strokes it's "I just want to test the external functions that are part of the API," but we need a little more clarity:

• how does a developer or reviewer determine what is included in the external functions that are part of the API? (Especially since the API boundary can change as more things are exposed in Stan.)

• if that's all is tested, what level of testing is expected of the developer and what is expected from the reviewer to check? (Right now, it's instantiation of the function, at least one good test case, and edge conditions for every function. Do we expect the developer to white-box test this and hit all the internal boundaries? Do we expect the reviewer to make sure the boundaries of the internal functions it relies on are also tested?)

• if a developer submits a PR with almost all the complexity in non-external functions and doesn't test enough, how do we write a policy that makes it acceptable for a reviewer to request more tests?

I think there might be a little more nuance to think through, but once we have something concrete, either generated by one person or by the group, we can take a vote from the Math developers and change the policy. We can do that over discourse, announce the results over discourse, and update the wiki to reflect the change.

I propose that have an ::internal namespace in files where we can put untested helper functions. Those internal functions should not be called by functions in other files. As soon as other functions call them, they're no longer internal. So this may hinge on our having different notions of "internal". To me, it's internal to a function definition, not about what gets exposed to users in Stan.

I think we should enforce boundary case testing. Way too much of our code has no boundary conditions tested (for instance, no 1 x 1 test for inverse(), no NaN or infinite input checks, etc.). Very few of our external math functions (the ones called from Stan programs) are tested for boundary conditions. For example, I was reviewing a recent change to inverse() where there were no boundary condition tests at all for reverse mode, just a 2 x 2 case and a bigger x tested with sum(inverse(inverse(x)). No tests for 0 x 0 or 1 x 1 matrices, no tests for NaN or inf inputs, etc. Not even a test that a non-square matrix will throw.

This is not unusual. Very few our autodiff functions have been tested for boundary conditions even for reverse mode (NaN, infinities, etc.), boundary size matrices or vectors, etc.

betanalpha commented 5 years ago

I propose that have an ::internal namespace in files where we can put untested helper functions. Those internal functions should not be called by functions in other files. As soon as other functions call them, they're no longer internal. So this may hinge on our having different notions of "internal". To me, it's internal to a function definition, not about what gets exposed to users in Stan.

Why not just allow anonymous namespaces within a file implementing a function for helper functions aimed solely for that function and nothing external?

rok-cesnovar commented 5 years ago

We opted for the use of internal:: as opposed to anonymous a while ago: https://github.com/stan-dev/math/issues/1006

I think calling something with internal:: makes it easier to digest what is an API exposed-function and what isnt. Are there any benefits of using anonymous namespace?

If you want to use an internal:: function in other files you can put it in stan::math:: and add tests.

wds15 commented 5 years ago

When you use an anonymous namespace then you cannot use that function in another file.

It would be a safety measure to prevent other people from using the internal function in other contexts, because they would have to pull them out of the anonymous namespace.

So the upside is the within-file restriction (which is at the same time a down-side because you can't use utilities in other files).

At least this is how I understood the anonymous namespaces - it would make sense to allow them for utility functions, I think... and we use them by now to initialise the AD stack...

betanalpha commented 5 years ago

Just that internal might cause problems when including files. The scope of the anonymous namespaces would be limited solely to the file in which they’re defined, right?

On Sep 6, 2019, at 8:51 AM, Rok Češnovar notifications@github.com wrote:

We opted for the use of internal:: as opposed to anonymous a while ago: #1006 https://github.com/stan-dev/math/issues/1006 I think calling something with internal:: makes it easier to digest what is an API exposed-function and what isnt. Are there any benefits of using anonymous namespace?

If you want to use an internal:: function in other files you can put it in stan::math:: and add tests.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/1321?email_source=notifications&email_token=AALU3FXF4VRFRFLDKZ53QALQIH4W3A5CNFSM4IPOY6CKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6B5SVI#issuecomment-528734549, or mute the thread https://github.com/notifications/unsubscribe-auth/AALU3FXU4WSMQPXXNCOOF63QIH4W3ANCNFSM4IPOY6CA.

wds15 commented 5 years ago

I thought so, yes...but reading

https://en.cppreference.com/w/cpp/language/namespace

does not enlighten me... I would have to try that.

rok-cesnovar commented 5 years ago

Thats what we thought but as @seantalts wrote:

An anonymous namespace in C++ makes its contents global at the translation unit level (though it ensures that their symbols won't conflict with those of any other anonymous namespaces from other TUs at link time). I think it might be safe to say we have no use for these in Stan Math, and that likely the existing uses were thinking that they were file-local instead of TU-local.

betanalpha commented 5 years ago

Ah. Apologies for my misunderstanding!

On Sep 6, 2019, at 12:15 PM, Rok Češnovar notifications@github.com wrote:

Thats what we thought but as @seantalts https://github.com/seantalts wrote:

An anonymous namespace in C++ makes its contents global at the translation unit level (though it ensures that their symbols won't conflict with those of any other anonymous namespaces from other TUs at link time). I think it might be safe to say we have no use for these in Stan Math, and that likely the existing uses were thinking that they were file-local instead of TU-local.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/1321?email_source=notifications&email_token=AALU3FXMT6GG6X263ZN545DQIIUSVA5CNFSM4IPOY6CKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6CNGLQ#issuecomment-528798510, or mute the thread https://github.com/notifications/unsubscribe-auth/AALU3FRU77Q75DKLNF5ZMFTQIIUSVANCNFSM4IPOY6CA.

betanalpha commented 5 years ago

So why not just a policy where a dev can take advantage of file-specific internal namespaces for local helper functions, with no testing burden, if they’d like? In other words when implementing my_function.hpp I could define helper functions in namespace::internal_my_function which would be encapsulated and not clash with any other internal functions.

On Sep 6, 2019, at 12:25 PM, Michael Betancourt betanalpha@gmail.com wrote:

Ah. Apologies for my misunderstanding!

On Sep 6, 2019, at 12:15 PM, Rok Češnovar <notifications@github.com mailto:notifications@github.com> wrote:

Thats what we thought but as @seantalts https://github.com/seantalts wrote:

An anonymous namespace in C++ makes its contents global at the translation unit level (though it ensures that their symbols won't conflict with those of any other anonymous namespaces from other TUs at link time). I think it might be safe to say we have no use for these in Stan Math, and that likely the existing uses were thinking that they were file-local instead of TU-local.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/1321?email_source=notifications&email_token=AALU3FXMT6GG6X263ZN545DQIIUSVA5CNFSM4IPOY6CKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6CNGLQ#issuecomment-528798510, or mute the thread https://github.com/notifications/unsubscribe-auth/AALU3FRU77Q75DKLNF5ZMFTQIIUSVANCNFSM4IPOY6CA.

rok-cesnovar commented 5 years ago

I agree with this. If a developer wrote it as a big function we would expect to have that function tested. If its the same function split up, it makes no sense to demand more tests.

syclik commented 5 years ago

From @bob-carpenter: Doesn't that imply the non-internal function was improperly tested? So much of our code is tested very weakly at the interfaces that I'd rather concentrate on that with our limited effort.

Yes! Exactly. The question here is what do we for policy. It is the external boundary that we really care about, but we haven't asked all code to test all boundary conditions (and I think that's going too far). What do you suggest the policy be?

In the case where the non-external functions are unit tested themselves, we cut down on the combinatorial testing required at the external level. And at least it's a consistent policy. There's still decisions to be made about whether a function could be split or not, but it doesn't add an additional question about what needs to be tested.

From @bob-carpenter: As I said, I'm OK going along with this decision. I'll continue to use big function bodies and avoid external functions. That neatly solves the problem of users calling internal functions, because there won't be any. Not my favorite way to write code, but I'm OK going along with tech lead decisions. The counter-response to my behavior would be to require smaller code blocks.

I am also ok going along with a different policy. It's a trade-off between:

a consistent policy that can be applied for all PRs
our risk tolerance for changes
our value on the future cost of maintenance of the codebase

If we value the technical debt less, then it makes sense to allow for a lot of internal functions. Speaking from experience on this project, we've avoided a lot of duplicated code because we have tested internal functions, but this doesn't have to be the policy going forward. One place where this is coming up right now is @SteveBronder's work on the metaprograms. Because we have doced and tested meta programs, we have a chance to refactor the code (with @rok-cesnovar's help). We had enough trouble deduplicating code to get to that point. If this isn't even a goal, I think it would actually slow down development on the project (as a whole... it'd be faster to develop individual pieces, but that would slow down considerably just later on).

From @betanalpha: In other words when implementing my_function.hpp I could define helper functions in namespace::internal_my_function which would be encapsulated and not clash with any other internal functions.

This is one reasonable way to do it, but what's the policy on testing the external function? Do we require that it tests all the boundary conditions of the internal function through the external interface?

From @rok-cesnovar: If a developer wrote it as a big function we would expect to have that function tested. If its the same function split up, it makes no sense to demand more tests.

I realize this is what is perceived, but this is not quite correct. When this is done correctly, we have more functions to test, but less tests overall.

When the internal function is tested at the boundaries, you can trust that the results or that the errors propagate properly and when you test the external function, you just need to test the boundaries of the external function (assuming that the internal one is ok).

If the internal function is not tested, then we need the external function to push each of the boundary conditions of the internal function from the outside. This is the same as if there's one large function. Both of these require more of a white-box testing approach, requires more effort on the reviewer to make sure this is done correctly, and more effort on the developer. Yes, the perception is that this is faster and easier, but it's actually a lot harder to think through the edge cases this way.

That said, we can adopt a different policy and I'm ok supporting it. We just need to be clear about what is expected.

seantalts commented 5 years ago

I think we have a policy around internal functions already: https://github.com/stan-dev/stan/wiki/Code-Quality

seantalts commented 5 years ago

This is one reasonable way to do it, but what's the policy on testing the external function? Do we require that it tests all the boundary conditions of the internal function through the external interface?

I don't think this is a policy issue, but general testing philosophy in software engineering is that you should test each function according to the user API for that function, and you want to come up with good test cases that try to at least execute all the lines of code under test. Beyond that, I'm not sure you can come up with a scientific one-size-fits-all answer to this age old question.

bob-carpenter commented 5 years ago

On Sep 6, 2019, at 12:03 PM, wds15 notifications@github.com wrote:

When you use an anonymous namespace then you cannot use that function in another file.

This winds up being at the translation unit level, not file level, which isn't so useful for us.

bob-carpenter commented 5 years ago

... general testing philosophy in software engineering is that you should test each function according to the user API for that function, and you want to come up with good test cases that try to at least execute all the lines of code under test.

+1 to that.

Beyond that, I'm not sure you can come up with a scientific one-size-fits-all answer to this age old question.

That's always been my answer.

bob-carpenter commented 5 years ago

Yes, I think all code should test all boundary conditions in the external interface, not the internal interface. The internal interface may never be used at the boundary conditions because we control programatically how it's called.

I think we should be aiming for 100% code coverage in our tests, too. That means you have to know the internals to write the test (like where log1p switches to second-order Taylor series and then when it switches to first order). If the internal functions are themselves complicated, then test coverage on the outside will make sure they're tested or the dead branches removed.

As to reusing functions, I'm OK cutting and pasting once, but the second time I'd like to see code factored out into reusable functions.

I think it's great that devs are factoring out generally useful code. But I don't see anything in the test-every-internal-function philosophy that's going to avoid duplication, and I've already said I want to see functions that are used by a lot of other functions tested.

syclik commented 5 years ago

That seems sensible. Let's word that into a policy that can be followed by developers coding and reviewing and ratify it?

Hopefully with a policy like this in place, we can just point to the policy and developers know when they are lacking in tests and that this level of testing is expected.

As you pointed out, this does not do anything for code-duplication (neither does the existing policy). And it will allow for more volume of untested code inside the codebase which would make maintenance and refactoring more difficult in the long run. But, if we're ok with this policy, then we should go with it.

Next steps:

@bob-carpenter, can you write this up as a policy?
We all can review it and comment on it.
I can set up some sort of voting mechanism for the Math devs. We'll take a vote.

bob-carpenter commented 5 years ago

Yes, I'll try to formulate a precise policy in the next couple of days. Do you want me to post on discourse?

Bob

On Sep 9, 2019, at 2:26 PM, Daniel Lee notifications@github.com wrote:

That seems sensible. Let's word that into a policy that can be followed by developers coding and reviewing and ratify it?

Hopefully with a policy like this in place, we can just point to the policy and developers know when they are lacking in tests and that this level of testing is expected.

As you pointed out, this does not do anything for code-duplication (neither does the existing policy). And it will allow for more volume of untested code inside the codebase which would make maintenance and refactoring more difficult in the long run. But, if we're ok with this policy, then we should go with it.

Next steps:

• @bob-carpenter, can you write this up as a policy? • We all can review it and comment on it. • I can set up some sort of voting mechanism for the Math devs. We'll take a vote. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

syclik commented 5 years ago

Thanks! I think Discourse makes sense tagged with "Developers" and "Math." @SteveBronder, does that work for you?

I was thinking about who would need to be involved in ratifying this change. I know most of us are already active and have commented here, but to make it really official, I will go through the Stan Developers list and select the members that have contributed code to the Math repo. And we'll take a majority of those that vote.

Any objections to that procedure? When we put down the policy on a wiki, we can indicate that so many people voted for / against it and what date.

SteveBronder commented 5 years ago

Works for me! I think discourse is a good place for discussion then once everything is sorted I can take that and the other docs we have and make a contributing.md that holds everything

syclik commented 5 years ago

Beautiful!

Beyond that change, what other things do we need to consolidate? I know where a bunch of things are scattered and it's worth taking the time to organize (my time). Is contributing a good spot for this stuff?

On Tue, Sep 10, 2019 at 11:58 AM Steve Bronder notifications@github.com wrote:

Works for me! I think discourse is a good place for discussion then once everything is sorted I can take that and the other docs we have and make a contributing.md that holds everything

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/1321?email_source=notifications&email_token=AADH6FZZFQKGMPA5O2J43WTQI67Y5A5CNFSM4IPOY6CKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6LTR4I#issuecomment-530004209, or mute the thread https://github.com/notifications/unsubscribe-auth/AADH6FZIH35XDSD2M57ZPZDQI67Y5ANCNFSM4IPOY6CA .

SteveBronder commented 5 years ago

LoL it looks like we have a contributing.md but it's hidden in the .github folder

https://github.com/stan-dev/math/blob/develop/.github/CONTRIBUTING.md

And in other link to the wiki is a link to a v nice doc from the stan repo

https://github.com/stan-dev/stan/wiki/Introduction-to-Stan-for-New-Developers

So tbh I think a lot of this is just adding the contributing.md to the main repo and linking to it in the readme. Trying to think about what there is redundant and can be linked to from other things (i.e. the issue and PR templates) A few suggestions below that I think would help tidy stuff up

Move the contributing.md out of .github to the main repo and add a link to it in the readme.md
Add Bob's write up to code-reviews.
A section on setting up the build environment and tests? I think essentially telling people "These are the versions of things we expect you to have" like for clang-format you need version 5.0. We already have this for
Who are the, what I'm going to refer to as 'mini-czars', that should be tagged for review. i.e. for OpenCL stuff Rok or I would be tagged for review. CPU parallelization would tag @wds15 for review, etc. This would be an expansing on the Who can Review A Pull Request section.
Outline at the top so people can jump to where they need
Move the Running Tests section to the contributing.md (or just link to it)
Link to faq @syclik it may be a good idea to expand on some of the things at the bottom of that item.
Let's remove the stuff in Issues section. I think we can just link to a new issue template and let people know there is a set of checklists they have to follow there.
Setup a code of conduct

We do have quite a lot of stuff in the wiki! In my mind space I'd like the contributing.md to be both 'how do I setup the stuff to make a pull request' and 'what should I expect when I make a pull request.' Essentially a condensed version of the developer doc in the wiki. It seems like we have a lot of that stuff in the wiki! Think we just need to make it more front and center

If we are moving over to the monorepo soon would it be better to wait and collate this stuff there? (Bob should still post on discourse so that discussion is sorted by then)

bob-carpenter commented 5 years ago

I outlined a very simple policy here, which is what I'm looking for in reviewing PRs:

https://discourse.mc-stan.org/t/level-of-review-necessary-for-math-prs/11041

syclik commented 4 years ago

I wanted to summarize where we are. From the same thread, we've come to the same spot we were before: https://discourse.mc-stan.org/t/level-of-review-necessary-for-math-prs/11041/17.

The PR policy we have had:

all functions must be tested
all functions must be documented.

This includes functions that are in the internal namespace.

There is wiggle room for interpretation in "all functions must be tested" from the reviewer's point of view. See the discussion for more details.

At this point, I think the next step is to make the policy more prominent and clear and close this issue once done. I think it's useful to include the type of doc at the top to help make it clear what's expected.

stan-dev / math

Level of Review Necessary for PRs #1321