integrated-inferences / CausalQueries

Bayesian inference from binary causal models
Other
24 stars 7 forks source link

srr_report #296

Open lilymedina opened 10 months ago

lilymedina commented 10 months ago

This is the report created by srr_report(). It simply lists all the "standards" that the package should comply with in order to submit to rOpenSci. What we would have to do is to comment on each TODO, clarifying how our code addresses those standards. If the code doesn't address the standards, we'd need to make the necessary changes and comment accordingly. The standards listed are based on the categories I arbitrarily chose to run the function: "General" and "Bayesian." For submission to rOpenSci, the package should have an R file named "srr-stats-standards.r" with all these comments.

srr report for CausalQueries

Click here for full text of all standards

Standards with srrstatsTODO tag

Numbers of standards:

R directory

Standards in on line#136 of file R/srr-stats-standards.R:

till-tietz commented 9 months ago

@gerasy1987 @macartan @lilymedina do we want to divide commenting on these up amongst ourselves? Skimming over the requirements it doesn't look like we'll have to make major changes to CQ (except surpress .Rd file generation for internal functions)

till-tietz commented 9 months ago

@lilymedina @macartan @gerasy1987 are we still pursuing this prior to submitting to JSS?

macartan commented 9 months ago

I’d really like to submit soon

Very happy to do any last tasks i can but just not sure what is left

If there’s a list can you divide up and I’ll do my bit

Coverage tests took hardly any time this morning and think it’s ok

On Mon 11. Dec 2023 at 20:30, Till Tietz @.***> wrote:

@lilymedina https://github.com/lilymedina @macartan https://github.com/macartan @gerasy1987 https://github.com/gerasy1987 are we still pursuing this prior to submitting to JSS?

— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/296#issuecomment-1850748995, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57ISSN52JJGC3KS2CYLYI5NPFAVCNFSM6AAAAAA7O2QI66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJQG42DQOJZGU . You are receiving this because you were mentioned.Message ID: @.***>

till-tietz commented 9 months ago

@macartan I'll divide up the list. Should be a quick one to knock out I think.

macartan commented 9 months ago

pinging on this; ready to wrap up

On Tue, Dec 12, 2023 at 12:05 PM Till Tietz @.***> wrote:

@macartan https://github.com/macartan I'll divide up the list. Should be a quick one to knock out I think.

— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/296#issuecomment-1851820279, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57IKFL2SLCPINGLTZA3YJA27VAVCNFSM6AAAAAA7O2QI66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJRHAZDAMRXHE . You are receiving this because you were mentioned.Message ID: @.***>

till-tietz commented 9 months ago

@macartan do you want to tackle the B issues? I'll tackle the G issues. My sense is that answers to these should be extremely brief. From looking over this CQ addresses most/all of these standards; we just have to briefly comment in what way it does so.

macartan commented 9 months ago

Hard to know where to put the answers though; perhaps all in one vignette? and then reference that in the readme?

On Fri, Dec 15, 2023 at 5:33 PM Till Tietz @.***> wrote:

@macartan https://github.com/macartan do you want to tackle the B issues? I'll tackle the G issues. My sense is that answers to these should be extremely brief. From looking over this CQ addresses most/all of these standards; we just have to briefly comment in what way it does so.

— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/296#issuecomment-1858155099, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57LAWVBWNPDGDT7FW5TYJR3W7AVCNFSM6AAAAAA7O2QI66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJYGE2TKMBZHE . You are receiving this because you were mentioned.Message ID: @.***>

till-tietz commented 9 months ago

They want comments + answers in a file named srr-stats-standards.r that should be included in the package

till-tietz commented 9 months ago

kind of weird that they don't want a .md file (will check if it has to be a .R file)

macartan commented 9 months ago

I see; to every item?; there are so many; but for B these are mostly covered by Stan options

I'll make a start for B

On Fri, Dec 15, 2023 at 5:44 PM Till Tietz @.***> wrote:

They apparently want comments + answers in a file named srr-stats-standards.r that should be included in the package

— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/296#issuecomment-1858169036, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57M2HNEEG567QT7PT6DYJR47HAVCNFSM6AAAAAA7O2QI66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJYGE3DSMBTGY . You are receiving this because you were mentioned.Message ID: @.***>

till-tietz commented 9 months ago

I guess. Agreed; it is a extremely comprehensive. My sense is that simply pointing to a place in the package where something is addressed would be sufficient (given the sheer number of standards)

macartan commented 9 months ago

some of these are addressed in our article if we made the article a vignette then we could point to that

speaking f which; I am not seeing where the vignettes are? were they removed? e.g. Getting started • CausalQueries (integrated-inferences.github.io) https://integrated-inferences.github.io/CausalQueries/articles/1-getting-started.html

On Fri, Dec 15, 2023 at 5:56 PM Till Tietz @.***> wrote:

I guess. Agreed; it is a extremely comprehensive. My sense is that simply pointing to a place in the package where something is addressed would be sufficient (given the sheer number of standards)

— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/296#issuecomment-1858183997, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57JTT6JC7PTOSQ4O3VLYJR6LZAVCNFSM6AAAAAA7O2QI66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJYGE4DGOJZG4 . You are receiving this because you were mentioned.Message ID: @.***>

till-tietz commented 9 months ago

I moved the vignettes to a separate branch as we were having issues with CRAN + rcmd checks related to them when we were last pushing to CRAN. A fresh rebuild of the vignette folder structure should do the trick here I'll rebuild the vignettes fresh and push them to main asap.

till-tietz commented 9 months ago

@macartan vignettes are restored and rcmd checks are passing

macartan commented 9 months ago

can you fold into the report:

BS1.0 Bayesian software which uses the term “hyperparameter” should explicitly clarify the meaning of that term in the context of that software.*

The term "hyperparameter" is used and defined in set_priors.R

BS1.1 Descriptions of how to enter data, both in textual form and via code examples. Both of these should consider the simplest cases of single objects representing independent and dependent data, and potentially more complicated cases of multiple independent data inputs.

Examples of how to enter data are provided in update_model.R

BS1.2 Description of how to specify prior distributions, both in textual form describing the general principles of specifying prior distributions, along with more applied descriptions and examples, within:

Examples of how to set priors are given in set_priors.R and in the "getting started" vignette and described textually in the main package README.

BS1.3 Description of all parameters which control the computational process (typically those determining aspects such as numbers and lengths of sampling processes, seeds used to start them, thinning parameters determining post-hoc sampling from simulated values, and convergence criteria). In particular:

rstan is used for all Bayesian analyses and all arguments passed to update_model are passed on to rstan::sampling. Users are pointed to ? rstan::sampling for details

BS1.4 For Bayesian Software which implements or otherwise enables convergence checkers, documentation should explicitly describe and provide examples of use with and without convergence checkers.

rstan is used for all Bayesian analyses and users optionally save information on covergence. Users are pointed to ? rstan::stanfit for details

BS2.1 Bayesian Software should implement pre-processing routines to ensure all input data is dimensionally commensurate, for example by ensuring commensurate lengths of vectors or numbers of rows of tabular inputs.

Inherited from rstan

BS2.10 Issue diagnostic messages when identical seeds are passed to distinct computational chains.

Inherited from rstan

BS2.11 Software which accepts starting values as a vector should provide the parameter with a plural name: for example, “starting_values” and not “starting_value”.*

Inherited from rstan (init options)

BS2.12 Bayesian Software should implement at least one parameter controlling the verbosity of output, defaulting to verbose output of all appropriate messages, warnings, errors, and progress indicators.

Inherited from rstan

BS2.13 Bayesian Software should enable suppression of messages and progress indicators, while retaining verbosity of warnings and errors. This should be tested.

Inherited from rstan (refresh)

BS2.14 Bayesian Software should enable suppression of warnings where appropriate. This should be tested.

Inherited from rstan

BS2.15 Bayesian Software should explicitly enable errors to be caught, and appropriately processed either through conversion to warnings, or otherwise captured in return values. This should be tested.*

Inherited from rstan

BS2.2 Ensure that all appropriate validation and pre-processing of distributional parameters are implemented as distinct pre-processing steps prior to submitting to analytic routines, and especially prior to submitting to multiple parallel computational chains.

Inherited from rstan

BS2.3 Ensure that lengths of vectors of distributional parameters are checked, with no excess values silently discarded (unless such output is explicitly suppressed, as detailed below).

Inherited from rstan

BS2.4 Ensure that lengths of vectors of distributional parameters are commensurate with expected model input (see example immediately below)

Inherited from rstan

BS2.5 Where possible, implement pre-processing checks to validate appropriateness of numeric values submitted for distributional parameters; for example, by ensuring that distributional parameters defining second-order moments such as distributional variance or shape parameters, or any parameters which are logarithmically transformed, are non-negative.*

Inherited from rstan

BS2.6 Check that values for computational parameters lie within plausible ranges.*

Inherited from rstan

BS2.7 Enable starting values to be explicitly controlled via one or more input parameters, including multiple values for software which implements or enables multiple computational “chains.”

Inherited from rstan

BS2.8 Enable results of previous runs to be used as starting points for subsequent runs.*

Inherited from rstan

BS2.9 Ensure each chain is started with a different seed by default.

Inherited from rstan

BS3.0 Explicitly document assumptions made in regard to missing values; for example that data is assumed to contain no missing (NA, Inf) values, and that such values, or entire rows including any such values, will be automatically removed from input data.*

Handling of missing data is described in `update_model.R'

BS3.1 Implement pre-processing routines to diagnose perfect collinearity, and provide appropriate diagnostic messages or warnings

Inherited from rstan

BS3.2 Provide distinct routines for processing perfectly collinear data, potentially bypassing sampling algorithms*

Inherited from rstan

BS4.0 Packages should document sampling algorithms (generally via literary citation, or reference to other software)

Inherited from rstan

BS4.1 Packages should provide explicit comparisons with external samplers which demonstrate intended advantage of implementation (generally via tests, vignettes, or both).*

Inherited from rstan

BS4.3 Implement or otherwise offer at least one type of convergence checker, and provide a documented reference for that implementation.

Inherited from rstan

BS4.4 Enable computations to be stopped on convergence (although not necessarily by default).

Inherited from rstan

BS4.5 Ensure that appropriate mechanisms are provided for models which do not converge.*

Inherited from rstan

BS4.6 Implement tests to confirm that results with convergence checker are statistically equivalent to results from equivalent fixed number of samples without convergence checking.

Inherited from rstan

BS4.7 Where convergence checkers are themselves parametrised, the effects of such parameters should also be tested. For threshold parameters, for example, lower values should result in longer sequence lengths.*

Inherited from rstan

BS5.0 Return values should include starting value(s) or seed(s), including values for each sequence where multiple sequences are included

Inherited from rstan

BS5.1 Return values should include appropriate metadata on types (or classes) and dimensions of input data*

Inherited from rstan

BS5.2 Bayesian Software should either return the input function or prior distributional specification in the return object; or enable direct access to such via additional functions which accept the return object as single argument.*

Inherited from rstan

BS5.3 Bayesian Software should return convergence statistics or equivalent

Inherited from rstan (Rhat)

BS5.4 Where multiple checkers are enabled, Bayesian Software should return details of convergence checker used

Inherited from rstan

BS5.5 Appropriate diagnostic statistics to indicate absence of convergence should either be returned or immediately able to be accessed.*

Inherited from rstan

BS6.0 Software should implement a default print method for return objects

Inherited from rstan

BS6.1 Software should implement a default plot method for return objects

Inherited from rstan

BS6.2 Software should provide and document straightforward abilities to plot sequences of posterior samples, with burn-in periods clearly distinguished

Inherited from rstan

BS6.3 Software should provide and document straightforward abilities to plot posterior distributional estimates*

Inherited from rstan

BS6.4 Software may provide summary methods for return objects

Inherited from rstan

BS6.5 Software may provide abilities to plot both sequences of posterior samples and distributional estimates together in single graphic*

Inherited from rstan

BS7.0 Software should demonstrate and confirm recovery of parametric estimates of a prior distribution

Examples are provided in test_update_model.R

BS7.1 Software should demonstrate and confirm recovery of a prior distribution in the absence of any additional data or information

Examples are provided in test_update_model.R

BS7.2 Software should demonstrate and confirm recovery of a expected posterior distribution given a specified prior and some input data*

Examples are provided in test_update_model.R

BS7.3 Bayesian software should include tests which demonstrate and confirm the scaling of algorithmic efficiency with sizes of input data.*

Tests are provided in benchmarking vignette,

BS7.4 Bayesian software should implement tests which confirm that predicted or fitted values are on (approximately) the same scale as input values.

Only binary data is allowed.

BS7.4a The implications of any assumptions on scales on input objects should be explicitly tested in this context; for example that the scales of inputs which do not have means of zero will not be able to be recovered.

Only binary data is allowed.

On Fri, Dec 15, 2023 at 6:52 PM Till Tietz @.***> wrote:

@macartan https://github.com/macartan vignettes are restored and rcmd checks are passing

— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/296#issuecomment-1858264565, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57N7KHAAWA235HTIQBDYJSE7TAVCNFSM6AAAAAA7O2QI66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJYGI3DINJWGU . You are receiving this because you were mentioned.Message ID: @.***>

till-tietz commented 9 months ago

thank you @macartan ... will do

till-tietz commented 9 months ago

@macartan updating the package website as well

macartan commented 9 months ago

great; I just did a small edit to the getting started vignette in line with the answers to the B items

On Fri, Dec 15, 2023 at 7:03 PM Till Tietz @.***> wrote:

@macartan https://github.com/macartan updating the package website as well

— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/296#issuecomment-1858288200, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57LAO4JDDQPPXGRKN6DYJSGGFAVCNFSM6AAAAAA7O2QI66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJYGI4DQMRQGA . You are receiving this because you were mentioned.Message ID: @.***>

till-tietz commented 9 months ago

@macartan I'll be following this https://stats-devguide.ropensci.org/pkgdev.html#pkgdev-srr for the r-opensci documentation + submission. Will try and knock this out next week (18th-22nd).

macartan commented 9 months ago

Great But please share out the tasks!

On Sun 17. Dec 2023 at 13:11, Till Tietz @.***> wrote:

@macartan https://github.com/macartan I'll be following this https://stats-devguide.ropensci.org/pkgdev.html#pkgdev-srr for the r-opensci documentation + submission. Will try and knock this out next week (18th-22nd).

— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/296#issuecomment-1859153793, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57KRQ2OSCNM4LAAOMVTYJ3ON5AVCNFSM6AAAAAA7O2QI66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJZGE2TGNZZGM . You are receiving this because you were mentioned.Message ID: @.***>