Established convention for docstrings

jcohenadad commented 4 years ago

This PR establishes conventions for headers of .m files in this project.

Initially, the strategy was to modify genToolboxHelp to modify each .m file to match the formatting needs for Matlab's documentation browser. However, after giving it some thoughts, a better strategy consists in modifying the excellent helpDocMd to accommodate different formatting outputs.

jcohenadad commented 4 years ago

answering @rtopfer's comment:

despite Matlab not being strongly typed, i still think that separating e.g. float, int, str, function, struct, would be extremely helpful. I understand that "imposing" a type might be restrictive (where other types would also work), but it would provide a better understanding of what is expected, from a user standpoint.

rtopfer commented 4 years ago

@jcohenadad for the example at hand, what about

% INPUTS
%
%    orders (uint)
%      Degrees of the desired terms in the series expansion, specified as a
%      vector of non-negative integers (`[0:1:n]` yields harmonics up to n-th
%      order)
%
%    X, Y, Z (numeric)
%      Identically-sized 2- or 3-D arrays of grid coordinates 
%
% OUTPUTS
%
%    basis (double)
%      4-D array of spherical harmonic basis fields

jcohenadad commented 4 years ago

@jcohenadad for the example at hand, what about

% INPUTS
%
%    orders (uint)
%      Degrees of the desired terms in the series expansion, specified as a
%      vector of non-negative integers (`[0:1:n]` yields harmonics up to n-th
%      order)
%
%    X, Y, Z (numeric)
%      Identically-sized 2- or 3-D arrays of grid coordinates 
%
% OUTPUTS
%
%    basis (double)
%      4-D array of spherical harmonic basis fields

great start.

i would still like to find a way to explicit when it's an array or a scalar (i see how confusing this could cause undesired bugs down the line). suggestions:

%    basis (4d double)
%    basis (4xdouble)
%    basis: (double, double, double, double)

i'm not in love with any of my suggestions above though... so any feedback welcome

EDIT: one advantage of encoding this information (about the scalar vs. n-dim array) is that at some point, we will be able to test it

rtopfer commented 4 years ago

We could just copy (or copy an edited form) of the arguments block, e.g.

    arguments
        orders(1,:) {mustBeNumeric,mustBeNonnegative};
        X(:,:,:,1) {mustBeNumeric};
        Y(:,:,:,1) {mustBeNumeric};
        Z(:,:,:,1) {mustBeNumeric};
    end

^ which indicates no more than 3-dims for X,Y,Z. e.g. if you try to input an X array with the 4th dim >1, you get the error message

Value must be an array of size *-by-*-by-*-by-1 where * indicates a dimension of any size.

jcohenadad commented 4 years ago

this: https://github.com/shimming-toolbox/shimming-toolbox/pull/145#issuecomment-652661536 (or a variant) is definitely something we should do. I would suggest the "type" indications to be more user-friendly though. e.g.:

        orders (1, :) {uint}
        X (:, :, :, 1) {float}  % or we could call it numeric if you prefer
        bla {str}  % obviously a string
        pouf {float}  % scalar

p.s. this discussion is a good illustration of the need for such information, because i originally thought that "orders" was a single scalar (not a vector) indicating maximum order, and "X" also a scalar 😬

p.p.s not a big fan of "{}" because it takes longer to write, but i see the need in the example above, to prevent confusion with (:, :, ...)

rtopfer commented 4 years ago

It's not exactly the most readable thing however, just copying lines from the arguments block has the advantage of being

already a standard format
very easy to document if it's just copy-pasting (would also facilitate the doc-parser)

Note that actual format isn't quite that of the above comments: {bracketedTerms()} are actually the validation functions (which are customizable and can be anything) the datatype actually follows the size (in parenthesis), e.g. myArg (1,:) double {mustBeNumeric} = DEFAULT_VALUE

There are some annoying things about the order MATLAB deals with these things though: Notably, MATLAB uses the type-assignment first to convert arguments (when possible)—e.g. in the example above, if you input myArg='this is a char vector and not a double' it will type-cast the char-vector to double first—so the argument will still pass! (so, in that case, you probably wouldn't want to include the "type" within the arguments block at all🙄 ...)

jcohenadad commented 4 years ago

omg, are you kidding me?? well... regardless of this behaviour, i think we should still continue using the "type" feature of the argument (since, as you said, this is standard). Should we consider checking for the type before entering the argument block? That would make the code heavy and difficult to maintain though, so probably not a good idea.

rtopfer commented 4 years ago

Ahh sorry, i overwrote your comment!

omg, are you kidding me??

(i know... seems absurd)

well... regardless of this behaviour, i think we should still continue using the "type" feature of the argument (since, as you said, this is standard). Should we consider checking for the type before entering the argument block? That would make the code heavy and difficult to maintain though, so probably not a good idea.

Matlab won't let you do anything before the arguments block—but, you can just incorporate type-checking into the validation functions—ie. why I wrote this

gab-berest commented 4 years ago

The variable type cast is standard in most of modern programmation language to do checksums or other verification with compiled programming languages. As @rtopfer said, we can put a validation function before the size and type to check if the values will work. We could say it should be done because the cast can be problematic, but the real problem happens when Matlab decides to size change the argument (and it happens) and provoke undesirable functionality. Also, if the validation functions are well named, then it can help to understand the format of the input argument. For example: b {mustBeSize(b,[5,3])} someone can't say that they didn't know the size have to be 5 by 3. Matlab has a good documentation on casting and validation: https://www.mathworks.com/help/matlab/matlab_prog/function-argument-validation-1.html#mw_94b29d60-977b-4cfd-b690-d6b914e03f17

gab-berest commented 4 years ago

It's sure it's not the most pleasing to the eye, but it's true that it is standard and you can try make it more beautiful: MyArg { typeMustBe(MyArg, 'double'), dimMustBe(MyArg, 3) } There's also other standard ways like Doxygen that are used a lot and quite clear when comes to understanding code: % @brief Sets the position of the cursor to the % position (row, col). % % Subsequent calls to putbytes should cause the console % output to begin at the new position. If the cursor is % currently hidden, a call to set_cursor() must not show % the cursor. % % @param pos [ int ] (1,2) The new row for the cursor. % @return ret [string] error message string % function ret = set_cursor(pos);

with this methodology, it is easily parsed with the @ symbols and names. It's another suggestion if not having the same syntax as the argument block is not a problem. But the one in this PR is also quite clear and easy to read.

jcohenadad commented 4 years ago

very interesting, i was not aware of Matlab's Class and Size Conversions. I understand better the purpose (and need for) validation function.

So, moving forward, it is pretty clear to me that we do need those validation functions in the argument block. Everyone agree with this?

Now, when it comes to header documentation, what should we do? Copy/pasting the argument block inside the header sounds horrible (inevitably, some inconsistencies will be introduced). I can think of:

option 1: do not copy, but parse it with HelpDocMd to auto-generate the doc. Cons: within Matlab, "help" will omit the input arguments.
option 2: we duplicate the argument block inside the header, and HelpDocMd has the reponsibility to check for inconsistencies. Given that (currently) the doc is built only during push to master (via our Azure pipeline), errors will only appear after the merge (not during the PR review process). We could also ask our CI testing suite to check for this though...

rtopfer commented 4 years ago

i don't have any particular preference for format/style, but i do think arguments blocks + function validation should be used pretty much invariably (doc. considerations aside, the validation stage is instructive for users when clear error messages are given, and i think it will make testing/debugging multi-step processing pipelines a lot easier)

option 1: do not copy, but parse it with HelpDocMd to auto-generate the doc. Cons: within Matlab, "help" will omit the input arguments...

alternatively, we could alter doc header within the actual .m file, i.e.

(copy the original file before overwriting)
parse the original header comment (noting the starting line-numbers of a) the header comment b) the code)
parse the arguments/arguments block of the code
use fopen/fwrite to rewrite the .m file with the new info, i.e. replacing contents between lines a) & b)

jcohenadad commented 4 years ago

alternatively, we could alter doc header within the actual .m file, i.e.

(copy the original file before overwriting)

parse the original header comment (noting the starting line-numbers of a) the header comment b) the code)

parse the arguments/arguments block of the code

use fopen/fwrite to rewrite the .m file with the new info, i.e. replacing contents between lines a) & b)

but this would be done by azure (ie after we push), therefore would be pushed in a subsequent commit, right?

rtopfer commented 4 years ago

but this would be done by azure (ie after we push), therefore would be pushed in a subsequent commit, right?

could be. Or, whoever is writing the code could just call this format_header routine when their code is ready to be pushed, (and if they try to push without having run the routine already, then azure prints an error requesting they format the doc first)

jcohenadad commented 4 years ago

could be. Or, whoever is writing the code could just call this format_header routine when their code is ready to be pushed, (and if they try to push without having run the routine already, then azure prints an error requesting they format the doc first)

i'm not in favor of this manual intervention on the developer side

gab-berest commented 4 years ago

I don't really know this for github, but for Atlassian BitBucket, you can add a verification script on commits and pushes (for wichever branch you want). Is it possible on github?

jcohenadad commented 4 years ago

i think we should stop the discussion about docstrings/doc/testing on this repository, and save those discussions for shimming-toolbox-py.

shimming-toolbox / shimming-toolbox-matlab

Established convention for docstrings #145