nf-core / hic

Analysis of Chromosome Conformation Capture data (Hi-C)
https://nf-co.re/hic
MIT License
90 stars 55 forks source link

parameters format #39

Closed nservant closed 4 years ago

nservant commented 5 years ago

Thanks! It's one of those unwritten rules that we've discussed and implemented to various degrees in some of the main pipelines but haven't had the time to officially document :sweat_smile: You could change the other parameters too? If it helps, if things do change then I'll have to update numerous pipelines! I've also updated the template today to change the last remaining parameter to this format... https://github.com/nf-core/tools/pull/425/files#diff-4ac32a78649ca5bdd8e0ba38b7006a1eR13

Originally posted by @drpatelh in https://github.com/nf-core/hic/pull/37

nservant commented 5 years ago

@drpatelh , even in the ChIP-seq or ATAC-seq projects, all parameters do not have the same format. Should I wait a bit before changing all my parameters :) ?

drpatelh commented 5 years ago

They should be! The only parameter is maxMultiqcEmailFileSize which I have now updated in the template (see https://github.com/nf-core/tools/pull/425). Which ones have you spotted?

nservant commented 5 years ago

What about --bwa_index, --gene_bed, --tss_bed, --macs_gsize, --mito_name, etc. ?

drpatelh commented 5 years ago

Ah, sorry. Maybe I didn't explain properly. Any parameters that require a user input e.g. (file path, integer etc) are in snake_case format and any that are boolean switches are in camelCase format. Maybe we can use this thread for discussion.

@ewels @apeltzer What do you think about this? This is something we initially discussed way back in 2 years ago Barcelona at the NF conference. Quite a few pipelines are now adopting this approach e.g. rnaseq (mostly I think), chipseq, smrnaseq and a few others. I have also change the template yesterday for consistency. If you both agree we can create an issue on the website to add some docs. Mainly so it's not lost in the ether again :sweat_smile:

nservant commented 5 years ago

Ah !! ok, I didn't get it. This is up to you. I will update according to your best practice. If I can just make a comment, my feeling is that choosing a single format for all variables is much more convenient (and easy to follow for new developers).

ewels commented 5 years ago

My feeling is also that it would be simpler to just have one and stick to it (my preference would be snake_case). I know we discussed this before, I don't remember why we wanted both..

drpatelh commented 5 years ago

Its easier to distinguish parameter type? Personally, I think its easier to work out what a parameter is doing i.e. does it require some sort of input or is it just a switch. I think you probably done this subconsciously for the NGI pipelines because they had parameters like saveAlignedIntermediates, saveReference, singleEnd that were boolean and then others that werent e.g. bwa_index.

ewels commented 5 years ago

But surely users should know what parameters do before using them..? 😉 And we should have parameter validation pretty soon. And personally I find it harder to remember which parameters use which style.

I think that the discrepancy in the old pipelines was more down to who wrote the code - me or @Hammarn 😉

drpatelh commented 5 years ago

So the beauty of having the two styles is that you can pretty much figure out what a parameter is doing before you know what it's doing or even before using it!

Ok. Lets put it to the test! Probably not the best example because you know what most of these parameters are doing already (:wink:). If you had to look through them without knowing anything about what they are doing doesn't it help to figure out which ones require an input and which ones don't? To add to that the case distinction allows the boolean ones to be pretty much self-explanatory. Just gives some visual partitioning to a massive parameter set that would otherwise all look the same :smile_cat:

https://github.com/drpatelh/chipseq/blob/b2fdfd82b76262d57478a81913298189b983badf/main.nf#L22

ewels commented 5 years ago

Why not use the [type] notation in the help docs, as already done for --clip_r1 and others? More explicit, more consistent and easier for beginners 😉

--single_end [bool]            Specifies that the input is single-end reads
--seq_center [str]             Sequencing center information to be added to read group of BAM files

Or maybe [flag] for the bools to make it clearer. Or nothing for those, but [str] for anything with an argument... But you get my point anyway.

ewels commented 5 years ago

x-ref https://github.com/nf-core/tools/issues/426

drpatelh commented 5 years ago

Ok. Having thought about it a little, I'm coming round to using snake_case for all parameters. It is probably best for standardisation. As you mentioned, we need to consistently enforce the use of [type] notation in main.nf and in usage.md because most people won't look at the main script.

Let's agree on this so I can update the template to reflect this and we will need to add some docs to the website too. I have a few pipelines that are ripe for release now and so I can make the update sooner rather than later.

@nf-core/core are we all in agreement? Speak now or forever hold your peace :wink:

ewels commented 5 years ago

Nice! This should go into the guidelines that @MaxUlysse is starting to draft. He maybe has strong views on the issue?

Note that we should be auto-generating the pipeline help text from the parameters settings file soon (https://github.com/nf-core/hlatyping/pull/61), so that will make that part much easier. And we can / should also lint the documentation from that (https://github.com/nf-core/tools/issues/111), so can also enforce standardisation there.

apeltzer commented 5 years ago

Snake case is fine by me - as long as we all adhere to it and make things as the automated linting by @ewels possible ;-)

drpatelh commented 5 years ago

Ive made some headway with this for the atacseq and chipseq pipelines: https://github.com/drpatelh/nf-core-atacseq/blob/7e2c171753bebe093e129d2163041cc54b64435e/main.nf#L21-L87

Any comments? Also changed --design to --input for standardisation.

ewels commented 5 years ago

Nice!