misshie / ngsdat2

NGS Data Analysis Textbook Version 2 (Disease Genome Analysis)
MIT License

Update 100_run-BaseRecalibrator.sh #2

Closed no85j closed 4 years ago

no85j commented 4 years ago

about GATK4 version

tom-tan commented 4 years ago

IMO, it may break reproducibility, but I'm not sure whether that is acceptable for this repository.

It would be nice to have a guideline on the software version policy for the repositories of 次世代シークエンサー DRY解析教本 改訂第2版: either use the latest version for (possibly) better results, or fix the software versions for reproducibility.

@bonohu Any thoughts?

tom-tan commented 4 years ago

In this case, we can provide a default version of GATK while allowing users to override it by using the ${variable:-default_value} syntax.

Here is an example:

$ cat example.sh
#!/bin/bash

msg=${msg:-hello}
echo "$msg"
$ ./example.sh
hello
$ env msg=yahoo ./example.sh
yahoo

But it introduces extra complexity :-(
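Applied to a GATK script, the same pattern might look like the sketch below. The environment variable name `GATK`, the function name `resolve_gatk`, and the default path `gatk-4.0.0.0/gatk` are placeholders chosen for illustration; none of them is taken from the repository.

```shell
#!/bin/bash

# Hypothetical helper: print the GATK executable to use.
# If GATK is set and non-empty in the environment, use it;
# otherwise fall back to a pinned default (placeholder version).
resolve_gatk() {
  echo "${GATK:-gatk-4.0.0.0/gatk}"
}

gatk="$(resolve_gatk)"
echo "Using GATK executable: $gatk"
```

A reader with a different release could then run the script as `env GATK=/path/to/their/gatk ./script.sh` without editing it, while the default keeps the pinned version for reproducibility.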

misshie commented 4 years ago

I really appreciate the contributions and discussion from @no85j, @tom-tan, and @bonohu.

In @no85j's code, gatk=gatk-4*/gatk points to the gatk executable in the first matching directory, and it works for newer GATK releases as long as the reader has downloaded only a single version of GATK.
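If several GATK releases are unpacked side by side, the pattern gatk-4*/gatk matches more than one path. A minimal sketch, assuming bash, of how a script could detect that case instead of silently picking one; the function name `find_gatk` is hypothetical and not part of the repository's scripts.

```shell
#!/bin/bash

# find_gatk DIR: print the single gatk executable under DIR/gatk-4*/,
# or return non-zero if none or more than one is found.
find_gatk() {
  local dir="${1:-.}"
  local matches=()
  local candidate
  for candidate in "$dir"/gatk-4*/gatk; do
    # When nothing matches, the pattern is left as a literal string,
    # so only keep candidates that actually exist.
    if [ -e "$candidate" ]; then
      matches+=("$candidate")
    fi
  done
  if [ "${#matches[@]}" -ne 1 ]; then
    echo "expected exactly one gatk-4*/gatk under $dir, found ${#matches[@]}" >&2
    return 1
  fi
  echo "${matches[0]}"
}
```

A script could then call `gatk="$(find_gatk .)" || exit 1` and stop with a clear message when the download directory is ambiguous.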

As @tom-tan commented, explicitly fixing the executable used in the scripts is a good policy for analysis reproducibility. The code suggested by @tom-tan, which uses Bash's default-value expansion, is an elegant approach. However, I prefer that readers edit the sample scripts themselves to match their individual environments (i.e., the versions of the tools they downloaded).

For commands typed at the command line, I wrote in the textbook that readers should substitute the version numbers. For the sample scripts, I did not write that.

My plan is to revert the last merge and to note in the README of this GitHub repository that readers need to edit the sample scripts for their own environments.