Closed casparbein closed 1 year ago
Dear @casparbein,
Best, Sergei
Hi Sergei,
thanks for your quick reply. As to your second question, yes, I mean the command line mode where you have to interact with the program as opposed to a statement where you specify all parameters in advance. In absrel, for example, I would run something like this:
hyphy absrel --alignment sequence.fa --tree tree.nh ENV='TOLERATE_NUMERICAL_ERRORS=1;' --output out.json
where the output is a json file. Here, under the absrel section, four output files are listed. The json alone is fine; I was just wondering if I missed a command line option (something like --format) where I could specify more things.
Cheers, Bernhard
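For batch use, the non-interactive call above can be generated per alignment. The following is a minimal sketch, not an official HyPhy wrapper: it uses only the flags quoted in this thread (`--alignment`, `--tree`, `--output` and the `ENV=` argument). The function names, the `.absrel.json` output naming scheme, and the idea of concatenating several `NAME=VALUE;` pairs in one `ENV` argument are assumptions made for illustration.

```python
from pathlib import Path


def env_argument(settings):
    """Format variables as one ENV argument, e.g. ENV=TOLERATE_NUMERICAL_ERRORS=1;

    Assumption: several variables can be concatenated as NAME=VALUE; pairs.
    The thread itself only shows a single variable.
    """
    return "ENV=" + "".join(f"{name}={value};" for name, value in settings.items())


def absrel_command(alignment, tree, settings=None):
    """Build the argument list for one non-interactive absrel run."""
    alignment = Path(alignment)
    # Hypothetical naming scheme: put the JSON next to the alignment file.
    output = alignment.with_suffix(".absrel.json")
    command = ["hyphy", "absrel", "--alignment", str(alignment), "--tree", str(tree)]
    if settings:
        command.append(env_argument(settings))
    command += ["--output", str(output)]
    return command


def commands_for_directory(directory, settings=None):
    """One command per *.fa file, assuming a matching *.nh tree file exists."""
    return [
        absrel_command(fasta, fasta.with_suffix(".nh"), settings)
        for fasta in sorted(Path(directory).glob("*.fa"))
    ]
```

Each returned list can be passed to `subprocess.run(command, check=True)`; because no shell is involved, the `ENV` argument needs no extra quoting.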
Dear @casparbein,
Ah, I understand now. The tutorial itself is a bit out-of-date (it's from ~2017), so some of the options have disappeared. With the newer HyPhy analyses, i.e. the ones that take `--key value` type arguments, you have three types of arguments: required, optional, and conditional.
You can see most analysis arguments by typing
$hyphy absrel --help
code
Which genetic code should be used
default value: Universal
alignment [required]
An in-frame codon alignment in one of the formats supported by HyPhy
tree [conditionally required]
A phylogenetic tree (optionally annotated with {})
applies to: Please select a tree file for the data:
branches
Branches to test
default value: All
multiple-hits
Include support for multiple nucleotide substitutions
default value: None
srv
Include synonymous rate variation
default value: No
syn-rates
The number alpha rate classes to include in the model [1-10, default 3]
default value: absrel.synonymous_rate_classes [computed at run time]
output
Write the resulting JSON to this file (default is to save to the same path as the alignment file + 'ABSREL.json')
default value: absrel.codon_data_info[terms.json.json] [computed at run time]
The conditional arguments, like `--tree`, apply only in certain cases, for example if the alignment file does not contain a tree in it.
The optional arguments, like `--branches`, will have default values (`All`) that are used unless overridden.
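The three argument kinds can be modelled in a few lines. This is an illustrative sketch only, not how HyPhy resolves arguments internally: the defaults are copied from the `--help` listing quoted in this reply, while `resolve_arguments` and the `alignment_contains_tree` flag are hypothetical names introduced for the example.

```python
# Defaults copied from the `hyphy absrel --help` listing quoted in this thread.
OPTIONAL_DEFAULTS = {
    "code": "Universal",
    "branches": "All",
    "multiple-hits": "None",
    "srv": "No",
}
REQUIRED = ["alignment"]


def resolve_arguments(supplied, alignment_contains_tree):
    """Return the effective argument values, or raise if something is missing."""
    missing = [name for name in REQUIRED if name not in supplied]
    # --tree is conditionally required: only when the alignment has no tree in it.
    if not alignment_contains_tree and "tree" not in supplied:
        missing.append("tree")
    if missing:
        raise ValueError("missing arguments: " + ", ".join("--" + m for m in missing))
    resolved = dict(OPTIONAL_DEFAULTS)  # optional arguments start at their defaults
    resolved.update(supplied)           # explicitly supplied values override them
    return resolved
```

For example, `resolve_arguments({"alignment": "sequence.fa", "tree": "tree.nh"}, False)` keeps `branches` at `All`, whereas omitting `tree` in the same call raises an error.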
Because `hyphy absrel --help` works by scanning the script file for `absrel` for specific commands, it may not always detect every available option, especially if they are defined in script files that are loaded at run time by `absrel`.
One more reason to develop better docs. Unfortunately, as you well know, I am sure, documentation is the last thing that academic s/w developers usually focus on.
I'll create a list of key environment variables in this issue and ask @stevenweaver to also post a version of it on our main website.
Best, Sergei
PS. We have been trying to standardize common analysis outputs to be a single JSON file.
Dear Sergei,
thank you for your detailed reply, I appreciate your efforts to extend the documentation. Also, I want to reiterate that the programs in the HyPhy suite we use work really well for our purposes. The issue I raised was more to convince ourselves that we are not missing important parameters that we simply could not find in the manual or the website.
Thanks again for taking this seriously. Cheers, Bernhard
Here are some environment variables that may be of general use. Note that some of the analyses provide their own values for some of these variables, and those will take precedence over whatever is specified on the command line.
Variable | Description | Value range |
---|---|---|
TOLERATE_NUMERICAL_ERRORS | What should be done when internal diagnostics indicate that likelihood function calculations may be subject to numerical error/instability. In most cases, these issues are encountered if the optimizer arrives at a set of parameter values that are in some sense extreme (close to the lowest/highest values). | |
TOLERATE_CONSTRAINT_VIOLATION | What should be done when `:=`, `:<` or `:>` constraints on model parameters cannot be satisfied (no feasible solution can be automatically found). In most cases, these issues are encountered if the optimizer arrives at a set of parameter values that are in some sense extreme, or if a set of mutually contradictory constraints is specified (e.g. `x2 :< x1; x2 :> x3; x3 := x1 + 1;`). | |
NORMALIZE_SEQUENCE_NAMES | Ask HyPhy to automatically convert sequence names to valid identifiers, by replacing spaces and other "inadmissible" characters with `_`. This is done because HyPhy needs to be able to create parameter names like `tree.node.parameter` where `node` is a sequence name for leaf nodes. If `node` has spaces, arithmetic operation symbols, etc., this will lead to run-time errors. | |
COUNT_GAPS_IN_FREQUENCIES | If set to TRUE, `-` will contribute 1/N to character counts for base frequency estimation; for example `ACGT-` will count 1.25 for each base. | |
SKIP_OMISSIONS | If set to TRUE, then any site in a multiple sequence alignment with gaps or N-fold degeneracies will be automatically filtered out. | |
USE_MEMORY_SAVING_DATA_STRUCTURES | Any alignments with more than this many sites will not generate some of the additional information (maps of duplicate site patterns, etc.). This may trigger error messages in many standard analyses which expect those objects to be present. Set to larger values to override this behavior. | An integer >1, 100000 by default |
DATA_FILE_PRINT_FORMAT | Whenever a dataset / datafilter is written out, use this format | An integer from the following list |
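Two of the table entries describe simple transformations that are easy to check by hand. The sketch below mimics them in Python purely for illustration; it is not HyPhy's code. The set of "admissible" characters (letters, digits, underscore), the function names, and the behaviour when gaps are not counted (they are simply ignored) are assumptions.

```python
import re


def normalize_name(name):
    """Replace spaces and other non-identifier characters with '_'.

    Assumption: only letters, digits and '_' are admissible.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def base_counts(characters, count_gaps=True, alphabet="ACGT"):
    """Count bases; with count_gaps, each '-' adds 1/N to every base (N = alphabet size)."""
    counts = {base: 0.0 for base in alphabet}
    for ch in characters:
        if ch in counts:
            counts[ch] += 1.0
        elif ch == "-" and count_gaps:
            for base in alphabet:
                counts[base] += 1.0 / len(alphabet)
        # Assumption: with count_gaps off, gaps contribute nothing.
    return counts
```

With these definitions, `base_counts("ACGT-")` gives 1.25 for each base, matching the example in the table, and `normalize_name("Homo sapiens-1")` gives `Homo_sapiens_1`.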
Dear Sergei,
thank you very much for this list! I am sure that it will be helpful for us and others.
Cheers, Bernhard
Dear @casparbein,
I'll keep adding to it; there are quite a few more, although I think it would be better done with specific examples.
Best, Sergei
Hi,
Thanks for developing and maintaining the HyPhy suite!
I have a general question/comment: As seen in this issue, sometimes there seem to be command line options in HyPhy that I could not find in the documentation anywhere. Since we are running several HyPhy tools over a large number of alignments, interactive command line operations are not feasible. Is there an exhaustive manual, where options like ENV or others are explained? Overall, the command line functionality seems to be not as powerful as the interactive mode, since for example in absrel, the output will be only a json file (as opposed to the additional CSV and Nexus file of the interactive mode).
Thanks a lot!