Added -O to produce an AGP file in original coordinates, which required adding -b and -G to get_seq.py. Refactored part of get_seq.py to simplify string building and use consistent whitespace. Removed the trailing tab in the output AGP. Also updated the readme.
The differences in get_seq.py probably look larger than they are because I used with statements when opening some of the files, which changed the indentation levels. I tested the refactoring independently of the changes that add the original-coordinates functionality.
Summary of functional changes:
Supplying the -O flag to run_pipeline.py causes get_seq.py to be called with its new -b and -G options. -b provides the input (the input_breaks file) and -G tells it where to write the extra AGP file with the original coordinates (as opposed to the coordinates from assembly.cleaned.fasta). Since the extra output filename is controlled by run_pipeline.py and not directly exposed to the user, the output names match the pattern <name>.original-coordinates.agp, where <name> is scaffolds_FINAL at the end of the pipeline and scaffolds_ITERATION_# in intermediate steps (if -p was supplied to run_pipeline.py). The regular output (e.g., scaffolds_FINAL.agp) is still written and is identical regardless of whether the user supplies the new -O flag.
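For reviewers, the naming pattern above can be sketched roughly as follows. This is only an illustration: the helper names original_coordinates_agp_name and iteration_prefix are hypothetical and are not taken from run_pipeline.py, and the sketch assumes the # in scaffolds_ITERATION_# is an integer iteration number.

```python
def original_coordinates_agp_name(name: str) -> str:
    """Return the extra AGP filename for an output prefix (hypothetical helper)."""
    return f"{name}.original-coordinates.agp"


def iteration_prefix(iteration: int) -> str:
    """Return the prefix for an intermediate step (assumes '#' is an integer)."""
    return f"scaffolds_ITERATION_{iteration}"


# Final output of the pipeline.
print(original_coordinates_agp_name("scaffolds_FINAL"))
# Intermediate output, only relevant when -p was supplied.
print(original_coordinates_agp_name(iteration_prefix(1)))
```

The regular AGP name (e.g., scaffolds_FINAL.agp) is not touched by this pattern, which matches the statement that the regular output is unchanged.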
Run times are not significantly affected by this change despite the extra work, presumably because the extra computation is trivial, the extra files are small, and the refactoring prevents the repeated copying of strings.
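To illustrate the kind of string-building change described, here is a minimal sketch, not the actual code in get_seq.py. The function names format_agp_line and build_sequence are hypothetical, and the example fields are made up for illustration. Joining fields with a tab separator avoids a trailing tab, and collecting pieces in a list and joining once avoids repeatedly copying a growing string.

```python
def format_agp_line(fields):
    """Join fields with tabs so the line never ends in a stray tab."""
    return "\t".join(str(field) for field in fields) + "\n"


def build_sequence(pieces):
    """Concatenate sequence pieces with a single join instead of repeated +=."""
    return "".join(pieces)


# Illustrative values only; not taken from real pipeline output.
line = format_agp_line(["scaffold_1", 1, 100, 1, "W", "contig_1", 1, 100, "+"])
sequence = build_sequence(["ACGT", "NN", "TTGA"])
```

With repeated += on a string, each step may copy everything built so far; a single join copies each piece once.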