Closed bricoletc closed 2 years ago
Test coverage report:
$ coverage run -m pytest
$ coverage report --omit="venv*,tests*"
Name                                      Stmts   Miss  Cover
-------------------------------------------------------------
make_prg/__init__.py                          6      2    67%
make_prg/from_msa/__init__.py                 4      0   100%
make_prg/from_msa/cluster_sequences.py      170      1    99%
make_prg/from_msa/interval_partition.py     124      2    98%
make_prg/from_msa/prg_builder.py            132     21    84%
make_prg/io_utils.py                         94     72    23%
make_prg/prg_encoder.py                      50      0   100%
make_prg/seq_utils.py                        55      4    93%
make_prg/subcommands/__init__.py              1      0   100%
make_prg/subcommands/output_type.py          24      0   100%
-------------------------------------------------------------
TOTAL                                       660    102    85%
The missing lines in prg_builder.py are all either logging or related to computing the max nesting level reached, so coverage is decent.
Great review @mbhall88, thanks. I've added some of the proposed changes to this PR and replied to the other comments.
Thanks for your review @leoisl. I've added your suggestions; please merge this to dev if you're happy with my replies to the still-open comments. I think we should then update make_prg to version 0.1.2, before your update PRs making up version 0.2.0.
> I think we should then update make_prg to version 0.1.2, before your update PRs making up version 0.2.0
In keeping with semver, we need to bump the minor version (0.2.0). See discussion on https://github.com/iqbal-lab-org/make_prg/pull/26
So we either bump to 0.2.0 now and then to 0.3.0 when Leandro merges his stuff, or we wait, do one massive merge, and just bump to 0.2.0.
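For reference, the two options can be written out as a minimal sketch of semver-style bumping. The `bump` helper is hypothetical (it is not part of make_prg), and the starting version `0.1.1` is only an illustrative assumption, since the thread does not state the current version.

```python
def bump(version: str, part: str) -> str:
    """Return `version` ("MAJOR.MINOR.PATCH") with `part` incremented.

    Hypothetical helper for illustration; lower-order parts reset to 0.
    """
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


start = "0.1.1"  # assumed starting version, for illustration only

# Option 1: bump the minor version now, and again after the later merge.
after_this_pr = bump(start, "minor")
after_later_merge = bump(after_this_pr, "minor")

# Option 2: wait and bump the minor version once for everything.
after_big_merge = bump(start, "minor")
```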
Good point @mbhall88, in that case I recommend we do 0.2.0 here. That gives us a version to stick to if the update command changes break, change, or improve PRG construction in any way, which I think remains to be tested.
I agree with merging this PR and bumping to 0.2.0
Here are some changes to produce far fewer graphs with sequence ambiguity, that is, graphs in which different paths spell the same sequence, which causes problems for downstream read mapping and genotyping.
See #27 for the reduction in ambiguous genotype calls in gramtools across 26 genes in 14 samples: these changes cut out 99% of them on this dataset!
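To make "different paths spell the same sequence" concrete, here is a minimal sketch that enumerates every path through a toy graph and reports sequences spelled by more than one path. The nested-list representation and both function names are assumptions made for this illustration; they are not make_prg's data model or API.

```python
from collections import Counter
from itertools import product


def spelled_sequences(prg):
    """Return the sequence spelled by every path through a toy PRG.

    Toy representation (an assumption for this sketch): a PRG is a list whose
    items are either a str (invariant sequence) or a list of alternative
    sub-PRGs (the alleles of one variant site).
    """
    choices_per_item = []
    for item in prg:
        if isinstance(item, str):
            choices_per_item.append([item])
        else:
            # Variant site: collect what each allele can spell, recursively.
            choices_per_item.append(
                [seq for allele in item for seq in spelled_sequences(allele)]
            )
    # One path = one choice per item; join the choices to spell the sequence.
    return ["".join(parts) for parts in product(*choices_per_item)]


def ambiguous_sequences(prg):
    """Map each sequence spelled by more than one path to its path count."""
    counts = Counter(spelled_sequences(prg))
    return {seq: n for seq, n in counts.items() if n > 1}


# Two independent "optional T" sites around a run of Ts: inserting the T at
# either site spells the same sequence, so the graph is ambiguous.
ambiguous_prg = ["AT", [[""], ["T"]], "TTT", [[""], ["T"]], "GA"]
# A single SNP site: every path spells a distinct sequence.
unambiguous_prg = ["AT", [["A"], ["C"]], "GA"]
```

In this toy case `ambiguous_sequences(ambiguous_prg)` reports `ATTTTTGA` as reachable by two paths, which is the situation a read mapper or genotyper cannot resolve.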
My genotyping performance did not improve overall on this dataset (though it fixed the one in #27), so it may be good to try the changes here on another validation dataset. In fact for upcoming changes by @leoisl we need to do that IMO.
If you're happy with these changes, we should probably bump the version.
Fixed
AT-TTTTGA ATTTT-TGA
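Reading the two strings above as rows of a multiple sequence alignment (an assumption, since the thread gives no further context), a short check shows why such rows are a source of ambiguity: they differ as alignment rows but are identical once gaps are removed. The `ungap` helper is hypothetical and only illustrative.

```python
def ungap(aligned_row: str, gap: str = "-") -> str:
    """Remove gap characters from one alignment row (hypothetical helper)."""
    return aligned_row.replace(gap, "")


rows = ["AT-TTTTGA", "ATTTT-TGA"]

# The rows place the gap in different columns, so they differ as aligned strings.
rows_differ = rows[0] != rows[1]
# Once gaps are removed, both rows spell the same underlying sequence.
ungapped = {ungap(row) for row in rows}
```

Here `ungapped` contains a single sequence, `ATTTTTGA`, even though the alignment has two distinct rows.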