Closed rneher closed 3 years ago
Looks good, so basically you suggest using a feature of augur tree
to do soft masking for sites when tree building thereby getting rid of one rule mask
.
I'll test it
Throws error
Traceback (most recent call last):
File "scripts/explicit_translation.py", line 57, in <module>
features = read_gff(args.annotation)
File "scripts/explicit_translation.py", line 17, in read_gff
for rec in GFF.parse(in_handle):
File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 780, in parse
for rec in parser.parse_in_parts(gff_files, base_dict, limit_info,
File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 329, in parse_in_parts
cur_dict = self._results_to_features(cur_dict, results)
File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 367, in _results_to_features
(_, base) = self._add_toplevel_feature(base, feature)
File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 570, in _add_toplevel_feature
rec, base = self._get_rec(base, feature_dict)
File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 535, in _get_rec
match_id = self._get_matching_record_id(base, info_dict['rec_id'])
File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 409, in _get_matching_record_id
elif find_id.find("|") > 0:
AttributeError: 'NoneType' object has no attribute 'find'
Bug only locally, on scicore it seems to have worked. Will try builds with this PR to check it really works.
I need to remove 95/142 from distance map to prevent regression of #7, then I can merge.
Test build will be available at https://nextstrain.org/groups/neherlab/ncov/europe/2021-10-34
If S:95 and S:142 are hard masked, why do we still see it reconstructed on the tree? https://nextstrain.org/groups/neherlab/ncov/europe/2021-10-34?branchLabel=spike_mutations&c=S1_mutations&f_S1_mutations=10,11,12,9&label=spike_mutations:I95T
Talked to Richard, it's expected behaviour that S:95/S:142 show up as annotations, the alternative masking scheme here is not just a refactor, but does add hard masked sites back into ancestral reconstruction, thus they show.
@corneliusroemer I haven't tested this yet. but this might be a less round-about way of maskign