neherlab / ncov-simple

2 stars 1 forks source link

feat: alternative masking scheme #24

Closed rneher closed 2 years ago

rneher commented 2 years ago

@corneliusroemer I haven't tested this yet. but this might be a less round-about way of maskign

corneliusroemer commented 2 years ago

Looks good, so basically you suggest using a feature of augur tree to do soft masking for sites when tree building thereby getting rid of one rule mask.

I'll test it

corneliusroemer commented 2 years ago

Throws error

Traceback (most recent call last):
  File "scripts/explicit_translation.py", line 57, in <module>
    features = read_gff(args.annotation)
  File "scripts/explicit_translation.py", line 17, in read_gff
    for rec in GFF.parse(in_handle):
  File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 780, in parse
    for rec in parser.parse_in_parts(gff_files, base_dict, limit_info,
  File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 329, in parse_in_parts
    cur_dict = self._results_to_features(cur_dict, results)
  File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 367, in _results_to_features
    (_, base) = self._add_toplevel_feature(base, feature)
  File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 570, in _add_toplevel_feature
    rec, base = self._get_rec(base, feature_dict)
  File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 535, in _get_rec
    match_id = self._get_matching_record_id(base, info_dict['rec_id'])
  File "/usr/local/Caskroom/mambaforge/base/envs/nextstrain/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 409, in _get_matching_record_id
    elif find_id.find("|") > 0:
AttributeError: 'NoneType' object has no attribute 'find'
corneliusroemer commented 2 years ago

Bug only locally, on scicore it seems to have worked. Will try builds with this PR to check it really works.

corneliusroemer commented 2 years ago

I need to remove 95/142 from distance map to prevent regression of #7, then I can merge.

corneliusroemer commented 2 years ago

Test build will be available at https://nextstrain.org/groups/neherlab/ncov/europe/2021-10-34

corneliusroemer commented 2 years ago

If S:95 and S:142 are hard masked, why do we still see it reconstructed on the tree? https://nextstrain.org/groups/neherlab/ncov/europe/2021-10-34?branchLabel=spike_mutations&c=S1_mutations&f_S1_mutations=10,11,12,9&label=spike_mutations:I95T image

corneliusroemer commented 2 years ago

Talked to Richard, it's expected behaviour that S:95/S:142 show up as annotations, the alternative masking scheme here is not just a refactor, but does add hard masked sites back into ancestral reconstruction, thus they show.