Make fasta reader alphabet-aware (filter version)

When reading fasta, check whether each character is in the alphabet and filter out those that are not.

This has a problem: the filtered sequences come out with different lengths, so we can no longer deduce the alignment length. I think this approach is wrong, because we are supposed to see aligned sequences.
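
To make the length problem concrete, here is a minimal sketch of the filtering idea. The function name `filter_to_alphabet` and the slice-of-chars alphabet are assumptions made for illustration; they are not the actual treetime reader API.

```rust
/// Hypothetical helper (not treetime's actual API): keep only the characters
/// that belong to `alphabet` and silently drop everything else.
fn filter_to_alphabet(seq: &str, alphabet: &[char]) -> String {
    // `filter` passes `&char`, which is what `contains` expects.
    seq.chars().filter(|c| alphabet.contains(c)).collect()
}
```

With the alphabet `['A', 'C', 'G', 'T', '-']`, the aligned row `"ACGU-T"` becomes `"ACG-T"`: one column is lost, and every position after it shifts relative to the other rows.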

Currently fails:

+/workdir/.build/docker/release/treetime ancestral --method-anc=parsimony --tree=data/lassa/L/50/tree.nwk --outdir=tmp/smoke-tests/ancestral/parsimony/lassa/L/50 data/lassa/L/50/aln.fasta.xz
Error: 
   0: When calculating length of sequences
   1: Sequences are expected to all have the same length, but found the following lengths:

      Length 845:
          "MK107855"

      Length 871:
          "MK107845"

      Length 873:
          "MH887995"
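
For context, a check that produces this kind of report could look roughly like the sketch below. It groups sequence names by length and fails when more than one length is present. The names `group_names_by_length` and `alignment_length`, the `(name, sequence)` tuple input, and the error text are assumptions for illustration, not the code in this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: map each observed sequence length to the names
/// of the sequences that have it. Lengths are byte lengths, which equals
/// the character count for ASCII sequence data.
fn group_names_by_length<'a>(seqs: &[(&'a str, &str)]) -> BTreeMap<usize, Vec<&'a str>> {
    let mut groups: BTreeMap<usize, Vec<&'a str>> = BTreeMap::new();
    for (name, seq) in seqs {
        groups.entry(seq.len()).or_default().push(*name);
    }
    groups
}

/// Hypothetical check: return the common length, or an error message
/// listing every length that was found. An empty input yields 0 here.
fn alignment_length(seqs: &[(&str, &str)]) -> Result<usize, String> {
    let groups = group_names_by_length(seqs);
    if groups.len() > 1 {
        let lengths: Vec<String> = groups.keys().map(|len| len.to_string()).collect();
        return Err(format!(
            "sequences have different lengths: {}",
            lengths.join(", ")
        ));
    }
    Ok(groups.keys().next().copied().unwrap_or(0))
}
```

Once filtering removes a different number of characters from each sequence, a check of this shape fails even when the input file itself is a valid alignment.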

Now only ebola fails (Makona-UK3 contains the nucleotide U; none of the other sequences do)

+/workdir/.build/docker/release/treetime ancestral --method-anc=marginal --dense=true --model=jc69 --tree=data/ebola/tree.nwk --outdir=tmp/smoke-tests/ancestral/marginal/ebola data/ebola/aln.fasta.xz
Error: 
   0: When calculating length of sequences
   1: Sequences are expected to all have the same length, but found the following lengths:

      Length 13915:
          "Makona-UK3"

      Length 19006:
          "EM_COY_2015_015982"
          "G3676"
          "EM_COY_2015_015980"
          "G3670"
          "CON-10590"
          "NM042"
          "EM_079497"
          <remaining sequence names here>

We could replace the offending characters instead of removing them (with a gap? with the unknown character? The choice still depends on the alphabet)
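
A minimal sketch of the replacement idea, under the same assumptions as before: `replace_outside_alphabet` and its `placeholder` parameter are hypothetical names, and which placeholder is appropriate is exactly the open question above.

```rust
/// Hypothetical helper (not treetime's actual API): substitute every
/// character outside `alphabet` with `placeholder`, so the sequence keeps
/// its original length and column positions.
fn replace_outside_alphabet(seq: &str, alphabet: &[char], placeholder: char) -> String {
    seq.chars()
        .map(|c| if alphabet.contains(&c) { c } else { placeholder })
        .collect()
}
```

Because every input character maps to exactly one output character, all rows keep the same length and the alignment length can still be deduced. The placeholder should itself be a member of the alphabet (for example its gap or unknown character), otherwise the output is still invalid.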

A fallible alternative is here:

Make fasta reader alphabet-aware (error version) #283

neherlab / treetime

Make fasta reader alphabet-aware (filter version) #282