ztzou / conv_cal

Convergence event counting and probability calculation according to Zou and Zhang, 2015, Mol. Biol. Evol.

KeyError: '-' in 2nd step #1

Open virologist opened 3 years ago

virologist commented 3 years ago

Hi, Dr. Zou

So glad to see your updated version. I have some questions here.

  1. For other studies (like virus evolution), do I need to modify the amino acid substitution model used by PAML? Is this model (jones.dat) universal?
  2. What kind of phylogenetic tree does conv_cal prefer? An ML tree or a BEAST MCC tree?
  3. Is it appropriate to keep all sites, i.e. to set cleandata = 0 (the PAML option "remove sites with ambiguity data (1:yes, 0:no)")?

I completed the first step, but when I tried to run the second step, I got an error message.

python calc_expconv.py ../01_ancestral . ./branch_group.list ../00_input/jones.dat site
[Fri Jun 18 17:27:36 2021] Start ...
1 genes to process
8 groups to check
Traceback (most recent call last):
  File "calc_expconv.py", line 227, in <module>
    main()
  File "calc_expconv.py", line 35, in main
    freqs = get_freqs(tree, freq_mode)
  File "calc_expconv.py", line 135, in get_freqs
    freq[i, d[seq[i]]] += 1.
KeyError: '-'


Can you help me to figure it out? Thank you very much.

Sincerely, Yang

ztzou commented 3 years ago

Hi Yang,

  1. You may want to use virus-derived matrices such as FLU or HIV.
  2. Ideally, the TRUE tree should be used, so use whichever tree you think is most likely to be the true species tree.
  3. You should avoid non-amino-acid characters in your input data. Gaps and ambiguous characters, for example, should be removed prior to the analysis. This is also likely the cause of your error, since '-' is not an amino acid.
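
A minimal sketch of the cleanup described in point 3, assuming the alignment is already loaded as a mapping from taxon name to aligned sequence. The helper name strip_non_aa_columns and the example taxa are hypothetical and not part of conv_cal. It drops every alignment column in which any sequence has a character other than the 20 standard amino acids, which would cover '-' as well as ambiguity codes such as 'X'.

```python
# Hypothetical helper, not part of conv_cal: keep only alignment columns
# in which every sequence carries one of the 20 standard amino acids.
AMINO_ACIDS = set("ARNDCQEGHILKMFPSTWYV")


def strip_non_aa_columns(alignment):
    """Return a new dict (name -> sequence) without gap/ambiguous columns."""
    seqs = {name: seq.upper() for name, seq in alignment.items()}

    # All sequences must have the same length to be treated as aligned.
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) > 1:
        raise ValueError("sequences must be aligned to the same length")
    n_cols = lengths.pop() if lengths else 0

    # Indices of columns where every sequence has a standard amino acid.
    keep = [
        i for i in range(n_cols)
        if all(seq[i] in AMINO_ACIDS for seq in seqs.values())
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}


# Made-up toy alignment: the third column has a gap and an ambiguous 'X'.
example = {"taxonA": "MK-LV", "taxonB": "MKXLV", "taxonC": "MRTLI"}
cleaned = strip_non_aa_columns(example)
```

Because the second step reads the output of the first, a filter like this would presumably have to be applied to the input alignment before rerunning step 1, not to the step 1 results.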

Best,

Zhengting

virologist commented 3 years ago

Hi, @ztzou

Thanks for your reply. I finished the 1st step with my data. If I want to find convergent and parallel evolution across the entire tree without focusing on any particular branch, what should I do in Step 2?

Best, Yang