ntranoslab / esm-variants

MIT License

Amino Acid not in Alphabet #6

Open SophieS9 opened 1 year ago

SophieS9 commented 1 year ago

Hi Team - thanks for the great piece of software!

I've hit an issue where prediction fails on the FASTA file for the gene SELENON (attached as protein.txt; the extension was changed to .txt to allow upload). From the traceback below, I think the cause is the "U" in the sequence, which is the one-letter amino acid code for selenocysteine.

Traceback (most recent call last):
  File "/home/sophieshaw/esm-variants/esm_score_missense_mutations.py", line 77, in <module>
    main(args)
  File "/home/sophieshaw/esm-variants/esm_score_missense_mutations.py", line 51, in main
    input_df_ids, LLRs = get_wt_LLR(input_df, model=model, alphabet=alphabet, batch_converter=batch_converter, d
  File "/home/sophieshaw/esm-variants/esm_variants_utils.py", line 38, in get_wt_LLR
    wt_norm=np.diag(WTlogits.loc[[i.split(' ')[0] for i in WTlogits.columns]])
  File "/home/sophieshaw/miniconda3/envs/pm1_automation/lib/python3.7/site-packages/pandas/core/indexing.py", line 931, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/sophieshaw/miniconda3/envs/pm1_automation/lib/python3.7/site-packages/pandas/core/indexing.py", line 1153, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/sophieshaw/miniconda3/envs/pm1_automation/lib/python3.7/site-packages/pandas/core/indexing.py", line 1093, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
  File "/home/sophieshaw/miniconda3/envs/pm1_automation/lib/python3.7/site-packages/pandas/core/indexing.py", line 1314, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis)
  File "/home/sophieshaw/miniconda3/envs/pm1_automation/lib/python3.7/site-packages/pandas/core/indexing.py", line 1377, in _validate_read_indexer
    raise KeyError(f"{not_found} not in index")
KeyError: "['U'] not in index"

Any suggestions?

Thanks!

Sophie S

nadavbra commented 1 year ago

Our code currently doesn't handle non-standard amino acids. I'm in the process of refactoring some of our code, and this will be trivial to fix after the refactor, but it may take a few weeks until I get to it. Sorry about that.
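
Until then, one possible stopgap is to check a sequence for non-standard residues before scoring it, and optionally substitute them. The sketch below is only an illustration and is not part of esm-variants: the helper names `find_nonstandard` and `substitute_nonstandard` are hypothetical, the toy sequence is made up, and the check assumes the scoring code accepts only the 20 standard one-letter codes, which is what the `KeyError: "['U'] not in index"` above suggests. Replacing U with C (selenocysteine is the selenium analogue of cysteine) is an approximation, so scores at and around that position would describe the substituted protein rather than the real one.

```python
# Hypothetical pre-check, NOT part of esm-variants.
# Assumption: only the 20 standard amino acids are accepted downstream.
STANDARD_AAS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_nonstandard(seq):
    """Return (1-based position, residue) for each non-standard residue."""
    return [
        (pos, aa)
        for pos, aa in enumerate(seq.upper(), start=1)
        if aa not in STANDARD_AAS
    ]


def substitute_nonstandard(seq, replacements):
    """Replace non-standard residues using an explicit mapping.

    Raises ValueError for a non-standard residue with no mapping, so
    nothing is swapped silently.
    """
    out = []
    for aa in seq.upper():
        if aa in STANDARD_AAS:
            out.append(aa)
        elif aa in replacements:
            out.append(replacements[aa])
        else:
            raise ValueError(f"no replacement given for residue {aa!r}")
    return "".join(out)


toy = "MGCUGAK"  # made-up sequence containing one selenocysteine (U)
print(find_nonstandard(toy))                    # [(4, 'U')]
print(substitute_nonstandard(toy, {"U": "C"}))  # MGCCGAK
```

Reporting positions first makes it easy to flag or drop variants at the substituted residue afterwards, since any score there refers to cysteine rather than selenocysteine.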

SophieS9 commented 1 year ago

No problem :) Thanks for the update @nadavbra!