patrickbryant1 / Umol

Protein-ligand structure prediction

IndexError: list index out of range while creating a new protein #7

Closed by velocirraptor23 9 months ago

velocirraptor23 commented 9 months ago

Hi all,

I got an error when I submitted a new protein.

Could you help with this, please? I have updated the MSA, sequence and positions:

LIGAND = "N#CCC(=O)N(CC1)CC@@HN(C)c2ncnc(c23)[nH]cc3" # @param {type:"string"}
SEQUENCE = "RKSPLTLEDFKFLAVLGRGHFGKVLLSEFRPSGELFAIKALKKGDIVARDEVESLMCEKRILAAVTSAGHPFLVNLFGCFQTPEHVCFVMEYSAGGDLMLHIHSDVFSEPRAIFYSACVVLGLQFLHEHKIVYRDLKLDNLLLDTEGYVKIADFGLCKEGMGYGDRTSTFCGTPEFLAPEVLTDTSYTRAVDWWGLGVLLYEMLVGESPFPGDDEEEVFDSIVNDEVRYPRFLSAEAIGIMRRLLRRNPERRLGSSERDAEDVKKQPFFRTLGWEALLARRLPPPFVPTLSGRTDVSNFDEEFTGEAPTLSPPRDARPLTAAEQAAFLDFDFVAGGC" #@param {type:"string"}
TARGET_POSITIONS = "17,28,19,20,23,24,25,91,92,93,94" #@param {type:"string"}

It creates the protein in the first step, but then it fails when it creates the parameters and the complex.

error:

File /cluster/ddu/cmmartinez001/Projects/Umol/content/Umol/src/make_msa_seq_feats_colab.py:98, in process(input_fasta_path, input_msas)
     96     parsed_msa, parsed_deletion_matrix, _ = parsers.parse_stockholm(msa)
     97 elif custom_msa[-3:] == 'a3m':
---> 98     parsed_msa, parsed_deletion_matrix = parsers.parse_a3m(msa)
     99 else: raise TypeError('Unknown format for input MSA, please make sure '
    100                       'the MSA files you provide terminates with (and '
    101                       'are formatted as) .sto or .a3m')
    102 parsed_msas.append(parsed_msa)

File /cluster/ddu/cmmartinez001/Projects/Umol/content/Umol/src/net/data/parsers.py:142, in parse_a3m(a3m_string)
    127 def parse_a3m(a3m_string: str) -> Tuple[Sequence[str], DeletionMatrix]:
    128   """Parses sequences and deletion matrix from a3m format alignment.
    129
    130   Args:
   (...)
    140       the aligned sequence i at residue position j.
    141   """
--> 142   sequences, _ = parse_fasta(a3m_string)
    143   deletion_matrix = []
    144   for msa_sequence in sequences:

File /cluster/ddu/cmmartinez001/Projects/Umol/content/Umol/src/net/data/parsers.py:62, in parse_fasta(fasta_string)
     60 elif not line:
     61   continue  # Skip blank lines.
---> 62 sequences[index] += line
     64 return sequences, descriptions

IndexError: list index out of range

Best wishes,

Cesar

patrickbryant1 commented 9 months ago

Hi,

There seems to be something wrong with the MSA you provided. Please check it for empty rows or similar problems. The MSA has to be in a3m format.
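A quick way to inspect the MSA before running the pipeline is sketched below. Judging from the traceback above, the failing statement is `sequences[index] += line` in `parse_fasta`; one plausible trigger, assumed here rather than confirmed, is a non-blank line appearing before the first `>` header, so that no sequence entry exists yet to append to. The helper name `lines_before_first_header` is hypothetical and not part of Umol.

```python
def lines_before_first_header(a3m_text):
    """Return (line_number, text) for every non-blank line found
    before the first '>' header line in an a3m/FASTA-style string.

    Assumption: such lines are what make a parser fail when it appends
    sequence text to an entry that has not been created yet.
    """
    offending = []
    for number, raw in enumerate(a3m_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith(">"):
            break  # First header reached; everything after is parseable.
        if line:
            offending.append((number, line))
    return offending


if __name__ == "__main__":
    # A file that starts with sequence text instead of a '>' header.
    example = "ACDEFG\n>query\nACDEFG\n>hit1\nAC-EFG\n"
    for number, line in lines_before_first_header(example):
        print(f"line {number} precedes the first header: {line!r}")
```

An empty result means the file at least starts with a header (or contains only blank lines before it); it does not validate the rest of the alignment.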

velocirraptor23 commented 9 months ago

Hi,

Thanks for your reply. It seems to work now: the problem was at the beginning of the file, where I did not have the correct ID, so I just replaced it. In the next step I then got another error. From googling, it seems to be an issue with JAX. I am not sure; I thought it was about RAM, but probably not.

XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 4263510016 bytes.

Thanks a lot,

Cesar

patrickbryant1 commented 9 months ago

Great 👍 Yes, try cropping the protein sequence if you don't have more memory available.
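Cropping while keeping the target positions consistent could look roughly like the sketch below. It assumes `TARGET_POSITIONS` are 1-based residue numbers (not confirmed in this thread) and uses a hypothetical helper, `crop_around_targets`, with an arbitrary `flank` size; none of these names come from Umol.

```python
def crop_around_targets(sequence, target_positions, flank=50, one_based=True):
    """Crop `sequence` to a window that covers all target positions plus
    `flank` residues on each side, and renumber the positions to match.

    Assumption: positions are 1-based unless one_based=False.
    """
    offset = 1 if one_based else 0
    indices = [p - offset for p in target_positions]  # 0-based indices
    if not indices or min(indices) < 0 or max(indices) >= len(sequence):
        raise ValueError("target positions are empty or outside the sequence")

    start = max(0, min(indices) - flank)
    end = min(len(sequence), max(indices) + flank + 1)

    cropped = sequence[start:end]
    # Shift every position by the number of residues removed at the start.
    new_positions = [i - start + offset for i in indices]
    return cropped, new_positions


if __name__ == "__main__":
    seq = "ABCDEFGHIJ"
    cropped, positions = crop_around_targets(seq, [4, 6], flank=1)
    print(cropped, positions)
```

The MSA would need to be cropped to the same columns (or regenerated for the shorter sequence) so that it still matches the query; the sketch does not handle that.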

velocirraptor23 commented 9 months ago

Hi,

While running this, I was wondering whether Umol sends the information I submit locally to a web server, like ESMFold does, or whether everything stays in the local installation. By the way, I just fixed the JAX installation and now it works perfectly.

Best wishes,

Cesar

patrickbryant1 commented 9 months ago

Hi, no, nothing is sent anywhere: everything runs locally and you keep all info. ESMFold is only used to visualize the target site before predicting in the Colab, and is not used in any other way.

Glad to hear that.

velocirraptor23 commented 9 months ago

Thanks a lot for your prompt response. I have some questions about how the code works, but I will probably send an email. For now I am going to close this, as it is solved.