kusterlab / prosit

Prosit offers high quality MS2 predicted spectra for any organism and protease as well as iRT prediction. When using Prosit is helpful for your research, please cite "Gessulat, Schmidt et al. 2019" DOI 10.1038/s41592-019-0426-7
https://www.proteomicsdb.org/prosit/
Apache License 2.0

error message: Unknown Element in string: {sequence}. Found Elements: {x}") NameError: name 'x' is not defined make #48

Open gsaxena888 opened 4 years ago

gsaxena888 commented 4 years ago

I'm submitting a small csv file (~1 MB) but I'm getting this error message. I can't make sense of what seems to be the problem. I searched my input file, and I don't have any sequence with "x" in it.

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
  File "oktoberfest/grpc_predict_peptidelist.py", line 36, in <module>
    disable_progress_bar=True)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/predictPROSIT.py", line 153, in predict_to_hdf5
    models=[irt_model, intensity_model])
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/predictPROSIT.py", line 123, in predict
    self.input.prepare_input(disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/inputPROSIT.py", line 15, in prepare_input
    self.sequences.prepare_sequences(flag_disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/inputPROSIT.py", line 143, in prepare_sequences
    self.character_to_array(flag_disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/inputPROSIT.py", line 118, in character_to_array
    total=len(self.character)):
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/tqdm/std.py", line 1107, in __iter__
    for obj in iterable:
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/__utils__.py", line 88, in split_modstring
    raise ValueError(f"Unknown Element in string: {sequence}. Found Elements: {x}")
NameError: name 'x' is not defined
make: *** [grpc_predict] Error 8

LLautenbacher commented 4 years ago

Hi,

The x is a bug that will be resolved as soon as the newest version of Prosit goes online. Until then, you can give me your task ID and I can check which amino acid encodings in your file we do not support. Candidates that appear frequently are 'U', 'O', or any modification encoding other than M(ox).
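For readers puzzled by why the log shows a NameError rather than the intended ValueError: Python evaluates the f-string before the exception object is built, so an undefined name inside the braces fails first. The sketch below reproduces that behaviour. The functions `check_buggy` and `check_fixed` and the `KNOWN` alphabet are hypothetical stand-ins, not the actual `split_modstring` code from prosit-grpc, and the fixed variant is only one plausible way to report the offending characters.

```python
# Assumed alphabet: the 20 standard amino acids (an assumption for illustration).
KNOWN = frozenset("ACDEFGHIKLMNPQRSTVWY")


def check_buggy(sequence):
    """Hypothetical stand-in that mirrors the message seen in the traceback."""
    for element in sequence:
        if element not in KNOWN:
            # `x` is never defined, so formatting the message raises NameError
            # before ValueError can be constructed.
            raise ValueError(
                f"Unknown Element in string: {sequence}. Found Elements: {x}"
            )


def check_fixed(sequence):
    """Same check, but the message refers to a name that exists."""
    unknown = sorted(set(sequence) - KNOWN)
    if unknown:
        raise ValueError(
            f"Unknown Element in string: {sequence}. Found Elements: {unknown}"
        )


if __name__ == "__main__":
    for check in (check_buggy, check_fixed):
        try:
            check("PEPT3IDE")
        except (NameError, ValueError) as err:
            print(check.__name__, type(err).__name__, err)
```

With the fixed variant the message names the unsupported characters directly, which is the information the original error was meant to carry.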

gsaxena888 commented 4 years ago

Per your request, here are the ids:

https://www.proteomicsdb.org/prosit/task/59E4B1B4F2EB4221D48B4A16D8311B5D https://www.proteomicsdb.org/prosit/task/37D8EE737A425CD40CCBFF2633682DA5 https://www.proteomicsdb.org/prosit/task/BE75F1630A7D3C02563BC86285FEA828

gsaxena888 commented 4 years ago

Any thoughts on the above? Also, would you have a general ETA for the new version?

LLautenbacher commented 4 years ago

Some sequences in your files seem to have been replaced with numbers. The first file includes a 3 at line 1105579; the second file has a 3 at line 62064 and a 5 at line 880088. I didn't check the third, but I assume it has the same issue. Otherwise the files look fine. If you remove the faulty lines, it should work.

Regarding the ETA, it is hard for me to say. @tkschmidt, can you give a general ETA for when the new Prosit version will become public?
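Faulty lines like these can be located before upload with a short local check. The following is a minimal sketch, not part of Prosit: it assumes the input CSV has a header row with a column named `modified_sequence`, and that only the 20 standard amino acid letters plus `M(ox)` are accepted. Both the column name and the allowed alphabet are assumptions to adjust for your file; `find_faulty_lines` is a hypothetical helper.

```python
import csv
import io
import re

# Assumed alphabet: 20 standard residues, with M(ox) as the only modification.
VALID_SEQUENCE = re.compile(r"(?:M\(ox\)|[ACDEFGHIKLMNPQRSTVWY])+")


def find_faulty_lines(handle, column="modified_sequence"):
    """Return (line_number, sequence) pairs for rows that fail the check.

    `column` is an assumed header name; change it to match your CSV.
    """
    reader = csv.DictReader(handle)
    faulty = []
    for row in reader:
        sequence = row.get(column) or ""
        if not VALID_SEQUENCE.fullmatch(sequence):
            # line_num counts physical lines read so far, header included.
            faulty.append((reader.line_num, sequence))
    return faulty


if __name__ == "__main__":
    # Small in-memory example; for a real file use open(path, newline="").
    sample = io.StringIO(
        "modified_sequence,collision_energy,precursor_charge\n"
        "PEPTIDEK,30,2\n"
        "3,30,2\n"
        "AM(ox)K,30,2\n"
        "PEPUK,30,2\n"
    )
    for line_number, sequence in find_faulty_lines(sample):
        print(f"line {line_number}: {sequence!r}")
```

The check flags sequences replaced by digits as well as residues such as 'U' or 'O', so the reported rows can be removed or corrected before resubmitting.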

gsaxena888 commented 4 years ago

Thanks @LLautenbacher. You were correct. I have since fixed the issue and am running 4 searches of ~45 MB each, but they've been running for ~9 hours and counting. Not sure if that's normal?

@tkschmidt Any thoughts on when the new version might be available either as downloadable source code or as a publicly accessible web-based system (such as the current prosit website)?

LLautenbacher commented 4 years ago

Are you sure these are the correct task IDs? They seem to have been uploaded ~4 hours before your comment. Regardless, the tasks you sent have not started yet because the server is currently occupied with other jobs.

gsaxena888 commented 4 years ago

@LLautenbacher You may be correct again. They may have only been uploaded ~4 hours before my comment. That said, as of 6 am EST today, they're still processing, so I'm guessing it's because the server is currently occupied with other jobs. I'll check again tonight. Many thanks in advance for your assistance.

gsaxena888 commented 4 years ago

The prediction seems to have completed successfully! Thanks! @LLautenbacher If you have any thoughts regarding ETA, please feel free to let us know :) (even if it's something like "not before x years" etc :)

nicorellius commented 3 years ago

Reporting this issue on my end. See attached. 2021-04-06_error_log_168BC5DAC8DD9A9FF6077D8A31C58FBA.txt