kusterlab / prosit

Prosit offers high quality MS2 predicted spectra for any organism and protease as well as iRT prediction. When using Prosit is helpful for your research, please cite "Gessulat, Schmidt et al. 2019" DOI 10.1038/s41592-019-0426-7
https://www.proteomicsdb.org/prosit/
Apache License 2.0

Missing config.yml from Prosit-TMT model files #84

Closed (cctsou closed this issue 2 years ago)

cctsou commented 2 years ago

Hi,

I am trying to install Prosit-TMT on my local Linux machine. I downloaded the model files for the TMT model, but they do not include a config.yml file. I tried copying the config.yml from your 2019 model file package but encountered errors when running the Prosit service, so that config.yml does not appear to be compatible. Could you provide the correct config.yml files for the iRT and fragmentation models?

cctsou commented 2 years ago

The config.yml files have since been provided and are available for download at https://figshare.com/projects/Prosit_TMT_-_Model_-_Fragmentation/128438 (fragmentation model) and https://figshare.com/projects/Prosit_TMT_iRT_-_Model/128432 (iRT model).

However, even with the model files, it looks like the Prosit code here on GitHub is not compatible with the TMT model. The TMT model requires the fragmentation type as an additional input, but the public code here only handles and encodes sequence, charge state, and collision energy.

Would you be able to provide the Prosit code that is compatible with the TMT model so I could run it on my local machine?

Thank you very much!! Chih-Chiang

courcelm commented 2 years ago

I'm also interested in the updated source code to run this model.

@cctsou - did you try to implement the missing code?

cctsou commented 2 years ago

No, I didn't. I am still hoping that the Prosit team will update the code so we can use it on our local machines.

courcelm commented 2 years ago

@cctsou Thanks for replying. Let's hope so, but I have my doubts.

I inspected the fragmentation model this morning (https://figshare.com/projects/Prosit_TMT_-_Model_-_Fragmentation/128438).

It seems that the provided model.yml and the model weight hdf5 files don't match.

The weight file has these fields: ['activation', 'add_meta', 'collision_energy_in', 'decoder', 'dense_19', 'dropout_28', 'dropout_29', 'dropout_30', 'embedding', 'encoder1', 'encoder2', 'encoder_att', 'fragmentation_type_in', 'meta_dense', 'meta_dense_do', 'meta_in', 'multiply_10', 'out', 'peptides_in', 'permute_19', 'permute_20', 'precursor_charge_in', 'repeat', 'timedense']

model.yaml refers to dropout_18 and other fields which are not in the weight file.

I guess we need fixed model files too...
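The mismatch described above can be checked mechanically by comparing the layer names referenced in the model definition with those stored in the weight file. Here is a minimal sketch, assuming both name lists have already been extracted (reading them from the YAML and HDF5 files is left out). `find_layer_mismatch` is a hypothetical helper, not part of Prosit, and `config_layers` is an illustrative subset in which only `dropout_18` comes from this thread.

```python
def find_layer_mismatch(config_layers, weight_layers):
    """Return (missing_from_weights, unused_in_config) as sorted lists."""
    config_set = set(config_layers)
    weight_set = set(weight_layers)
    return sorted(config_set - weight_set), sorted(weight_set - config_set)


# Layer names reported for the weight file in this thread.
weight_layers = [
    "activation", "add_meta", "collision_energy_in", "decoder", "dense_19",
    "dropout_28", "dropout_29", "dropout_30", "embedding", "encoder1",
    "encoder2", "encoder_att", "fragmentation_type_in", "meta_dense",
    "meta_dense_do", "meta_in", "multiply_10", "out", "peptides_in",
    "permute_19", "permute_20", "precursor_charge_in", "repeat", "timedense",
]

# ASSUMPTION: illustrative subset of the names in the model definition.
config_layers = ["peptides_in", "encoder1", "dropout_18"]

missing, unused = find_layer_mismatch(config_layers, weight_layers)
print("In model definition but not in weights:", missing)
```

Any name in `missing` is a layer the loader would look for and not find, which is the kind of mismatch reported here.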

WassimG commented 2 years ago

I updated the model file; it now has the same fields as the weight file.

cctsou commented 1 year ago

I was able to build the server with the TMT model files, but I then encountered the following error when submitting a small peptide list as a CSV file:

modified_sequence,collision_energy,precursor_charge,fragmentation
ALNNLPALQAM(ox)TLALNR,35,2,HCD
EAAALLDDCIFNM(ox)VLLK,35,3,CID
DPLSSYNIIAWDWNGPK,35,2,HCD
KTDCCILSALLFQGLLR,35,3,CID

Error message:

[2023-02-22 18:35:16,739] ERROR in app: Exception on /predict/msp [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/root/prosit/server.py", line 51, in return_msp
    result = predict(flask.request.files["peptides"])
  File "/root/prosit/server.py", line 29, in predict
    data = prediction.predict(data, d_spectra)
  File "/root/prosit/prediction.py", line 13, in predict
    x = io_local.get_array(data, d_model["config"]["x"])
  File "/root/prosit/io_local.py", line 5, in get_array
    utils.check_mandatory_keys(tensor, keys)
  File "/root/prosit/utils.py", line 7, in check_mandatory_keys
    raise KeyError("key {} is missing".format(key))
KeyError: 'key fragmentation is missing'
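The traceback ends in a mandatory-key check, so the failure can be reproduced in isolation. The sketch below is a reconstruction based only on the two lines visible in the traceback; the real `utils.check_mandatory_keys` may differ, and the `required` list stands in for whatever `d_model["config"]["x"]` contains for the TMT model.

```python
def check_mandatory_keys(dictionary, keys):
    """Raise KeyError for the first required key absent from `dictionary`."""
    for key in keys:
        if key not in dictionary:
            raise KeyError("key {} is missing".format(key))
    return True


# Hypothetical tensor dict built without a fragmentation entry.
tensor = {
    "sequence_integer": [],
    "precursor_charge_onehot": [],
    "collision_energy_aligned_normed": [],
}

# ASSUMPTION: the TMT config lists "fragmentation" among the model inputs.
required = list(tensor) + ["fragmentation"]

try:
    check_mandatory_keys(tensor, required)
    error_message = None
except KeyError as exc:
    error_message = exc.args[0]

print(error_message)
```

Under these assumptions, the error means the model config asks for an input that the CSV parsing step never produced, rather than indicating a problem with the CSV file itself.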

I believe the error occurs because "fragmentation" is not parsed into the input data frame. I tried adding "fragmentation" to the CSV parsing function below, but I do not know how fragmentation is encoded. Could you please help? Could you provide the Prosit code that is fully compatible with the TMT models you provided?

def csv(df):
    df.reset_index(drop=True, inplace=True)
    assert "modified_sequence" in df.columns
    assert "collision_energy" in df.columns
    assert "precursor_charge" in df.columns
    data = {
        "collision_energy_aligned_normed": get_numbers(df.collision_energy) / 100.0,
        "sequence_integer": get_sequence_integer(df.modified_sequence),
        "fragmentation": df.fragmentation,
        "precursor_charge_onehot": get_precursor_charge_onehot(df.precursor_charge),
        "masses_pred": get_mz_applied(df),
    }
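Passing the raw column through, as in the attempt above, hands the model text labels where it presumably expects numbers. One possible shape for the missing step is sketched below using only the standard library. The integer codes in `FRAGMENTATION_CODES` are placeholders: the encoding the TMT model actually expects is not stated anywhere in this thread and would need to be confirmed by the Prosit authors. `read_peptides` and `encode_fragmentation` are hypothetical helpers, not Prosit functions.

```python
import csv
import io

# ASSUMPTION: placeholder codes, NOT the confirmed Prosit-TMT encoding.
FRAGMENTATION_CODES = {"CID": 1, "HCD": 2}

REQUIRED_COLUMNS = (
    "modified_sequence",
    "collision_energy",
    "precursor_charge",
    "fragmentation",
)


def read_peptides(text):
    """Parse CSV text and fail early if a required column is absent."""
    reader = csv.DictReader(io.StringIO(text))
    present = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    if missing:
        raise KeyError("missing columns: {}".format(", ".join(missing)))
    return list(reader)


def encode_fragmentation(rows, codes=FRAGMENTATION_CODES):
    """Map each row's fragmentation label to its integer code."""
    encoded = []
    for row_no, row in enumerate(rows, start=1):
        label = row["fragmentation"].strip().upper()
        if label not in codes:
            raise ValueError(
                "row {}: unknown fragmentation {!r}".format(row_no, label)
            )
        encoded.append(codes[label])
    return encoded


sample = """modified_sequence,collision_energy,precursor_charge,fragmentation
ALNNLPALQAM(ox)TLALNR,35,2,HCD
EAAALLDDCIFNM(ox)VLLK,35,3,CID
"""

rows = read_peptides(sample)
fragmentation = encode_fragmentation(rows)
print(fragmentation)
```

Checking the columns up front yields a clearer message than a bare `assert`, and keeping the label-to-code mapping in one dictionary makes it easy to swap in the correct encoding once it is known.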