kusterlab / prosit

Prosit offers high-quality predicted MS2 spectra for any organism and protease, as well as iRT prediction. If Prosit is helpful for your research, please cite "Gessulat, Schmidt et al. 2019", DOI 10.1038/s41592-019-0426-7
https://www.proteomicsdb.org/prosit/
Apache License 2.0

Rescoring only allowing one RAW file in the MaxQuant search #16

Closed MarcIsak closed 5 years ago

MarcIsak commented 5 years ago

Hi,

as I understand it, when using the rescoring functionality of Prosit, the submitted msms.txt file must come from a MaxQuant search in which only a single RAW file was run. I think that is a pity, because it is rarely useful to run a MaxQuant search with only one RAW file. I hope I don't sound too negative here, I just want to help out.

The rescoring function would be more useful if it could: (1) improve a standard database search in which several RAW files have been searched in a quantitative workflow (like LFQ); (2) create a deeper and more accurate DDA-based spectral library. In the latter case, several RAW files would be set up as fractions of a sample in a MaxQuant search, resulting in a single msms.txt file that could go into the Prosit rescoring. This is what I would like to use the rescoring for, if possible.

I understand that problems can arise if one tries to rescore a MaxQuant search in which the RAW files were acquired with different instruments or instrument methods; that could throw off the CE calibration. But if all files were acquired on the same instrument with the same instrument method, I don't see why it would be a problem to include several RAW files in the msms.txt. In many (if not most) cases, the RAW files used in a MaxQuant search are acquired with the same instrument and instrument method. So perhaps one could simply warn users of the rescoring function not to upload an msms.txt file whose RAW files were acquired with different methods and/or instruments.

Once again, I hope I didn't sound too pessimistic here. I think the idea of rescoring a MaxQuant search, or the results of any other search engine, with Prosit is superb.

gessulat commented 5 years ago

Hi @MarcIsak,

no worries, we are very aware of the points you mentioned. The reason we currently limit rescoring to one RAW file is that we need to ensure that our resources are used fairly by everyone.

Please understand that we provide the Prosit online service free of charge. MaxQuant searches including multiple RAW files take a substantial amount of space as well as computing time on our servers. We need to store results temporarily so that users can download them. Managing this for tasks that include, say, 100 x 1 GB RAW files is challenging and costly. Also, such a large task would block our queue for a long time, preventing other users from using our service. That is what we want to avoid.

What you can do if you have multiple RAW files is start a new task for each of them, re-scoring each RAW file separately.
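
To prepare such per-file tasks, a combined msms.txt can be split on MaxQuant's "Raw file" column. This helper is not part of Prosit; it is a minimal sketch assuming the standard MaxQuant column name "Raw file" and tab-separated output:

```python
import pandas as pd
from pathlib import Path

def split_msms_by_raw_file(msms_path, out_dir):
    """Split a multi-run msms.txt into one msms.txt per RAW file,
    so each can be submitted to Prosit as a separate rescoring task."""
    msms = pd.read_csv(msms_path, sep="\t", low_memory=False)
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    for raw_file, group in msms.groupby("Raw file"):
        sub = out_root / str(raw_file)          # one folder per RAW file
        sub.mkdir(exist_ok=True)
        out = sub / "msms.txt"
        group.to_csv(out, sep="\t", index=False)  # keep tab-separated format
        written.append(out)
    return written
```

Each resulting folder then holds the msms.txt belonging to exactly one RAW file, which matches the one-RAW-file-per-task limit described above.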

MarcIsak commented 5 years ago

Thanks for the quick response. I now realize that all RAW files must be submitted if one is to do the rescoring, since the Prosit in silico MS2 spectra must be matched against all the experimental spectra referenced in the msms.txt file, right? That could cause you some issues in terms of space...

But if I download Prosit from GitHub, I guess it is easy to set this up with several RAW files going into the rescoring? Or is everything performed on your servers?

Best,

Marc


gessulat commented 5 years ago

Hi Marc,

you can download the code here and run it locally, but you would need a server with an NVIDIA GPU. Note that the repository only provides the prediction tool. For the rescoring, you would need to build prediction-based features and run Percolator on those files to rescore them.

vesgadj72 commented 4 years ago

Hi,

I would like to ask for help using Prosit. I have some label-free proteomics RAW files acquired on a Q Exactive. I ran the RAW files one by one in MaxQuant 1.6..., then I uploaded the generated msms.txt and the respective .raw file,

but then I received the following error when I checked the status:

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
Using TensorFlow backend.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/context.py:23: UserWarning: The dask.set_options function has been deprecated. Please use dask.config.set instead
  warnings.warn("The dask.set_options function has been deprecated. "
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:326: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  head = reader(BytesIO(b_sample), **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:64: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  df = reader(bio, **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/multi.py:393: FutureWarning: 'scan_number' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version
  suffixes=suffixes, indicator=indicator)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/base.py:835: UserWarning: The get= keyword has been deprecated. Please use the scheduler= keyword instead with the name of the desired scheduler like 'threads' or 'processes'
  warnings.warn("The get= keyword has been deprecated. "
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/local.py:255: FutureWarning: 'scan_number' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version
  return func(*args2)
Traceback (most recent call last):
  File "oktoberfest/annotation.py", line 28, in <module>
    annotated = feynman.match.augment(merged_f, "yb", 6)
  File "/root/.pyenv/versions/3.6.0/src/feynman/feynman/match.py", line 111, in augment
    matches[i] = match(row, ion_types, charge_max)
  File "/root/.pyenv/versions/3.6.0/src/feynman/feynman/match.py", line 74, in match
    forward_sum, backward_sum = get_forward_backward(row.modified_sequence[1:-1])
  File "/root/.pyenv/versions/3.6.0/src/feynman/feynman/match.py", line 32, in get_forward_backward
    masses = [constants.AMINO_ACID[a] for a in amino_acids]
  File "/root/.pyenv/versions/3.6.0/src/feynman/feynman/match.py", line 32, in <listcomp>
    masses = [constants.AMINO_ACID[a] for a in amino_acids]
KeyError: 'M(Oxidation (M)'
make: *** [annotation] Error 4

Please help me: what am I doing wrong in the CE calibration?

tkschmidt commented 4 years ago

Hey, modifications other than "M(ox)" are not supported, and your msms.txt contains 'M(Oxidation (M)'. Please replace the strings and it should work :)

Best, Tobi
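
The string replacement Tobi suggests can be done outside of Excel (which can corrupt the file, see below). A minimal sketch, assuming the standard MaxQuant "Modified sequence" column and that newer MaxQuant versions write oxidized methionine as "M(Oxidation (M))":

```python
import pandas as pd

def fix_oxidation_notation(msms_path, out_path):
    """Rewrite MaxQuant's 'M(Oxidation (M))' modification string as the
    'M(ox)' notation that Prosit expects, keeping the file tab-separated."""
    msms = pd.read_csv(msms_path, sep="\t", low_memory=False)
    msms["Modified sequence"] = msms["Modified sequence"].str.replace(
        "M(Oxidation (M))", "M(ox)", regex=False  # plain text, not a regex
    )
    msms.to_csv(out_path, sep="\t", index=False)
```

A plain string replace (regex=False) is used on purpose, since the modification string contains parentheses that a regex would misinterpret.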

vesgadj72 commented 4 years ago

Thanks a lot Tobias,

I made those changes and that error was indeed solved, but now it throws a merging error:

Error log:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
Using TensorFlow backend.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/context.py:23: UserWarning: The dask.set_options function has been deprecated. Please use dask.config.set instead
  warnings.warn("The dask.set_options function has been deprecated. "
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:326: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  head = reader(BytesIO(b_sample), **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:64: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  df = reader(bio, **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/multi.py:393: FutureWarning: 'scan_number' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version
  suffixes=suffixes, indicator=indicator)
Traceback (most recent call last):
  File "oktoberfest/annotation.py", line 26, in <module>
    merged = feynman.preprocessing.merge(raw, msms_f)
  File "/root/.pyenv/versions/3.6.0/src/feynman/feynman/preprocessing.py", line 16, in merge
    df = dask.dataframe.merge(dd_raw, dd_msms).compute()
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/multi.py", line 393, in merge
    suffixes=suffixes, indicator=indicator)
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/multi.py", line 302, in single_partition_join
    meta = pd.merge(left._meta_nonempty, right._meta_nonempty, **kwargs)
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 555, in __init__
    self._maybe_coerce_merge_keys()
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 986, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
make: *** [annotation] Error 4

I don't know how to solve this one either.

tkschmidt commented 4 years ago

Hey, did you change the structure of the msms.txt in any way? Were columns cast from int/double to string? That can happen by accident if you open the file in Excel/R/any other tool.
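
A quick way to check for, and undo, such an accidental cast is to force the merge-key columns back to integers before re-uploading. This is a sketch, not part of Prosit; the column names "Scan number" and "Charge" are the usual MaxQuant ones, and the merge key implicated by the traceback is the scan number:

```python
import pandas as pd

def coerce_merge_columns(msms_path, out_path, columns=("Scan number", "Charge")):
    """Force the columns that the rescoring merges on back to int64.
    Editing msms.txt in Excel/R can silently turn them into strings,
    which triggers the 'merge on object and int64 columns' error."""
    msms = pd.read_csv(msms_path, sep="\t", low_memory=False)
    for col in columns:
        if col in msms.columns:
            # raises if a value is not numeric at all, surfacing real corruption
            msms[col] = pd.to_numeric(msms[col]).astype("int64")
    msms.to_csv(out_path, sep="\t", index=False)
    return msms
```

If pd.to_numeric raises here, the file contains genuinely non-numeric entries in those columns and should be re-exported from MaxQuant rather than patched.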

vesgadj72 commented 4 years ago

Thanks, Tobias,

the problem was solved. I got a number for the CE in a .txt file, and then I uploaded again for rescoring.

Then I got the following error:

Error log:
.cc:1484] Adding visible gpu devices: 0
2020-04-02 02:11:31.453781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-02 02:11:31.453792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2020-04-02 02:11:31.453801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2020-04-02 02:11:31.453917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15156 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:86:00.0, compute capability: 6.0)
(the same GPU device initialization messages repeat several times, trimmed here)
Using TensorFlow backend.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
/root/.pyenv/versions/3.6.0/src/prosit/prosit/model.py:39: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/saving.py:349: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(yaml_string)
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Loading required package: h5
Warning message:
In cor(raw[, i], pred[, i], use = "p", method = "p") : the standard deviation is zero
/root/.pyenv/versions/3.6.0/src/pwyll/pwyll/percolator.py:195: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df_an["precursor_charge"] = df_an.charge
make: *** [percolator] Error 10

I used the same msms.txt as before, so the methionine modifications were correct. However, I don't know whether the FDR was the problem this time, so I ran MaxQuant again with an FDR of 0.00 (100%), but again Prosit threw an error:

Error log:
2020-04-02 17:44:53.800069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-04-02 17:44:53.800132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-02 17:44:53.800142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2020-04-02 17:44:53.800151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2020-04-02 17:44:53.800257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15156 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:86:00.0, compute capability: 6.0)
(the same GPU device initialization messages repeat several times, trimmed here)
Using TensorFlow backend.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
/root/.pyenv/versions/3.6.0/src/prosit/prosit/model.py:39: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/saving.py:349: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(yaml_string)
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Loading required package: h5
Error in apply(seen[meta$reverse, ] == 0, 2, sum, na.rm = T) : dim(X) must have a positive length
Calls: source ... withVisible -> eval -> eval -> add_reScore -> apply
Execution halted
make: *** [features] Error 8