weizhenFrank / DeepPhospho

Problem with big prediction input #4

Closed · tsveth closed this issue 2 years ago

tsveth commented 2 years ago

Hi developers,

I am currently trying to run a fairly large input through your GUI on a Windows machine, using only the predict function (not the train function).

However, a large input list of peptides fails with an "OverflowError: cannot serialize a bytes object larger than 4 GiB".

Other than cutting the list into smaller files, is there anything I can do to process the file in one go?

Thanks a lot!!

[Screenshot of the error traceback]

gureann commented 2 years ago

Hi @tsveth,

Thanks for using DeepPhospho and for reporting this problem.

It took me some time to reproduce this, but unfortunately I didn't get the same OverflowError you show. I did, however, hit a memory error or an OS error when a very large input file was used and DeepPhospho was run on Windows on the CPU, and I suspect those errors have a similar cause.

Could you please change one number in this file and run your task again: in /deep_phospho/train_pred_utils/ion_pred.py, line 106, num_workers=2, replace the 2 with 0.
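For reference, here is a minimal sketch of what the modified call might look like, assuming the loader on that line is a standard PyTorch DataLoader (the variable names and batch size below are hypothetical placeholders, not the actual code in ion_pred.py):

```python
# Sketch only: the real arguments in ion_pred.py may differ.
# num_workers=0 keeps data loading in the main process, so the dataset is not
# pickled and sent to worker processes (which hits the 4 GiB pickle limit on Windows).
from torch.utils.data import DataLoader

pred_loader = DataLoader(
    pred_dataset,      # hypothetical dataset variable
    batch_size=64,     # hypothetical batch size
    shuffle=False,
    num_workers=0,     # was num_workers=2
)
```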

If the problem persists, I'd like to know more about your machine: the OS and its version, the total and typically available RAM, and the number of precursors in your input file.

One more thing: on the test computer I hit another error caused by Windows' long-path limit, so I'd also suggest checking that HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled in regedit is set to 1.
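If it helps, that value can also be checked from Python without opening regedit, for example with this minimal sketch (assumption: a standard Windows Python install, using only the built-in winreg module):

```python
# Minimal sketch: read LongPathsEnabled from the Windows registry.
import winreg

key_path = r"SYSTEM\CurrentControlSet\Control\FileSystem"
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
    value, _ = winreg.QueryValueEx(key, "LongPathsEnabled")

print(f"LongPathsEnabled = {value}")  # should print 1 for long-path support
```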

Best, Ronghui

tsveth commented 2 years ago

Dear Ronghui,

Thanks a lot for the suggestion. Changing num_workers to 0 did the trick; the software is now running smoothly!

While we're on the topic of large peptide lists as input: are there specific settings that can be tweaked for faster processing? I am trying to run a full proteome digest, and this takes a significant amount of time.

Thanks again!

gureann commented 2 years ago

Hi @tsveth,

Glad to hear it works.

Unfortunately, I don't think it's easy to speed up the whole process much, because DeepPhospho is designed as a relatively large model (compared with some others). As far as I know, some steps in the workflow can be optimized, but the most time-consuming step, predicting spectra and RT for the peptides, depends heavily on the hardware.

In our tests, it took roughly ~25 min and ~17 min to predict spectra and RT, respectively, for 1 million peptides on a 2080 Ti, but this grew to ~6 h for the same task on a CPU with 8 cores at ~4.5 GHz. The time scales roughly linearly with the input size.

I'd highly recommend running on a GPU if one is available; otherwise, a CPU with many cores also helps a little.
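As a quick sanity check that a GPU is actually visible to the environment (assumption: DeepPhospho runs on PyTorch, as the num_workers setting above suggests), something like this sketch can be run first:

```python
# Minimal sketch: check whether PyTorch can see a CUDA-capable GPU.
import torch

if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU detected; prediction will fall back to the CPU.")
```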

If it really takes too much time for you, I'd also suggest some lighter models such as pDeep3 and AlphaPeptDeep. As far as I know, they also support phosphopeptides and library generation, but I'm not sure how easy they are to set up. (I have used pDeep2 before and its prediction performance is also good; you can also refer to this repo if you are interested in a combination of pDeep2 and DeepRTPlus.) Hope this helps.

Best, Ronghui

tsveth commented 2 years ago

Hi Ronghui,

Great, thanks a lot for all the helpful information!