Protein ID Detection At 50%

Khrabdee commented 10 months ago

Dear Vdemichev, I am using a Waters Synapt G2 for an IMS proteomics experiment. I converted the raw files using ProteoWizard and combined the ion mobility scans. I started by generating the spectral library, changing some parameters from their defaults as I saw fit.

I then ran a single file to test whether it works. The log says "7184065 library precursors are potentially detectable".

Later on, it reports the number of IDs at 50%, 5%, 1% and 0.1% FDR as 10, 0, 0 and 0.
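For context on what those four numbers mean: in a target-decoy scheme, the FDR at a score cutoff is estimated from how many decoy (deliberately wrong) matches pass that cutoff relative to target matches, and an ID's q-value is the lowest FDR at which it would still be accepted. The sketch below is a generic illustration of that idea, not DIA-NN's actual scoring or FDR procedure, which is considerably more elaborate; the function names `q_values` and `ids_at_fdr` are hypothetical. IDs that appear only at 50% FDR typically mean the target matches score barely better than decoys.

```python
from typing import Dict, List, Sequence


def q_values(scores: Sequence[float], is_decoy: Sequence[bool]) -> List[float]:
    """Generic target-decoy q-values (higher score = better match).

    FDR at a cutoff is estimated as decoys / targets among all matches scoring
    at or above it; a match's q-value is the lowest FDR of any cutoff that
    still accepts it. Ties are not treated specially in this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    targets = decoys = 0
    for i in order:  # best score first
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    q = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):  # worst score first: keep the running minimum
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


def ids_at_fdr(scores: Sequence[float], is_decoy: Sequence[bool],
               thresholds=(0.5, 0.05, 0.01, 0.001)) -> Dict[float, int]:
    """Count target matches whose q-value is at or below each threshold."""
    q = q_values(scores, is_decoy)
    return {t: sum(1 for qi, d in zip(q, is_decoy) if not d and qi <= t)
            for t in thresholds}


if __name__ == "__main__":
    # Toy data: six matches, three of them decoys.
    scores = [10, 9, 8, 7, 6, 5]
    decoy = [False, False, True, False, True, True]
    print(ids_at_fdr(scores, decoy))  # {0.5: 3, 0.05: 2, 0.01: 2, 0.001: 2}
```

In the toy data the third target scores below a decoy, so it survives only the loose 50% threshold; when nearly all targets behave like that, the counts collapse at 5% and 1%.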

When the whole run finished, the report showed only two proteins.

I am unsure where I am going wrong. Do you know how I can get identifications at a stricter FDR? And how can I increase the total number of proteins detected?

Thanks in advance for your help!!

vdemichev commented 9 months ago

Sorry, I personally have never worked with Waters data. I would think ion mobility Waters data is likely not properly supported, so DIA-NN just assumes the data is in one format when in fact it's in another (.mzML files can be very different). So, unfortunately, this is unlikely to work.

Best, Vadim
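Since the concern is that an .mzML file may not contain what a tool expects, one way to check is to look at which controlled-vocabulary terms the file actually uses. The sketch below uses only Python's standard library to count spectra, cvParams flagged as centroid or profile (relevant to the MsConvert centroiding warning mentioned later in the thread), and any cvParam whose name mentions ion mobility or drift time. It is a heuristic that matches term names rather than accessions; the function names are hypothetical, and it says nothing about how DIA-NN itself reads the file.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip the XML namespace: '{http://psi.hupo.org/ms/mzml}cvParam' -> 'cvParam'."""
    return tag.rsplit("}", 1)[-1]


def summarize_mzml(source) -> Counter:
    """Heuristic summary of an mzML file (path or binary file-like object).

    Counts <spectrum> elements, cvParams named 'centroid spectrum' or
    'profile spectrum', and cvParams whose name mentions ion mobility or
    drift time. File-level declarations are counted too, so treat the
    numbers as a rough indication only.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "cvParam":
            cv_name = elem.get("name", "").lower()
            if cv_name in ("centroid spectrum", "profile spectrum"):
                counts[cv_name] += 1
            elif "ion mobility" in cv_name or "drift time" in cv_name:
                counts[cv_name] += 1
        elif name == "spectrum":
            counts["spectra"] += 1
            elem.clear()  # children were already seen; free memory on large files
    return counts


# Hand-made toy fragment, not taken from a real instrument file.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="toy">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000128" name="profile spectrum"/>
        <scanList count="1">
          <scan>
            <cvParam cvRef="MS" accession="MS:1002476" name="ion mobility drift time" value="3.2"/>
          </scan>
        </scanList>
      </spectrum>
      <spectrum index="1" id="scan=2" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

if __name__ == "__main__":
    print(summarize_mzml(io.BytesIO(SAMPLE)))
    # For a real file, pass its path instead; "sample.mzML" is a hypothetical name:
    # print(summarize_mzml("sample.mzML"))
```

Comparing these counts between a file that works and one that does not can show whether, for example, drift-time information or profile-mode spectra are present where they are not expected.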

RedOctoCat commented 8 months ago

Hi @vdemichev :) Waters data can be tricky!

Is Waters data without ion mobility supported by DIA-NN? Any tips on which settings you would use to see if it works?

This is how I set things up - what do you think? My understanding is:

Step 1 - Convert to mzML. When the Waters ion mobility data is processed by ProteoWizard's MsConvert, the MsConvert log states that the ion mobility data cannot be centroided. What effect would that have on DIA-NN's processing of Waters ion mobility data, or of data in general?

Step 2 - Use DIA-NN's "Convert to .dia" option to convert the RAW file to .dia format. I use the maximum number of threads for this and subsequent runs, make sure output is written to the HDD, etc.

Step 3 - Add a FASTA file known to match the sample in the RAW file, and enable "FASTA digest for library-free search / library generation", which also ticks "Deep learning spectra, RTs and IMs prediction" etc. I don't change anything else.

Step 4 - Change the algorithm settings to suit. I was using "protein inference = genes" because it was the default, but I am now trying "Protein names (from FASTA)" since I was getting 0 results. I use Ultra-fast mode because processing takes a long time on my PC.

Then run it and wait for it to finish. How does DIA-NN use the PC's resources? It seems to be very processor-heavy.
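Regarding Step 3: a FASTA digest turns each protein sequence into candidate peptides, which are then expanded (for example over charge states and modifications) into library precursors; this multiplication is one reason a library can reach millions of precursors, like the 7184065 reported in the first post. The sketch below shows only the digestion part under simplifying assumptions: trypsin cleaving after K or R but not before P, a hypothetical function name, and illustrative defaults for missed cleavages and peptide length. It is not DIA-NN's implementation; its enzyme rules and limits come from its own settings.

```python
import re
from typing import List

# Split points: just after K or R, unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence: str, missed_cleavages: int = 1,
                   min_len: int = 7, max_len: int = 30) -> List[str]:
    """Simplified in-silico trypsin digest of one protein sequence.

    Returns peptides containing up to `missed_cleavages` uncleaved sites,
    keeping only those whose length lies within [min_len, max_len].
    Parameter defaults are illustrative, not any tool's defaults.
    """
    pieces = [p for p in TRYPSIN_SITE.split(sequence) if p]  # drop empty tail
    peptides = []
    for start in range(len(pieces)):
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(pieces):
                break
            peptide = "".join(pieces[start:end])
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Toy sequence: the R is followed by P, so it is not a cleavage site.
    print(tryptic_digest("LLLLLLLKGGGGGGGRPEEEEEEEK"))
    # ['LLLLLLLK', 'LLLLLLLKGGGGGGGRPEEEEEEEK', 'GGGGGGGRPEEEEEEEK']
```

Allowing one missed cleavage already adds a third peptide for this short toy sequence; across a whole proteome, and multiplied by charge states and modifications, the candidate space grows quickly.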

I'm curious as to why Waters files aren't supported, and what you might suggest for those of us who use Waters instruments.

Thanks!

(vdemichev/DiaNN, issue #794)