vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Different IDs with each run (same files, same FASTA) #1205

Open Maiatallah opened 1 month ago

Maiatallah commented 1 month ago

Hi!

A couple of times I had to repeat the data processing for a few runs, and the results were different every single time, even though the files and the FASTA were the same as before.

I then had to process those files again together with some new files, and I ended up with significantly different numbers. I would like to understand why this happens and whether there are any recommendations for getting reproducible IDs. How many runs should be processed in one go? Are there any other criteria for combining runs, and why might they affect the IDs?

vdemichev commented 1 month ago

Hi,

Can you please share two logs (at log level 3) corresponding to such analyses with identical settings but different results?

> Now, I had to run files but added more new files, I ended up having significantly different numbers.

That's perfectly expected and can be due to a variety of reasons, from MBR (match-between-runs) to the automatic optimisation of mass accuracies and scan window, or the various cross-run quantification algorithms used by DIA-NN.

> How many runs should be processed in one go?

Usually all of them.

Please see the section of the FAQ on incremental processing, if the goal is to avoid changing any numbers or quantities when adding extra runs (which is only needed in very special cases, like patient samples gradually arriving over years): https://github.com/vdemichev/DiaNN?tab=readme-ov-file#frequently-asked-questions
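To put a number on how much two analyses differ, one option is to compare the sets of identified precursors. The following is only a minimal sketch: the helper `compare_ids` and the example IDs are hypothetical, and it assumes the identifiers have already been extracted from each report (for example from a precursor ID column).

```python
def compare_ids(ids_a, ids_b):
    """Summarise the overlap between two collections of identifiers."""
    a, b = set(ids_a), set(ids_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: 1.0 means identical sets, 0.0 means no overlap.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up identifiers standing in for the IDs of two analyses.
run1 = ["PEPTIDEK2", "SAMPLER2", "ANOTHERK3"]
run2 = ["PEPTIDEK2", "SAMPLER2", "DIFFERENTR2"]

summary = compare_ids(run1, run2)
print(summary)
```

A Jaccard index well below 1.0 for two analyses of the same files with identical settings would be the kind of discrepancy worth reporting together with the logs.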

Best, Vadim

Maiatallah commented 1 month ago

Thanks for your reply! I don't think I will be able to find the log files right away. In the meantime, do you suggest processing files that belong to QC, for instance, or sample files together, instead of combining negative control and sample files? I mean in terms of what is best for the DIA-NN algorithm.

Maiatallah commented 1 month ago

How can I get the precursor ion m/z and mass from the output file? I am trying to find the IDs of some specific peaks in the raw file.

vdemichev commented 1 month ago

> processing files that belong to QC for instance, or sample files together

Yes. The only thing better processed separately is blanks.

> How to get the precursor ion m/z and mass from output file?

Reported in the main .parquet report by DIA-NN 1.9.1. You can also use DIA-NN Viewer to see how exactly DIA-NN sees the peptide-spectrum matches.
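As a rough illustration of matching a peak seen in the raw file to reported precursors, the sketch below converts m/z and charge to a neutral monoisotopic mass and filters rows by a ppm tolerance. The column names `Precursor.Id`, `Precursor.Mz` and `Precursor.Charge` are assumptions about the report layout, the helper names are hypothetical, and the rows are made-up values, so check them against your own report.

```python
PROTON_MASS = 1.007276466621  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass of a positively charged [M + zH]z+ ion."""
    return (mz - PROTON_MASS) * charge


def ppm_error(observed_mz, reference_mz):
    """Signed difference between two m/z values in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_precursors(rows, target_mz, tol_ppm=10.0):
    """Return the rows whose precursor m/z lies within tol_ppm of target_mz."""
    hits = []
    for row in rows:
        if abs(ppm_error(target_mz, row["Precursor.Mz"])) <= tol_ppm:
            mass = neutral_mass(row["Precursor.Mz"], row["Precursor.Charge"])
            hits.append({**row, "Neutral.Mass": mass})
    return hits


# Made-up rows using the assumed column names.
rows = [
    {"Precursor.Id": "PEPTIDEK2", "Precursor.Mz": 500.2500, "Precursor.Charge": 2},
    {"Precursor.Id": "SAMPLER3", "Precursor.Mz": 650.3300, "Precursor.Charge": 3},
]

hits = match_precursors(rows, target_mz=500.2520)
for hit in hits:
    print(hit["Precursor.Id"], round(hit["Neutral.Mass"], 4))
```

With real data, the rows could be loaded with `pandas.read_parquet(path).to_dict("records")`, where `path` points to your own report file. The 10 ppm default is an arbitrary choice and should follow the mass accuracy of the instrument.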

Best, Vadim