Slightly better results with .mzML than .raw format from the same file

rolivella commented 1 year ago

Hi again!

We tested DIA-NN 1.8 (command line, Linux) on the same file and got different results depending on whether we used the .mzML or the .raw file. For instance, here are the numbers of precursors identified and the sequence overlap for 001.raw compared with 001.mzML:

[Image: precursors identified and sequence overlap, 001.raw vs 001.mzML]

Do you know the reason for this variance?

We also observed that we systematically get around 15% more identifications with .mzML.
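One way to compute the precursor counts, their overlap and the relative gain for two separately searched runs is sketched below. It assumes DIA-NN's main report is a tab-separated file with `Precursor.Id` and `Q.Value` columns and uses a 1% q-value cutoff. Both are assumptions, so check them against your own output. The report strings are hypothetical toy data, not the results from this issue.

```python
import csv
import io


def precursors_from_report(tsv_text, q_cutoff=0.01):
    """Return the set of precursor IDs passing the q-value cutoff.

    Assumes a DIA-NN-style tab-separated report with 'Precursor.Id' and
    'Q.Value' columns (an assumption; verify against your report).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["Precursor.Id"] for row in reader if float(row["Q.Value"]) <= q_cutoff}


def compare_runs(raw_ids, mzml_ids):
    """Summarize counts and overlap between the .raw and .mzML searches."""
    shared = raw_ids & mzml_ids
    union = raw_ids | mzml_ids
    return {
        "raw": len(raw_ids),
        "mzml": len(mzml_ids),
        "shared": len(shared),
        "raw_only": len(raw_ids - mzml_ids),
        "mzml_only": len(mzml_ids - raw_ids),
        # Jaccard index: shared identifications / all identifications
        "jaccard": len(shared) / len(union) if union else 1.0,
        # fractional gain of mzML over raw (0.15 means 15% more)
        "relative_gain": (len(mzml_ids) - len(raw_ids)) / len(raw_ids) if raw_ids else float("nan"),
    }


# Hypothetical toy reports, for illustration only
RAW_REPORT = "Precursor.Id\tQ.Value\nPEPTIDEA2\t0.001\nPEPTIDEB2\t0.004\nPEPTIDEC3\t0.05\n"
MZML_REPORT = "Precursor.Id\tQ.Value\nPEPTIDEA2\t0.002\nPEPTIDEB2\t0.003\nPEPTIDED2\t0.008\n"

if __name__ == "__main__":
    print(compare_runs(precursors_from_report(RAW_REPORT), precursors_from_report(MZML_REPORT)))
```

Using sets means each precursor is counted once per search. If both files were searched together into a single report, you would first split the rows by run.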

The files were converted with https://github.com/compomics/ThermoRawFileParser using this command line:

ThermoRawFileParser.sh -i=001.raw -f=2 -o ./
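When many files have to be converted (e.g. inside a container), the same call can be driven from a script. The minimal Python sketch below reuses only the flags from the command above: `-i` for the input file, `-f=2` for the output format (indexed mzML in ThermoRawFileParser's format list, worth confirming with `--help` for your version) and `-o` for the output directory. The helper names and the launcher default are hypothetical.

```python
import shlex
import subprocess


def trfp_command(raw_file, out_dir="./", fmt=2, launcher="ThermoRawFileParser.sh"):
    """Build the ThermoRawFileParser call shown above as an argument list.

    Hypothetical helper: -i input .raw file, -f output format, -o output dir.
    Peak picking stays at the tool's default (enabled).
    """
    return [launcher, f"-i={raw_file}", f"-f={fmt}", "-o", str(out_dir)]


def convert(raw_file, out_dir="./"):
    """Run the conversion; raises CalledProcessError if the tool fails."""
    subprocess.run(trfp_command(raw_file, out_dir), check=True)


if __name__ == "__main__":
    # Print the shell-equivalent command without running it
    print(shlex.join(trfp_command("001.raw")))
```

Passing an argument list instead of a shell string avoids quoting problems with unusual file names.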

If you want, I can share the original files by private message.

Thanks!

vdemichev commented 1 year ago

Could it be that the mzML contains profile data, while DIA-NN reads centroided spectra from the .raw?
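One way to check which case applies is to look at how each spectrum in the mzML is annotated. The PSI-MS vocabulary uses `MS:1000127` for "centroid spectrum" and `MS:1000128` for "profile spectrum". The sketch below counts these per `<spectrum>` element with the standard-library XML parser. It assumes the annotation sits directly on each spectrum, as is common; parameters referenced through `referenceableParamGroupRef` are not resolved and end up under "unknown". The inline file is a hypothetical two-spectrum example.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

CENTROID = "MS:1000127"  # PSI-MS: centroid spectrum
PROFILE = "MS:1000128"   # PSI-MS: profile spectrum


def spectrum_representation(source):
    """Count centroid / profile / unannotated spectra in an mzML file.

    `source` is a path or binary file-like object. Only the cvParams attached
    directly to each <spectrum> are inspected, not the peak arrays.
    """
    counts = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        # strip the mzML namespace, e.g. '{http://psi.hupo.org/ms/mzml}spectrum'
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            accs = {cv.get("accession") for cv in elem if cv.tag.endswith("cvParam")}
            if CENTROID in accs:
                counts["centroid"] += 1
            elif PROFILE in accs:
                counts["profile"] += 1
            else:
                counts["unknown"] += 1
            elem.clear()  # free the processed spectrum (binary data) early
    return counts


SAMPLE = b'''<?xml version="1.0"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml">
 <run><spectrumList count="2">
  <spectrum index="0" id="scan=1"><cvParam accession="MS:1000511" name="ms level" value="1"/><cvParam accession="MS:1000127" name="centroid spectrum"/></spectrum>
  <spectrum index="1" id="scan=2"><cvParam accession="MS:1000511" name="ms level" value="2"/><cvParam accession="MS:1000128" name="profile spectrum"/></spectrum>
 </spectrumList></run>
</mzML>'''

if __name__ == "__main__":
    print(spectrum_representation(io.BytesIO(SAMPLE)))
```

For a real file, pass the path instead, e.g. `spectrum_representation("001.mzML")`.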

rolivella commented 1 year ago

The conversion tool centroids the data by default when writing mzML, and I did not specify this option:

-p, --noPeakPicking[=VALUE]
                             Don't use the peak picking provided by the native
                               Thermo library. By default peak picking is
                               enabled. Optional argument allows disabling peak
                               peaking only for selected MS levels and should
                               be a comma-separated list of integers (1,2,3)
                               and/or intervals (1-3), open-end intervals (1-)
                               are allowed

So, by default, peak picking is enabled.
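The help text above describes a small selection syntax for the optional value: integers (`1,2,3`), intervals (`1-3`) and open-ended intervals (`1-`). The hypothetical helper below expands such a value into the set of MS levels for which peak picking would be disabled, i.e. the levels that would stay in profile mode. It mirrors the documented syntax only and is not ThermoRawFileParser's own parser. The upper bound for open-ended intervals (`max_level=3`) is an assumption.

```python
def parse_ms_levels(spec, max_level=3):
    """Expand an MS-level selection such as '1,2,3', '1-3' or '1-' into a set.

    Hypothetical helper; open-ended intervals run up to `max_level`.
    """
    levels = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            end = int(end_text) if end_text else max_level
            if start > end:
                raise ValueError(f"empty interval: {part!r}")
            levels.update(range(start, end + 1))
        else:
            levels.add(int(part))
    return levels


if __name__ == "__main__":
    for spec in ("1,2,3", "1-3", "2-", "1,3-"):
        print(spec, sorted(parse_ms_levels(spec)))
```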

vdemichev commented 1 year ago

Maybe it uses a different peak-picking algorithm than the Windows Thermo .dll?

Anyway, please also try 1.8.1: 1.8 had some bugs on Linux that were fixed in 1.8.1.

edunivers commented 1 year ago

Does DIA-NN expect centroided .raw files (MS1 and MS2), or does it not matter whether the Thermo .raw files are centroid or profile?

vdemichev commented 1 year ago

When .raw files are profile, the Thermo .dll centroids them, so what DIA-NN gets is always centroided spectra.
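To illustrate why two centroiding implementations can produce different spectra from the same profile data, here is a deliberately naive routine. Every local intensity maximum becomes one centroid. Its m/z is the intensity-weighted mean of the maximum and its two neighbours, and its intensity is their sum. This is an illustrative algorithm only, not the one in the Thermo library or in ThermoRawFileParser. Real implementations also handle m/z gaps, noise and overlapping peaks, and choices like these plausibly shift centroid m/z and intensities.

```python
from typing import List, Tuple


def centroid(mz: List[float], intensity: List[float],
             min_intensity: float = 0.0) -> List[Tuple[float, float]]:
    """Naive profile-to-centroid conversion (illustrative sketch only).

    Assumes non-negative intensities sorted by m/z; ignores gaps in m/z.
    """
    peaks = []
    for i in range(1, len(mz) - 1):
        left, here, right = intensity[i - 1], intensity[i], intensity[i + 1]
        # >= on the left and > on the right so a flat top yields a single peak
        if here > min_intensity and here >= left and here > right:
            window = range(i - 1, i + 2)
            total = sum(intensity[j] for j in window)
            mz_c = sum(mz[j] * intensity[j] for j in window) / total
            peaks.append((mz_c, total))
    return peaks


if __name__ == "__main__":
    mz = [100.00, 100.01, 100.02, 100.03, 100.04, 200.00, 200.01, 200.02]
    inten = [0, 10, 40, 10, 0, 5, 20, 5]
    for m, i in centroid(mz, inten):
        print(f"{m:.4f}\t{i}")
```

Changing just the window width or the `min_intensity` threshold changes which peaks survive and where they land. That is the kind of divergence suggested above between converters.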

ffullomicscouts commented 1 year ago

Hi,

I ran similar tests with .raw, .raw.dia and .mzML input on Windows 10, Linux and Linux+Wine, since everything is meant to run in a Docker container. There were also some differences between the conversions.

Here are the result tables:

For this, we first need to install Wine (done in the Dockerfile), then install DIA-NN and MSFileReader from Thermo (version 3.0 SP2).
NOTE: The .raw-to-.dia conversion step under Linux+Wine only works with --threads 1.
R1-R3 are three different raw files. All Linux tests used the native Linux version of DIA-NN. The mzML files were created with ThermoRawFileParser.

Mode (R1)	Precursors.Identified	Proteins.Identified	[unlabeled column]
win thermo raw	65337	6555	6.17434e+11
linux dia	65337	6555	6.17434e+11
win dia	65337	6555	6.17434e+11
linux noise mzML	65074	6583	6.16577e+11
win noise mzML	65074	6620	6.16576e+11
linux mzML	65074	6583	6.16577e+11
win mzML	65074	6620	6.16576e+11

Mode (R2)	Precursors.Identified	Proteins.Identified	[unlabeled column]
win thermo raw	65312	6628	6.28592e+11
linux dia	65312	6628	6.28592e+11
win dia	65312	6628	6.28592e+11
linux noise mzML	65048	6565	6.26993e+11
win noise mzML	65052	6568	6.26995e+11
linux mzML	65048	6565	6.26993e+11
win mzML	65052	6568	6.26995e+11

Mode (R3)	Precursors.Identified	Proteins.Identified	[unlabeled column]
win thermo raw	65105	6579	6.41853e+11
linux dia	65105	6579	6.41853e+11
win dia	65105	6579	6.41853e+11
linux noise mzML	65122	6569	6.40738e+11
win noise mzML	65130	6557	6.4062e+11
linux mzML	65122	6569	6.40738e+11
win mzML	65130	6557	6.4062e+11
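To set these tables against the roughly 15% gain reported at the top of the thread, the relative change in precursor counts can be computed directly. The counts below are copied from the tables, and using "win thermo raw" as the reference is our own choice. In these three runs the differences stay below 1%.

```python
# Precursors.Identified values copied from the R1-R3 tables
PRECURSORS = {
    "R1": {"win thermo raw": 65337, "linux mzML": 65074, "win mzML": 65074},
    "R2": {"win thermo raw": 65312, "linux mzML": 65048, "win mzML": 65052},
    "R3": {"win thermo raw": 65105, "linux mzML": 65122, "win mzML": 65130},
}
REFERENCE = "win thermo raw"


def relative_change(reference, value):
    """Fractional change of `value` versus `reference` (0.15 means 15% more)."""
    return (value - reference) / reference


if __name__ == "__main__":
    for run, modes in PRECURSORS.items():
        ref = modes[REFERENCE]
        for mode, n in modes.items():
            if mode != REFERENCE:
                print(f"{run} {mode}: {relative_change(ref, n):+.2%}")
```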

vdemichev/DiaNN issue #659