statisticalbiotechnology / triqler

Source and example code for triqler (TRansparent Identification-Quantification-linked Error Rates)
Apache License 2.0

diann2triqler generates incomplete input data #22

Open tobiasko opened 1 year ago

tobiasko commented 1 year ago

Hi triqler developers,

First of all, thanks for adding an import function for DIA-NN main reports. I tried running it as follows:

diann2triqler --file_list_file ~/Documents/anno.tsv --out_file ~/Documents/tmp/triqler_input.tsv ~/Downloads/2271000/WU287354/out-2023-03-28/diann-output.tsv
triqler.convert.diann version None
Copyright (c) 2018-2023 Matthew The, Patrick Truong. All rights reserved.
Written by:
- Matthew The (matthew.the@scilifelab.se)
- Patrick Truong (patrick.truong@scilifelab.se)
in the School of Engineering Sciences in Chemistry, Biotechnology and Health
at the Royal Institute of Technology in Stockholm.
Issued command: diann.py --file_list_file /Users/tobiasko/Documents/anno.tsv --out_file /Users/tobiasko/Documents/tmp/triqler_input.tsv /Users/tobiasko/Downloads/2271000/WU287354/out-2023-03-28/diann-output.tsv

However, the output is incomplete (the run and condition columns are empty):

head /Users/tobiasko/Documents/tmp/triqler_input.tsv
run condition   charge  searchScore intensity   peptide proteins
        3   8.277991268693333   16392100.0  AAAAAAAAAPAAAATAPTTAATTAATAAQ   P37108
        3   5.870511430361334   46752400.0  AAAAAAAAAPAAAATAPTTAATTAATAAQ   P37108
        3   5.619801685261617   47872800.0  AAAAAAAAAPAAAATAPTTAATTAATAAQ   P37108
        3   5.660269065244499   68029000.0  AAAAAAAAAPAAAATAPTTAATTAATAAQ   P37108
        3   8.409200143913043   68997300.0  AAAAAAAAAPAAAATAPTTAATTAATAAQ   P37108
        4   12.102047821511396  1022760.0   AAAAAAAAAPAAAATAPTTAATTAATAAQ   P37108
        4   10.493966237675412  1983760.0   AAAAAAAAAPAAAATAPTTAATTAATAAQ   P37108
        4   8.790321682860508   1721600.0   AAAAAAAAAPAAAATAPTTAATTAATAAQ   P37108
        4   9.735068860911166   2366670.0   AAAAAAAAAPAAAATAPTTAATTAATAAQ   P37108

My annotation file looks like this:

head ~/Documents/anno.tsv
20230307_002_S468675_negControl_Control.mzML    a
20230307_003_S468676_Probe1_Group_1.mzML    b
20230307_004_S468677_Probe2_Group_2.mzML    c
20230307_005_S478625_Probe3_Group_3.mzML    d
20230307_006_S478626_Probe4_Group_4.mzML    e

The DIA-NN main report looks like this:

head /Users/tobiasko/Downloads/2271000/WU287354/out-2023-03-28/diann-output.tsv
File.Name   Run Protein.Group   Protein.Ids Protein.Names   Genes   PG.Quantity PG.Normalised   PG.MaxLFQ   Genes.Quantity  Genes.Normalised    Genes.MaxLFQ    Genes.MaxLFQ.Unique Modified.Sequence   Stripped.Sequence   Precursor.Id    Precursor.Charge    Q.Value PEP Global.Q.Value  Protein.Q.Value PG.Q.Value  Global.PG.Q.Value   GG.Q.Value  Translated.Q.Value  Proteotypic Precursor.Quantity  Precursor.Normalised    Precursor.Translated    Translated.Quality  Ms1.Translated  Quantity.Quality    RT  RT.Start    RT.Stop iRT Predicted.RT    Predicted.iRT   First.Protein.Description   Lib.Q.Value Lib.PG.Q.Value  Ms1.Profile.Corr    Ms1.Area    Evidence    Spectrum.Similarity Averagine   Mass.Evidence   CScore  Decoy.Evidence  Decoy.CScore    Fragment.Quant.Raw  Fragment.Quant.Corrected    Fragment.Correlations   MS2.Scan    IM  iIM Predicted.IM    Predicted.iIM
/scratch/DIANN_A314/WU287354/20230307_002_S468675_negControl_Control.mzML   20230307_002_S468675_negControl_Control P37108  P37108  SRP14_HUMAN SRP14   1.63921e+07 1.10164e+07 1.44463e+07 1.63921e+07 1.10164e+07 1.44463e+07 1.44463e+07 AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ3  3   0.000254047 0.00751629  0.0010626   0.000270343 0.000267666 0.000234907 0.000268384 0   1   1.63921e+07 1.10164e+07 1.21446e+07     6.5771e+07  0.979711    32.8786 32.7155 33.044  31.1751 32.9558 30.9749 Signal recognition particle 14 kDa protein  0.0001731   0.000210526 0.996264    8.87744e+07 6.5037  0.844827    1   0   0.99422 1.46965 0.0158835   6.60174e+06;6.98001e+06;6.2017e+06;5.55096e+06;5.53627e+06;4.2541e+06;3.15814e+06;1.05764e+06;786330;1.02731e+06;603819;553297; 6.60174e+06;6.98001e+06;6.2017e+06;5.55096e+06;5.53627e+06;4.2541e+06;3.15814e+06;1.05764e+06;786330;1.02731e+06;603819;553297; 0.977612;0.977969;0.973274;0.972621;0.983241;0.978374;0.966172;0.969758;0.985712;0.942823;0.930083;0.959734;    20634   0   0   0   0
/scratch/DIANN_A314/WU287354/20230307_003_S468676_Probe1_Group_1.mzML   20230307_003_S468676_Probe1_Group_1 P37108  P37108  SRP14_HUMAN SRP14   4.67524e+07 1.7241e+07  1.92563e+07 4.67524e+07 1.7241e+07  1.92563e+07 1.92563e+07 AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ3  3   0.00282143  0.196143    0.0010626   0.000221533 0.000253485 0.0002349070.000254259  0   1   4.67524e+07 1.7241e+07  2.72102e+07     1.22967e+08 0.991776    33.5899 33.4294 33.7503 31.1751 33.577  31.1972 Signal recognition particle 14 kDa protein  0.0001731   0.000210526 0.998894    2.11282e+08 6.85246 0.827158    1   0.862182    1.38556 0.00829299  1.94406e+07;1.91949e+07;1.80756e+07;1.60994e+07;1.52083e+07;1.21035e+07;8.76118e+06;2.71786e+06;2.27849e+06;2.14646e+06;2.08095e+06;1.78959e+06;    1.94406e+07;1.91949e+07;1.80756e+07;1.60994e+07;1.52083e+07;1.21035e+07;8.76118e+06;2.71786e+06;2.27849e+06;2.14646e+06;2.08095e+06;1.78959e+06;    0.992206;0.991059;0.992022;0.991335;0.991873;0.990964;0.992621;0.990091;0.991055;0.994416;0.985636;0.993594;    21229   0   0   0
/scratch/DIANN_A314/WU287354/20230307_004_S468677_Probe2_Group_2.mzML   20230307_004_S468677_Probe2_Group_2 P37108  P37108  SRP14_HUMAN SRP14   4.78728e+07 7.45569e+07 6.67804e+07 4.78728e+07 7.45569e+07 6.67804e+07 6.67804e+07 AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ3  3   0.00362536  0.0622554   0.0010626   0.000282566 0.000365497 0.0002349070.000366838  0   1   4.78728e+07 7.45569e+07 4.651e+07       2.06901e+08 0.988583    33.3337 33.169  33.4967 31.1751 33.5277 30.592  Signal recognition particle 14 kDa protein  0.0001731   0.000210526 0.994478    2.12963e+08 6.76631 0.82038 1   0   .952574 1.95552 0.0009926   1.91125e+07;2.01564e+07;1.85404e+07;1.64623e+07;1.59419e+07;1.28183e+07;8.78569e+06;2.86711e+06;2.19047e+06;2.03757e+06;2.09137e+06;1.94675e+06;    1.91125e+07;2.01564e+07;1.85404e+07;1.64623e+07;1.59419e+07;1.28183e+07;8.78569e+06;2.86711e+06;2.19047e+06;2.03757e+06;2.09137e+06;1.94675e+06;    0.988607;0.987123;0.988697;0.987569;0.988235;0.98898;0.986544;0.986101;0.984394;0.979939;0.969332;0.992309; 20809   0   0   0   
/scratch/DIANN_A314/WU287354/20230307_005_S478625_Probe3_Group_3.mzML   20230307_005_S478625_Probe3_Group_3 P37108  P37108  SRP14_HUMAN SRP14   6.8029e+07  9.99422e+07 9.59945e+07 6.8029e+07  9.99422e+07 9.59945e+07 9.59945e+07 AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ3  3   0.00348158  0.124682    0.0010626   0.000315259 0.000332889 0.000234907 0.00033389  0   1   6.8029e+07  9.99422e+07 7.12208e+07     3.43828e+08 0.99249 32.7995 32.6347 32.9646 31.1751 32.9432 30.7871 Signal recognition particle 14 kDa protein  0.0001731   0.000210526 0.997556    3.28419e+08 6.85753 0.851211    1   0   0.916983    .12423  0.00610422  2.74179e+07;2.87289e+07;2.66591e+07;2.41888e+07;2.22551e+07;1.8356e+07;1.29656e+07;4.26316e+06;3.57073e+06;3.03517e+06;3.13019e+06;2.62147e+06; 2.74179e+07;2.87289e+07;2.66591e+07;2.41888e+07;2.22551e+07;1.8356e+07;1.29656e+07;4.26316e+06;3.57073e+06;3.03517e+06;3.13019e+06;2.62147e+06; 0.992969;0.990732;0.991277;0.990546;0.991302;0.993214;0.991907;0.98782;0.985711;0.99099;0.990468;0.992394;  20389   0   0   0   0
/scratch/DIANN_A314/WU287354/20230307_006_S478626_Probe4_Group_4.mzML   20230307_006_S478626_Probe4_Group_4 P37108  P37108  SRP14_HUMAN SRP14   6.89973e+07 1.30697e+08 1.05691e+08 6.89973e+07 1.30697e+08 1.05691e+08 1.05691e+08 AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ3  3   0.000222808 0.00345861  0.0010626   0.00030303  0.000333667 0.0002349070.000334896  0   1   6.89973e+07 1.30697e+08 9.56483e+07     4.63718e+08 0.988966    33.6628 33.498  33.8296 31.1751 33.7499 30.9343 Signal recognition particle 14 kDa protein  0.0001731   0.000210526 0.994202    3.3451e+08  6.81443 0.839224    1   0.997559    0.392247    0.00268637  2.78639e+07;2.82421e+07;2.69313e+07;2.40967e+07;2.26158e+07;1.85177e+07;1.30993e+07;4.03366e+06;3.48445e+06;3.13427e+06;3.10429e+06;2.52782e+06;    2.78639e+07;2.82421e+07;2.69313e+07;2.40967e+07;2.26158e+07;1.85177e+07;1.30993e+07;4.03366e+06;3.48445e+06;3.13427e+06;3.10429e+06;2.52782e+06;    0.988595;0.988238;0.987579;0.988575;0.989639;0.988703;0.988102;0.988732;0.986705;0.985203;0.981841;0.983654;    20949   0   0   0
/scratch/DIANN_A314/WU287354/20230307_002_S468675_negControl_Control.mzML   20230307_002_S468675_negControl_Control P37108  P37108  SRP14_HUMAN SRP14   1.63921e+07 1.10164e+07 1.44463e+07 1.63921e+07 1.10164e+07 1.44463e+07 1.44463e+07 AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ4  4   5.54814e-06 5.54814e-06 3.12452e-05 0.000270343 0.000267666 0.000234907 0.000268384 0   1   1.02276e+06 690802  761544      921691  0.992176    32.8639 32.7007 33.0286 30.9212 32.8601 30.9357 Signal recognition particle 14 kDa protein  2.72478e-06 0.000210526 0.839792    1.23784e+06 6.22773 0.770542    1   0   .999996 1.46478 0.184098    404970;340150;277644;192080;213806;117049;144243;37453.2;44792.4;31578;11767.2;0;   404970;340150;277644;192080;213806;117049;144243;37453.2;44792.4;31578;11767.2;0;   0.988866;0.994455;0.994211;0.988567;0.96821;0.973227;0.992425;0.948632;0.934873;0.906882;0.530497;0;    20624   0   0   0   0
/scratch/DIANN_A314/WU287354/20230307_003_S468676_Probe1_Group_1.mzML   20230307_003_S468676_Probe1_Group_1 P37108  P37108  SRP14_HUMAN SRP14   4.67524e+07 1.7241e+07  1.92563e+07 4.67524e+07 1.7241e+07  1.92563e+07 1.92563e+07 AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ4  4   2.77031e-05 3.68629e-05 3.12452e-05 0.000221533 0.000253485 0.0002349070.000254259  0   1   1.98376e+06 731558  1.15456e+06     2.11654e+06 0.983917    33.5752 33.415  33.7353 30.9212 33.4836 31.1572 Signal recognition particle 14 kDa protein  2.72478e-06 0.000210526 0.961882    3.63662e+06 5.62774 0.775991    1   0   .99997  1.6614  0.134911    889733;601818;492209;148091;244832;230450;162182;71635;202073;55980.6;57073.7;38331.1;  889733;601818;492209;148091;244832;230450;162182;71635;202073;55980.6;57073.7;38331.1;  0.982358;0.996716;0.971085;0.950065;0.941739;0.98919;0.966018;0.921609;0.845562;0.939;0.932851;0.838247;    21219   0   0   0   0
/scratch/DIANN_A314/WU287354/20230307_004_S468677_Probe2_Group_2.mzML   20230307_004_S468677_Probe2_Group_2 P37108  P37108  SRP14_HUMAN SRP14   4.78728e+07 7.45569e+07 6.67804e+07 4.78728e+07 7.45569e+07 6.67804e+07 6.67804e+07 AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ4  4   0.000152199 0.000182927 3.12452e-05 0.000282566 0.000365497 0.0002349070.000366838  0   1   1.7216e+06  2.64235e+06 1.64835e+06     4.93663e+06 0.981658    33.3189 33.1535 33.4815 30.9212 33.4417 30.5483 Signal recognition particle 14 kDa protein  2.72478e-06 0.000210526 0.990952    5.15601e+06 6.22496 0.865859    1   0.999854    1.56904 0.158621    651915;548736;520951;347621;283782;196978;131609;57940.3;51600.4;60054.5;35806.7;16675.5;   651915;548736;520951;347621;283782;196978;131609;57940.3;51600.4;60054.5;35806.7;16675.5;   0.982183;0.988121;0.974193;0.980292;0.978389;0.953172;0.965087;0.978532;0.957752;0.974809;0.925948;0.645829;    20799   0   0   0   0
/scratch/DIANN_A314/WU287354/20230307_005_S478625_Probe3_Group_3.mzML   20230307_005_S478625_Probe3_Group_3 P37108  P37108  SRP14_HUMAN SRP14   6.8029e+07  9.99422e+07 9.59945e+07 6.8029e+07  9.99422e+07 9.59945e+07 9.59945e+07 AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ   AAAAAAAAAPAAAATAPTTAATTAATAAQ4  4   5.91716e-05 9.19448e-05 3.12452e-05 0.000315259 0.000332889 0.000234907 0.00033389  0   1   2.36667e+06 3.49189e+06 2.48839e+06     5.57617e+06 0.992525    32.8391 32.6741 33.0046 30.9212 32.8492 30.8939 Signal recognition particle 14 kDa protein  2.72478e-06 0.000210526 0.975219    5.30342e+06 6.6932  0.784224    1   0   .999933 1.21275 0.0746513   917976;767804;680893;569621;476183;377316;239178;85433.4;78361.1;71348.6;77283.3;63650.4;   917976;767804;680893;569621;476183;377316;239178;85433.4;78361.1;71348.6;77283.3;63650.4;   0.993232;0.991838;0.992345;0.976425;0.981981;0.976568;0.98792;0.976306;0.961929;0.962201;0.956322;0.937639; 20414   0   0   0   0

Any idea what is going wrong here?

MatthewThe commented 1 year ago

It's probably because there are still file extensions in the mapping file. Can you try removing the .mzML extensions?

Nevertheless, we should fix the converter to strip off the extensions or give a better error message.

tobiasko commented 1 year ago

👍 removing the .mzML extension fixed the problem.
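
For anyone hitting the same problem: the first column of the file list held names ending in .mzML, while the Run column of the DIA-NN report holds the same names without the extension, so no run could be matched. Below is a minimal Python sketch of a pre-processing step that strips the extension from the file list and reports runs that still have no match. It is not triqler's converter code. The function names and the KNOWN_EXTENSIONS tuple are hypothetical, and the sketch assumes a tab-separated file list with the file name in the first column.

```python
import csv
from pathlib import PurePath

# Assumption: extensions worth stripping; extend this tuple as needed.
KNOWN_EXTENSIONS = (".mzml", ".raw", ".d", ".wiff", ".dia")


def strip_run_extension(name):
    """Drop any directory part and a trailing known extension (case-insensitive)."""
    base = PurePath(name).name
    lower = base.lower()
    for ext in KNOWN_EXTENSIONS:
        if lower.endswith(ext):
            return base[: -len(ext)]
    return base


def normalize_file_list(rows):
    """Strip the extension from the first column; other columns are kept as-is."""
    return [[strip_run_extension(row[0]), *row[1:]] for row in rows if row]


def missing_runs(file_list_rows, report_runs):
    """Return the report runs that have no entry in the file list, sorted."""
    known = {row[0] for row in file_list_rows}
    return sorted(set(report_runs) - known)


def rewrite_file_list(in_path, out_path):
    """Read a tab-separated file list, normalize it, and write it to out_path."""
    with open(in_path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    normalized = normalize_file_list(rows)
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(normalized)
    return normalized


def read_report_runs(report_path):
    """Collect the distinct values of the 'Run' column of a DIA-NN main report."""
    with open(report_path, newline="") as handle:
        return {row["Run"] for row in csv.DictReader(handle, delimiter="\t")}
```

A possible use is to call rewrite_file_list on the annotation file, pass the result together with read_report_runs(...) to missing_runs, and only run diann2triqler on the rewritten file list once missing_runs returns an empty list. Failing early with the list of unmatched runs is one way to get the clearer error message suggested above.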