wilhelm-lab / oktoberfest

Rescoring and spectral library generation pipeline for proteomics.
MIT License

CE calibrating FragPipe results generated from mzML formatted data fails #129

Closed tobiasko closed 8 months ago

tobiasko commented 11 months ago

Describe the bug

python3 run_oktoberfest.py --config_path ~/CEcalibration_config.json
2023-09-22 13:04:50,976 - INFO - oktoberfest::main Oktoberfest version 0.4.0
Copyright (c) 2020-2021 Oktoberfest dev-team. All rights reserved.
Written by
- Wassim Gabriel (wassim.gabriel@tum.de),
- Ludwig Lautenbacher (ludwig.lautenbacher@tum.de),
- Matthew The (matthew.the@tum.de),
- Mario Picciani (mario.picciani@in.tum.de),
- Firas Hamood (firas.hamood@tum.de),
- Cecilia Jensen (cecilia.jensen@tum.de)
at the Technical University of Munich.
2023-09-22 13:04:50,976 - INFO - oktoberfest::main Issued command: run_oktoberfest.py --config_path /home/tobiasko/CEcalibration_config.json
2023-09-22 13:04:50,977 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-09-22 13:04:50,978 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-09-22 13:04:50,978 - INFO - oktoberfest.ce_calibration::_load_search search_type is msfragger
2023-09-22 13:04:50,978 - INFO - oktoberfest.ce_calibration::_gen_internal_search_result_from_msms Converting msms data at /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.pepXML to internal search result.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:29<00:00, 29.75s/it]
2023-09-22 13:05:21,208 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences before filtering for valid prosit sequences: 101264
2023-09-22 13:05:21,354 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences after filtering for valid prosit sequences: 98622
2023-09-22 13:05:22,036 - INFO - oktoberfest.ce_calibration::perform_alignment Path to hdf5 file with annotations for /scratch/tobiasko: /scratch/tobiasko/data/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML.hdf5
2023-09-22 13:05:22,037 - INFO - oktoberfest.ce_calibration::_load_rawfile raw_type is mzml
Traceback (most recent call last):
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/run_oktoberfest.py", line 34, in <module>
    main()
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/run_oktoberfest.py", line 30, in main
    runner.run_job(args.config_path)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 209, in run_job
    run_ce_calibration(msms_path, search_dir, config_path, glob_pattern, output_path)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 150, in run_ce_calibration
    ce_calib.perform_alignment(ce_calib._load_search())
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/ce_calibration.py", line 215, in perform_alignment
    self.merge_mzml_and_msms(df_search)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/ce_calibration.py", line 116, in merge_mzml_and_msms
    df_raw = self._load_rawfile()
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/ce_calibration.py", line 104, in _load_rawfile
    return ThermoRaw.read_mzml(
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 148, in read_mzml
    mass_analyzer = get_mass_analyzer(file_path)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 95, in get_mass_analyzer
    return check_analyzer(mass_analyzers)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 34, in check_analyzer
    raise AssertionError(f"The mass analyzer with accession {accession} is not supported.")
AssertionError: The mass analyzer with accession MS:1000081 is not supported.

The above mzML file was generated by MSconvert (Docker container) on Debian Linux with parameters --mzML --64 --zlib --filter "peakPicking true 1-

To Reproduce

{
    "type": "CollisionEnergyCalibration",
    "tag": "",
    "allFeatures": false,
    "inputs": {
        "search_results_type": "Msfragger",
        "spectra": "/scratch/cpanse/PXD028735/dda/",
        "spectra_type": "mzml",
        "search_results": "/scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.pepXML"
    },
    "fastaDigestOptions": {
        "fragmentation": "HCD",
        "digestion": "full",
        "missedCleavages": 0,
        "minLength": 7,
        "maxLength": 30,
        "enzyme": "trypsin",
        "specialAas": "KR",
        "db": "target"
    },
    "models": {
        "intensity": "Prosit_2020_intensity_HCD",
        "irt": "Prosit_2019_irt"
    },
    "output": "/scratch/tobiasko/",
    "outputFormat": "spectronaut",
    "prediction_server": "koina.proteomicsdb.org:443",
    "ssl": true,
    "numThreads": 3,
    "fdr_estimation_method": "mokapot",
    "regressionMethod": "spline",
    "thermoExe": "ThermoRawFileParser.exe",
    "massTolerance": 20,
    "unitMassTolerance": "ppm"
}

Expected behavior

No error about unsupported mass analyzers.

System [please complete the following information]:

python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

picciama commented 11 months ago

I published a hotfix release for spectrum-io (v0.3.3). The mass analyzer check was only there to determine whether we have default values for the mass tolerance and its unit, so as long as you supply these yourself, it should be fine. If you install the newest release of oktoberfest (v0.5.0), this error should be gone. The release will be published tonight and the issue will be closed accordingly. Please reopen it should you still encounter the problem.
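To confirm that a config supplies both tolerance settings before starting a run, something like the following could be used. It is only a sketch: the key names `massTolerance` and `unitMassTolerance` are taken from the config posted above, and the helper `missing_tolerance_keys` is hypothetical, not part of Oktoberfest.

```python
import json

# Key names as they appear in the config posted in this issue (assumption:
# these are the settings meant by "mass tolerance and unit").
REQUIRED_KEYS = ("massTolerance", "unitMassTolerance")


def missing_tolerance_keys(config_text):
    """Return the tolerance keys that are absent from a JSON config string."""
    config = json.loads(config_text)
    return [key for key in REQUIRED_KEYS if key not in config]


# Trimmed-down version of the config from the bug report.
example = (
    '{"type": "CollisionEnergyCalibration", '
    '"massTolerance": 20, "unitMassTolerance": "ppm"}'
)
print(missing_tolerance_keys(example))
```

An empty list means both settings are present; any key that is listed would have to be added to the config by hand.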

tobiasko commented 11 months ago

Hi @picciama,

I updated to v0.5.0 and now get a different error:

python3 -m oktoberfest --config_path ~/CEcalibration_config.json
2023-10-04 11:07:54,458 - INFO - oktoberfest::main Oktoberfest version 0.5.0
Copyright 2023, Wilhelmlab at Technical University of Munich
2023-10-04 11:07:54,460 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-04 11:07:54,471 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-04 11:07:54,472 - INFO - oktoberfest.runner::run_ce_calibration Found 45 files in the spectra directory.
2023-10-04 11:07:54,473 - INFO - oktoberfest.runner::_preprocess Converting search results from /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.pepXML to internal search result.
2023-10-04 11:07:54,473 - INFO - spectrum_io.search_result.search_results::generate_internal Found search results in internal format at /scratch/tobiasko/msms/msms.prosit, skipping conversion
2023-10-04 11:07:54,650 - INFO - oktoberfest.runner::_preprocess Read 98622 PSMs from /scratch/tobiasko/msms/msms.prosit
2023-10-04 11:07:54,771 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/tobiasko/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.rescore
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /usr/lib/python3.9/runpy.py:197 in _run_module_as_main                                           │
│                                                                                                  │
│   194 │   main_globals = sys.modules["__main__"].__dict__                                        │
│   195 │   if alter_argv:                                                                         │
│   196 │   │   sys.argv[0] = mod_spec.origin                                                      │
│ ❱ 197 │   return _run_code(code, main_globals, None,                                             │
│   198 │   │   │   │   │    "__main__", mod_spec)                                                 │
│   199                                                                                            │
│   200 def run_module(mod_name, init_globals=None,                                                │
│                                                                                                  │
│ /usr/lib/python3.9/runpy.py:87 in _run_code                                                      │
│                                                                                                  │
│    84 │   │   │   │   │      __loader__ = loader,                                                │
│    85 │   │   │   │   │      __package__ = pkg_name,                                             │
│    86 │   │   │   │   │      __spec__ = mod_spec)                                                │
│ ❱  87 │   exec(code, run_globals)                                                                │
│    88 │   return run_globals                                                                     │
│    89                                                                                            │
│    90 def _run_module_code(code, init_globals=None,                                              │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/__main__.py:39 in         │
│ <module>                                                                                         │
│                                                                                                  │
│   36                                                                                             │
│   37 if __name__ == "__main__":                                                                  │
│   38 │   traceback.install()                                                                     │
│ ❱ 39 │   main()  # pragma: no cover                                                              │
│   40                                                                                             │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/__main__.py:34 in main    │
│                                                                                                  │
│   31 │   logger.info(f"Oktoberfest version {__version__}\n{__copyright__}")                      │
│   32 │                                                                                           │
│   33 │   args = _parse_args()                                                                    │
│ ❱ 34 │   runner.run_job(args.config_path)                                                        │
│   35                                                                                             │
│   36                                                                                             │
│   37 if __name__ == "__main__":                                                                  │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py:366 in run_job  │
│                                                                                                  │
│   363 │   if job_type == "SpectralLibraryGeneration":                                            │
│   364 │   │   generate_spectral_lib(config_path)                                                 │
│   365 │   elif job_type == "CollisionEnergyCalibration":                                         │
│ ❱ 366 │   │   run_ce_calibration(config_path)                                                    │
│   367 │   elif job_type == "Rescoring":                                                          │
│   368 │   │   run_rescoring(config_path)                                                         │
│   369 │   else:                                                                                  │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py:229 in          │
│ run_ce_calibration                                                                               │
│                                                                                                  │
│   226 │   proc_dir = config.output / "proc"                                                      │
│   227 │   proc_dir.mkdir(parents=True, exist_ok=True)                                            │
│   228 │                                                                                          │
│ ❱ 229 │   _preprocess(spectra_files, config)                                                     │
│   230 │                                                                                          │
│   231 │   processing_pool = JobPool(processes=config.num_threads)                                │
│   232                                                                                            │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py:45 in           │
│ _preprocess                                                                                      │
│                                                                                                  │
│    42 │   │   search_results = pp.filter_peptides_for_model(peptides=search_results, model=con   │
│    43 │   │                                                                                      │
│    44 │   │   # split search results                                                             │
│ ❱  45 │   │   pp.split_search(                                                                   │
│    46 │   │   │   search_results=search_results,                                                 │
│    47 │   │   │   output_dir=config.output / "msms",                                             │
│    48 │   │   │   filenames=[spectra_file.stem for spectra_file in spectra_files],               │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessi │
│ ng.py:314 in split_search                                                                        │
│                                                                                                  │
│   311 │   for filename in filenames:                                                             │
│   312 │   │   output_file = (output_dir / filename).with_suffix(".rescore")                      │
│   313 │   │   logger.info(f"Creating split msms.txt file {output_file}")                         │
│ ❱ 314 │   │   grouped_search_results.get_group(filename).to_csv(output_file)                     │
│   315                                                                                            │
│   316                                                                                            │
│   317 def merge_spectra_and_peptides(spectra: pd.DataFrame, search: pd.DataFrame) -> Spectra:    │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/pandas/core/groupby/groupby.py:817 in │
│ get_group                                                                                        │
│                                                                                                  │
│    814 │   │                                                                                     │
│    815 │   │   inds = self._get_index(name)                                                      │
│    816 │   │   if not len(inds):                                                                 │
│ ❱  817 │   │   │   raise KeyError(name)                                                          │
│    818 │   │                                                                                     │
│    819 │   │   return obj._take_with_is_copy(inds, axis=self.axis)                               │
│    820                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02'

The msms.prosit file looks like this:

head /scratch/tobiasko/msms/msms.prosit
RAW_FILE,SCAN_NUMBER,MODIFIED_SEQUENCE,PRECURSOR_CHARGE,SCAN_EVENT_NUMBER,MASS,SCORE,REVERSE,SEQUENCE,PEPTIDE_LENGTH
LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,5923,HGSNIEAM[UNIMOD:35]SK,2,10,1088.4927,11.824,False,HGSNIEAMSK,10
LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,6009,HVGDM[UNIMOD:35]GNVK,2,13,971.4494,10.718,False,HVGDMGNVK,9
LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,6014,VSGTLDTPEK,3,14,1048.5232,10.549,False,VSGTLDTPEK,10
LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,6074,HGSNIEAM[UNIMOD:35]SK,2,15,1088.4928,22.052,False,HGSNIEAMSK,10
LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,6162,HVDMVLEK,2,22,970.5026,10.51,True,HVDMVLEK,8
LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,6165,HVGDM[UNIMOD:35]GNVK,2,23,971.4493,32.305,False,HVGDMGNVK,9
LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,6300,GAHLPHK,2,32,760.4279,10.492,True,GAHLPHK,7
LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,6309,AAHDNM[UNIMOD:35]DIDK,3,34,1144.4819,13.032,False,AAHDNMDIDK,10
LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,6319,VIAHTQM[UNIMOD:35]R,2,36,970.5012,14.518,False,VIAHTQMR,8

There is no other file in the output folder:

ls -la /scratch/tobiasko/msms/
total 11096
drwxrwxr-x+ 1 tobiasko SG_Employees       22 Sep 22 13:05 .
drwxrwxr-x+ 1 tobiasko SG_Employees       88 Oct  4 11:07 ..
-rw-rw-r--+ 1 tobiasko SG_Employees 11360807 Sep 22 13:05 msms.prosit

tobiasko commented 11 months ago

Another thing: I can't find the run_oktoberfest.py script anymore in the latest version:

ls -la
total 40
drwxr-xr-x   9 tobiasko SG_Employees  4096 Oct  4 10:57 .
drwxr-xr-x 123 tobiasko SG_Employees  8192 Oct  4 10:57 ..
drwxr-xr-x   3 tobiasko SG_Employees    75 Oct  4 10:57 data
-rw-r--r--   1 tobiasko SG_Employees  1411 Oct  4 10:57 __init__.py
-rw-r--r--   1 tobiasko SG_Employees  1009 Oct  4 10:57 __main__.py
drwxr-xr-x   3 tobiasko SG_Employees    76 Oct  4 10:57 plotting
drwxr-xr-x   3 tobiasko SG_Employees    59 Oct  4 10:57 predict
drwxr-xr-x   3 tobiasko SG_Employees    81 Oct  4 10:57 preprocessing
drwxr-xr-x   2 tobiasko SG_Employees   110 Oct  4 10:57 __pycache__
drwxr-xr-x   3 tobiasko SG_Employees    75 Oct  4 10:57 rescore
-rw-r--r--   1 tobiasko SG_Employees 14279 Oct  4 10:57 runner.py
drwxr-xr-x   3 tobiasko SG_Employees   134 Oct  4 10:57 utils

If this is intended and will stay like this in the future, you might update this here and replace it with this

picciama commented 11 months ago

Concerning the KeyError you get: this is most likely because the spectra folder you provide contains a raw file whose name is not present in the msms.prosit file. Please check the following potential causes:

  1. the search results in msms.prosit do not contain any PSMs for the file LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02
  2. the filename is written differently in msms.prosit, so the mapping between spectra file and PSMs does not work

Meanwhile, I will implement a check that prints a warning if no PSMs for a provided filename could be found in the search results.
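Until such a check is available, the mismatch can be found by hand. The following is a minimal sketch, not the check that will ship in Oktoberfest: it assumes the msms.prosit layout shown above (a CSV with a `RAW_FILE` column), and the helper names `raw_files_in_search` and `unmatched_spectra` are made up for illustration.

```python
import csv
import io
from pathlib import Path


def raw_files_in_search(handle):
    """Collect the distinct RAW_FILE values from an msms.prosit-style CSV."""
    return {row["RAW_FILE"] for row in csv.DictReader(handle)}


def unmatched_spectra(spectra_files, known_raw_files):
    """Return stems of spectra files that have no PSMs in the search results."""
    stems = [Path(name).stem for name in spectra_files]
    return [stem for stem in stems if stem not in known_raw_files]


# Two rows in the msms.prosit layout from this issue (columns trimmed).
msms_text = (
    "RAW_FILE,SCAN_NUMBER,SEQUENCE\n"
    "LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,5923,HGSNIEAMSK\n"
    "LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01,6009,HVGDMGNVK\n"
)
known = raw_files_in_search(io.StringIO(msms_text))

# Spectra files as they would be found in the spectra directory.
spectra = [
    "LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML",
    "LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML",
]
missing = unmatched_spectra(spectra, known)
for stem in missing:
    print(f"WARNING: no PSMs found for spectra file '{stem}'")
```

For a real run, the in-memory text would be replaced by `open("/scratch/tobiasko/msms/msms.prosit", newline="")` and the list by the files in the spectra folder. Every stem reported here is one for which splitting the search results by file name would fail.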

Concerning the second point: a lot of code was cleaned up for the 0.5.0 release, so run_oktoberfest.py was integrated into runner.py. Well spotted. I will correct the documentation on GitHub to reflect what is written on oktoberfest.readthedocs.io.

tobiasko commented 11 months ago

Hmmm, the spectra folder contains many more raw files than are covered by the .pepxml file:

ls -la /scratch/cpanse/PXD028735/dda/
total 218975360
drwxrwxr-x+ 1 tobiasko SG_Employees       7154 Jul 18 09:52 .
drwxrwxrwx+ 1 cpanse   SG_Employees       6756 Sep 22 11:28 ..
-rw-rw-r--+ 1 cpanse   SG_Employees       3337 Jul 17 10:29 checmsum.md5
-rw-rw-r--+ 1 tobiasko SG_Employees       3427 Jul 13 15:29 dda.fp-manifest
-rw-rw-r--+ 1 tobiasko SG_Employees      11261 Jul 13 15:31 Default_zero_Oktoberfest.workflow
drwxrwxr-x+ 1 tobiasko SG_Employees         26 Jul 14 09:13 FragPipeOutput
-rw-rw-r--+ 1 root     root         1276022114 Jul 14 14:13 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3459277876 May 10  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.raw
-rw-rw-r--+ 1 root     root         1314764144 Jul 14 14:15 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3612100302 May 11  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.raw
-rw-rw-r--+ 1 root     root         1361457428 Jul 14 14:14 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3701529661 May 10  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.raw
-rw-rw-r--+ 1 root     root         1370533416 Jul 14 14:15 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_04.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3803946051 May 11  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_04.raw
-rw-rw-r--+ 1 root     root         1350804551 Jul 14 14:16 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3726871843 May 10  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_01.raw
-rw-rw-r--+ 1 root     root         1360132006 Jul 14 14:15 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3744014296 May 12  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_02.raw
-rw-rw-r--+ 1 root     root         1324188169 Jul 14 14:14 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3557854702 May 12  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_03.raw
-rw-rw-r--+ 1 root     root         1381335766 Jul 14 14:15 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_04.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3847093811 May 10  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_04.raw
-rw-rw-r--+ 1 root     root         1335672816 Jul 14 14:14 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3663013199 May 10  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_01.raw
-rw-rw-r--+ 1 root     root         1371316733 Jul 14 14:14 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3737657800 May 10  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_02.raw
-rw-rw-r--+ 1 root     root         1396203946 Jul 14 14:15 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3925017162 May 10  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_03.raw
-rw-rw-r--+ 1 root     root         1365540346 Jul 14 14:15 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_04.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3774274368 May 12  2022 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_04.raw
-rw-rw-r--+ 1 root     root         1290801791 Jul 14 14:14 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3501153502 May 12  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.raw
-rw-rw-r--+ 1 root     root         1380771948 Jul 14 14:16 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3904127239 May 10  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.raw
-rw-rw-r--+ 1 root     root         1360576319 Jul 14 14:14 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3693116392 May 12  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.raw
-rw-rw-r--+ 1 root     root         1374002406 Jul 14 14:13 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_04.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3824873805 May 12  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_04.raw
-rw-rw-r--+ 1 root     root         1349182218 Jul 14 14:31 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3712861001 May 12  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_01.raw
-rw-rw-r--+ 1 root     root         1363783417 Jul 14 14:33 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3723972878 May 10  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_02.raw
-rw-rw-r--+ 1 root     root         1365602005 Jul 14 14:33 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3807936631 May 10  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_03.raw
-rw-rw-r--+ 1 root     root         1369697566 Jul 14 14:32 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_04.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3808610486 May 10  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_04.raw
-rw-rw-r--+ 1 root     root         1336687477 Jul 14 14:32 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3679450988 May 11  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_01.raw
-rw-rw-r--+ 1 root     root         1375750763 Jul 14 14:33 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3751359283 May 12  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_02.raw
-rw-rw-r--+ 1 root     root         1387911522 Jul 14 14:32 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3888135685 May 12  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_03.raw
-rw-rw-r--+ 1 root     root         1354050052 Jul 14 14:32 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_04.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3737447850 May 12  2022 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_04.raw
-rw-rw-r--+ 1 root     root         1000801557 Jul 14 14:27 LFQ_Orbitrap_DDA_Ecoli_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3228500047 May 11  2022 LFQ_Orbitrap_DDA_Ecoli_01.raw
-rw-rw-r--+ 1 root     root          993546580 Jul 14 14:27 LFQ_Orbitrap_DDA_Ecoli_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3222535657 May 12  2022 LFQ_Orbitrap_DDA_Ecoli_02.raw
-rw-rw-r--+ 1 root     root          989534228 Jul 14 14:27 LFQ_Orbitrap_DDA_Ecoli_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3210727843 May 12  2022 LFQ_Orbitrap_DDA_Ecoli_03.raw
-rw-rw-r--+ 1 root     root         1363285349 Jul 14 14:33 LFQ_Orbitrap_DDA_Human_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3704372427 May 10  2022 LFQ_Orbitrap_DDA_Human_01.raw
-rw-rw-r--+ 1 root     root         1334702959 Jul 14 14:33 LFQ_Orbitrap_DDA_Human_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3616768504 May 10  2022 LFQ_Orbitrap_DDA_Human_02.raw
-rw-rw-r--+ 1 root     root         1335722200 Jul 14 14:33 LFQ_Orbitrap_DDA_Human_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3658873260 May 10  2022 LFQ_Orbitrap_DDA_Human_03.raw
-rw-rw-r--+ 1 root     root         1277506970 Jul 14 14:33 LFQ_Orbitrap_DDA_QC_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3465044037 May 12  2022 LFQ_Orbitrap_DDA_QC_01.raw
-rw-rw-r--+ 1 root     root         1326638205 Jul 14 14:34 LFQ_Orbitrap_DDA_QC_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3646730576 May 12  2022 LFQ_Orbitrap_DDA_QC_02.raw
-rw-rw-r--+ 1 root     root         1350655649 Jul 14 14:45 LFQ_Orbitrap_DDA_QC_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3719296311 May 12  2022 LFQ_Orbitrap_DDA_QC_03.raw
-rw-rw-r--+ 1 root     root         1325318217 Jul 14 14:45 LFQ_Orbitrap_DDA_QC_04.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3638629986 May 12  2022 LFQ_Orbitrap_DDA_QC_04.raw
-rw-rw-r--+ 1 root     root         1373957153 Jul 14 14:47 LFQ_Orbitrap_DDA_QC_05.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3848899529 May 12  2022 LFQ_Orbitrap_DDA_QC_05.raw
-rw-rw-r--+ 1 root     root         1353831466 Jul 14 14:49 LFQ_Orbitrap_DDA_QC_06.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3679452851 May 12  2022 LFQ_Orbitrap_DDA_QC_06.raw
-rw-rw-r--+ 1 root     root         1364329047 Jul 14 14:49 LFQ_Orbitrap_DDA_QC_07.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3710507404 May 12  2022 LFQ_Orbitrap_DDA_QC_07.raw
-rw-rw-r--+ 1 root     root         1359819605 Jul 14 14:50 LFQ_Orbitrap_DDA_QC_08.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3694714723 May 10  2022 LFQ_Orbitrap_DDA_QC_08.raw
-rw-rw-r--+ 1 root     root         1392466478 Jul 14 14:50 LFQ_Orbitrap_DDA_QC_09.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3955392294 May 10  2022 LFQ_Orbitrap_DDA_QC_09.raw
-rw-rw-r--+ 1 root     root         1371714129 Jul 14 14:50 LFQ_Orbitrap_DDA_QC_10.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3826713756 May 12  2022 LFQ_Orbitrap_DDA_QC_10.raw
-rw-rw-r--+ 1 root     root         1373229425 Jul 14 14:51 LFQ_Orbitrap_DDA_QC_11.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3828401838 May 10  2022 LFQ_Orbitrap_DDA_QC_11.raw
-rw-rw-r--+ 1 root     root         1356128901 Jul 14 14:50 LFQ_Orbitrap_DDA_QC_12.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3767945668 May 10  2022 LFQ_Orbitrap_DDA_QC_12.raw
-rw-rw-r--+ 1 root     root         1217563461 Jul 14 14:46 LFQ_Orbitrap_DDA_Yeast_01.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3275043903 May 10  2022 LFQ_Orbitrap_DDA_Yeast_01.raw
-rw-rw-r--+ 1 root     root         1200474584 Jul 14 14:46 LFQ_Orbitrap_DDA_Yeast_02.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3259909482 May 12  2022 LFQ_Orbitrap_DDA_Yeast_02.raw
-rw-rw-r--+ 1 root     root         1204583659 Jul 14 14:46 LFQ_Orbitrap_DDA_Yeast_03.mzML
-r--r--r--+ 1 cpanse   SG_Employees 3303790909 May 10  2022 LFQ_Orbitrap_DDA_Yeast_03.raw
-rw-r--r--+ 1 cpanse   SG_Employees       1018 Jul 14 13:56 Makefile
-rwxrw-r--+ 1 tobiasko SG_Employees        228 Jul 14 09:13 runfragpipe.bash

There is actually one pepxml file for each raw file in the FragPipe output folder:

ls -la /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/
total 17695832
drwxrwxr-x+ 1 tobiasko SG_Employees      19792 Jul 14 17:13 .
drwxrwxr-x+ 1 tobiasko SG_Employees         26 Jul 14 09:13 ..
-rw-rw-r--+ 1 tobiasko SG_Employees  242722540 Jul 14 17:02 combined.prot.xml
-rw-rw-r--+ 1 tobiasko SG_Employees       5632 Jul 14 14:58 filelist_proteinprophet.txt
-rw-rw-r--+ 1 tobiasko SG_Employees       9651 Jul 14 17:12 filter.log
-rw-rw-r--+ 1 tobiasko SG_Employees       9788 Jul 14 14:58 fragger.params
-rw-rw-r--+ 1 tobiasko SG_Employees       3426 Jul 14 17:13 fragpipe-files.fp-manifest
-rw-rw-r--+ 1 tobiasko SG_Employees      11715 Jul 14 17:13 fragpipe.workflow
-rw-rw-r--+ 1 tobiasko SG_Employees  120419761 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  123579205 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  130790185 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  129977302 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_04.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  126921319 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  126560590 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  126196830 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  129856529 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_04.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  125736895 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  130117333 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  129414446 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  128472556 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_04.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  124265328 Jul 14 16:56 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  123593038 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  132080825 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  131977057 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_04.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  129268296 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  130518466 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  126464790 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  128592545 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_04.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  126860869 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  131527795 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  130646835 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  128423451 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_04.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees   30271048 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Ecoli_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees   27320573 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Ecoli_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees   28257458 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_Ecoli_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  136202606 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Human_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  131946385 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_Human_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  130463052 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Human_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  121162792 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_QC_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  124888914 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_QC_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  128168350 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_QC_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  125542002 Jul 14 16:56 interact-LFQ_Orbitrap_DDA_QC_04.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  126178860 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_QC_05.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  128525437 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_QC_06.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  129784624 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_QC_07.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  130186307 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_QC_08.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  126886411 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_QC_09.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  129204495 Jul 14 16:56 interact-LFQ_Orbitrap_DDA_QC_10.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  128970846 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_QC_11.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees  127577728 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_QC_12.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees   74739122 Jul 14 16:57 interact-LFQ_Orbitrap_DDA_Yeast_01.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees   72800646 Jul 14 16:59 interact-LFQ_Orbitrap_DDA_Yeast_02.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees   72540553 Jul 14 16:58 interact-LFQ_Orbitrap_DDA_Yeast_03.pep.xml
-rw-rw-r--+ 1 tobiasko SG_Employees   25769444 Jul 14 17:12 ion.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   34243596 Jul 14 16:25 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  139837745 Jul 14 15:47 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   30812016 Jul 14 15:47 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   44430693 Jul 14 15:47 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   35114184 Jul 14 16:25 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  143103363 Jul 14 15:48 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   31574305 Jul 14 15:48 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   45385140 Jul 14 15:48 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   37562392 Jul 14 16:25 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  153244328 Jul 14 15:48 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33796150 Jul 14 15:49 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48592155 Jul 14 15:48 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36997852 Jul 14 16:26 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_04_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  151527906 Jul 14 15:49 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_04.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33301849 Jul 14 15:49 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_04.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48226090 Jul 14 15:49 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_04.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36601149 Jul 14 16:26 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  149645919 Jul 14 15:50 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   32909394 Jul 14 15:50 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   47540095 Jul 14 15:50 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36769787 Jul 14 16:26 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  150111972 Jul 14 15:51 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33073217 Jul 14 15:51 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   47499973 Jul 14 15:51 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   35967345 Jul 14 16:26 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  147370928 Jul 14 15:52 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   32363466 Jul 14 15:52 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   46780157 Jul 14 15:52 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   37228422 Jul 14 16:26 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_04_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  152838033 Jul 14 15:52 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_04.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33496584 Jul 14 15:52 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_04.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48578867 Jul 14 15:52 LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_04.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36028471 Jul 14 16:27 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  147046140 Jul 14 15:53 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   32413233 Jul 14 15:53 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   46616452 Jul 14 15:53 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   37791748 Jul 14 16:27 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  154175872 Jul 14 15:54 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   34009128 Jul 14 15:54 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48840393 Jul 14 15:54 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   37432545 Jul 14 16:27 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  153361536 Jul 14 15:55 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33696784 Jul 14 15:55 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48742741 Jul 14 15:55 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36920841 Jul 14 16:27 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_04_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  151485897 Jul 14 15:55 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_04.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33242371 Jul 14 15:56 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_04.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48197550 Jul 14 15:56 LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_04.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   35318445 Jul 14 16:27 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  144915903 Jul 14 15:56 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   31818856 Jul 14 15:56 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   46157493 Jul 14 15:56 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36178440 Jul 14 16:27 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  148415804 Jul 14 15:57 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   32606188 Jul 14 15:57 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   47159197 Jul 14 15:57 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   37758105 Jul 14 16:28 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  154835341 Jul 14 15:58 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   34008320 Jul 14 15:58 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   49247326 Jul 14 15:58 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   37267197 Jul 14 16:28 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_04_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  153586389 Jul 14 15:59 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_04.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33580391 Jul 14 15:59 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_04.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   49070473 Jul 14 15:59 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_04.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36768331 Jul 14 16:28 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  151038624 Jul 14 16:00 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33094161 Jul 14 16:00 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48079943 Jul 14 16:00 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   37426462 Jul 14 16:28 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  153710400 Jul 14 16:00 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33700107 Jul 14 16:00 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48835061 Jul 14 16:00 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36535152 Jul 14 16:28 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  150820970 Jul 14 16:01 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   32924430 Jul 14 16:01 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48073001 Jul 14 16:01 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36919102 Jul 14 16:29 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_04_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  152269809 Jul 14 16:01 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_04.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33255364 Jul 14 16:01 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_04.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48497050 Jul 14 16:01 LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_04.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36023504 Jul 14 16:29 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  147615582 Jul 14 16:02 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   32429648 Jul 14 16:02 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   46958191 Jul 14 16:02 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   38076620 Jul 14 16:29 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  155899367 Jul 14 16:03 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   34295494 Jul 14 16:03 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   49496597 Jul 14 16:03 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   37501902 Jul 14 16:29 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  154427209 Jul 14 16:03 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33798848 Jul 14 16:03 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   49287792 Jul 14 16:03 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36643148 Jul 14 16:29 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_04_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  150921871 Jul 14 16:04 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_04.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   33023871 Jul 14 16:04 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_04.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48133957 Jul 14 16:04 LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_04.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   12572235 Jul 14 16:29 LFQ_Orbitrap_DDA_Ecoli_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   49425811 Jul 14 16:04 LFQ_Orbitrap_DDA_Ecoli_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   11155807 Jul 14 16:04 LFQ_Orbitrap_DDA_Ecoli_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   14094687 Jul 14 16:04 LFQ_Orbitrap_DDA_Ecoli_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   11086553 Jul 14 16:29 LFQ_Orbitrap_DDA_Ecoli_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   43600238 Jul 14 16:05 LFQ_Orbitrap_DDA_Ecoli_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees    9838225 Jul 14 16:05 LFQ_Orbitrap_DDA_Ecoli_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   12434222 Jul 14 16:05 LFQ_Orbitrap_DDA_Ecoli_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   11963646 Jul 14 16:29 LFQ_Orbitrap_DDA_Ecoli_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   47058253 Jul 14 16:05 LFQ_Orbitrap_DDA_Ecoli_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   10622109 Jul 14 16:05 LFQ_Orbitrap_DDA_Ecoli_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   13431699 Jul 14 16:05 LFQ_Orbitrap_DDA_Ecoli_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   36370917 Jul 14 16:30 LFQ_Orbitrap_DDA_Human_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  159003949 Jul 14 16:06 LFQ_Orbitrap_DDA_Human_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   32664218 Jul 14 16:06 LFQ_Orbitrap_DDA_Human_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   52167446 Jul 14 16:06 LFQ_Orbitrap_DDA_Human_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   35152179 Jul 14 16:30 LFQ_Orbitrap_DDA_Human_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  153616033 Jul 14 16:06 LFQ_Orbitrap_DDA_Human_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   31576632 Jul 14 16:07 LFQ_Orbitrap_DDA_Human_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   50379733 Jul 14 16:07 LFQ_Orbitrap_DDA_Human_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   34986669 Jul 14 16:30 LFQ_Orbitrap_DDA_Human_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  152896234 Jul 14 16:07 LFQ_Orbitrap_DDA_Human_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   31423251 Jul 14 16:07 LFQ_Orbitrap_DDA_Human_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   50143378 Jul 14 16:07 LFQ_Orbitrap_DDA_Human_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   32331099 Jul 14 16:30 LFQ_Orbitrap_DDA_QC_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  139331524 Jul 14 16:08 LFQ_Orbitrap_DDA_QC_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   28877457 Jul 14 16:08 LFQ_Orbitrap_DDA_QC_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   45067276 Jul 14 16:08 LFQ_Orbitrap_DDA_QC_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   33649871 Jul 14 16:30 LFQ_Orbitrap_DDA_QC_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  144634974 Jul 14 16:08 LFQ_Orbitrap_DDA_QC_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   30048458 Jul 14 16:08 LFQ_Orbitrap_DDA_QC_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   46697222 Jul 14 16:08 LFQ_Orbitrap_DDA_QC_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   34378719 Jul 14 16:30 LFQ_Orbitrap_DDA_QC_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  147829013 Jul 14 16:09 LFQ_Orbitrap_DDA_QC_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   30698284 Jul 14 16:09 LFQ_Orbitrap_DDA_QC_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   47749170 Jul 14 16:09 LFQ_Orbitrap_DDA_QC_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   33382876 Jul 14 16:31 LFQ_Orbitrap_DDA_QC_04_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  143630293 Jul 14 16:09 LFQ_Orbitrap_DDA_QC_04.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   29802908 Jul 14 16:09 LFQ_Orbitrap_DDA_QC_04.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   46387486 Jul 14 16:09 LFQ_Orbitrap_DDA_QC_04.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   34161295 Jul 14 16:31 LFQ_Orbitrap_DDA_QC_05_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  146825523 Jul 14 16:10 LFQ_Orbitrap_DDA_QC_05.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   30511701 Jul 14 16:10 LFQ_Orbitrap_DDA_QC_05.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   47342807 Jul 14 16:10 LFQ_Orbitrap_DDA_QC_05.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   34878374 Jul 14 16:31 LFQ_Orbitrap_DDA_QC_06_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  150172472 Jul 14 16:11 LFQ_Orbitrap_DDA_QC_06.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   31163932 Jul 14 16:11 LFQ_Orbitrap_DDA_QC_06.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48512357 Jul 14 16:11 LFQ_Orbitrap_DDA_QC_06.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   35152489 Jul 14 16:31 LFQ_Orbitrap_DDA_QC_07_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  151155667 Jul 14 16:11 LFQ_Orbitrap_DDA_QC_07.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   31398395 Jul 14 16:11 LFQ_Orbitrap_DDA_QC_07.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48750033 Jul 14 16:11 LFQ_Orbitrap_DDA_QC_07.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   35057172 Jul 14 16:31 LFQ_Orbitrap_DDA_QC_08_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  150841948 Jul 14 16:12 LFQ_Orbitrap_DDA_QC_08.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   31313604 Jul 14 16:12 LFQ_Orbitrap_DDA_QC_08.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48673077 Jul 14 16:12 LFQ_Orbitrap_DDA_QC_08.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   34307597 Jul 14 16:31 LFQ_Orbitrap_DDA_QC_09_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  148249915 Jul 14 16:13 LFQ_Orbitrap_DDA_QC_09.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   30667400 Jul 14 16:13 LFQ_Orbitrap_DDA_QC_09.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   47988959 Jul 14 16:13 LFQ_Orbitrap_DDA_QC_09.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   34496062 Jul 14 16:32 LFQ_Orbitrap_DDA_QC_10_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  149108875 Jul 14 16:14 LFQ_Orbitrap_DDA_QC_10.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   30825335 Jul 14 16:14 LFQ_Orbitrap_DDA_QC_10.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48303712 Jul 14 16:14 LFQ_Orbitrap_DDA_QC_10.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   34561085 Jul 14 16:32 LFQ_Orbitrap_DDA_QC_11_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  149365768 Jul 14 16:15 LFQ_Orbitrap_DDA_QC_11.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   30882349 Jul 14 16:15 LFQ_Orbitrap_DDA_QC_11.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   48380501 Jul 14 16:15 LFQ_Orbitrap_DDA_QC_11.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   34068907 Jul 14 16:32 LFQ_Orbitrap_DDA_QC_12_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  147331123 Jul 14 16:16 LFQ_Orbitrap_DDA_QC_12.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   30445880 Jul 14 16:16 LFQ_Orbitrap_DDA_QC_12.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   47744298 Jul 14 16:16 LFQ_Orbitrap_DDA_QC_12.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   26763710 Jul 14 16:32 LFQ_Orbitrap_DDA_Yeast_01_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  106844595 Jul 14 16:17 LFQ_Orbitrap_DDA_Yeast_01.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   23699257 Jul 14 16:17 LFQ_Orbitrap_DDA_Yeast_01.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   31861797 Jul 14 16:17 LFQ_Orbitrap_DDA_Yeast_01.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   26045696 Jul 14 16:32 LFQ_Orbitrap_DDA_Yeast_02_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  104064965 Jul 14 16:17 LFQ_Orbitrap_DDA_Yeast_02.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   23064377 Jul 14 16:17 LFQ_Orbitrap_DDA_Yeast_02.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   31080830 Jul 14 16:17 LFQ_Orbitrap_DDA_Yeast_02.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   25783290 Jul 14 16:32 LFQ_Orbitrap_DDA_Yeast_03_edited.pin
-rw-rw-r--+ 1 tobiasko SG_Employees  102914665 Jul 14 16:18 LFQ_Orbitrap_DDA_Yeast_03.pepXML
-rw-rw-r--+ 1 tobiasko SG_Employees   22825700 Jul 14 16:18 LFQ_Orbitrap_DDA_Yeast_03.pin
-rw-rw-r--+ 1 tobiasko SG_Employees   30705440 Jul 14 16:18 LFQ_Orbitrap_DDA_Yeast_03.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees     156767 Jul 14 09:14 log_2023-07-14_09-14-13.txt
-rw-rw-r--+ 1 tobiasko SG_Employees       2180 Jul 14 14:58 msbooster_params.txt
drwxrwxr-x+ 1 tobiasko SG_Employees       3434 Jul 14 16:32 MSBooster_RTplots
-rw-rw-r--+ 1 tobiasko SG_Employees   16866052 Jul 14 17:12 peptide.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees    6792707 Jul 14 17:12 protein.fas
-rw-rw-r--+ 1 tobiasko SG_Employees    2618965 Jul 14 17:12 protein.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees 1217193542 Jul 14 17:13 psm.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   15056686 Jul 14 16:20 spectraRT_full.tsv
-rw-rw-r--+ 1 tobiasko SG_Employees   98019792 Jul 14 16:25 spectraRT.predicted.bin
-rw-rw-r--+ 1 tobiasko SG_Employees   15533157 Jul 14 16:20 spectraRT.tsv

but I provided only one in the config file. Why exactly is this a problem? Shouldn't the search result determine which raw files need to be found?

tobiasko commented 11 months ago

not found, as expected due to .pepxml file coverage:

grep LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02 /scratch/tobiasko/msms/msms.prosit

tobiasko commented 11 months ago

I changed the source to a single file:

head /home/tobiasko/CEcalibration_config.json
{
    "type": "CollisionEnergyCalibration",
    "tag": "",
    "allFeatures": false,
    "inputs": {
        "search_results_type": "Msfragger",
        "spectra": "/scratch/cpanse/PXD028735/dda/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML",
        "spectra_type": "mzml",
        "search_results": "/scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.pepXML"
    },

and get a different error:

python3 -m oktoberfest --config_path ~/CEcalibration_config.json
2023-10-04 15:13:48,752 - INFO - oktoberfest::main Oktoberfest version 0.5.0
Copyright 2023, Wilhelmlab at Technical University of Munich
2023-10-04 15:13:48,753 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-04 15:13:48,754 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-04 15:13:48,755 - INFO - oktoberfest.runner::run_ce_calibration Found 1 files in the spectra directory.
2023-10-04 15:13:48,755 - INFO - oktoberfest.runner::_preprocess Converting search results from /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.pepXML to internal search result.
2023-10-04 15:13:48,755 - INFO - spectrum_io.search_result.search_results::generate_internal Found search results in internal format at /scratch/tobiasko/msms/msms.prosit, skipping conversion
2023-10-04 15:13:48,876 - INFO - oktoberfest.runner::_preprocess Read 98622 PSMs from /scratch/tobiasko/msms/msms.prosit
2023-10-04 15:13:48,984 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/tobiasko/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.rescore
Waiting for tasks to complete:   0%|                                                                                                 | 0/1 [00:00<?, ?it/s]2023-10-04 15:13:49,442 - INFO - spectrum_io.raw.msraw::read_mzml Reading mzML file: /scratch/cpanse/PXD028735/dda/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML
Waiting for tasks to complete:   0%|                                                                                                 | 0/1 [00:02<?, ?it/s]
2023-10-04 15:13:51,932 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool Caught Unknown exception, terminating workers
2023-10-04 15:13:51,932 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool Caught Unknown exception, terminating workers
2023-10-04 15:13:51,933 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 199, in _ce_calib
    library = _annotate_and_get_library(spectra_file, config)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 67, in _annotate_and_get_library
    spectra = pp.load_spectra(mzml_file)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessing.py", line 372, in load_spectra
    return ThermoRaw.read_mzml(
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 154, in read_mzml
    instrument_configuration_ref = spec["scanList"]["scan"][0]["instrumentConfigurationRef"]
KeyError: 'instrumentConfigurationRef'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/utils/multiprocessing_pool.py", line 43, in check_pool
    outputs.append(res.get(timeout=10000))  # 10000 seconds = ~3 hours
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'instrumentConfigurationRef'

2023-10-04 15:13:51,933 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 199, in _ce_calib
    library = _annotate_and_get_library(spectra_file, config)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 67, in _annotate_and_get_library
    spectra = pp.load_spectra(mzml_file)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessing.py", line 372, in load_spectra
    return ThermoRaw.read_mzml(
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 154, in read_mzml
    instrument_configuration_ref = spec["scanList"]["scan"][0]["instrumentConfigurationRef"]
KeyError: 'instrumentConfigurationRef'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/utils/multiprocessing_pool.py", line 43, in check_pool
    outputs.append(res.get(timeout=10000))  # 10000 seconds = ~3 hours
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'instrumentConfigurationRef'

2023-10-04 15:13:51,934 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool 'instrumentConfigurationRef'
2023-10-04 15:13:51,934 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool 'instrumentConfigurationRef'

picciama commented 11 months ago

but I provided only one in the config file. Why exactly is this a problem? Shouldn't the search result determine which raw files need to be found?

Yes indeed, it shouldn't raise a key error. This is an inconvenience that I will address by printing a warning instead of raising an error. If you want to include all files in the msms.prosit, you can also provide the folder instead of one pepXML file, and Oktoberfest will then include all pepXML files contained in the folder and all of its subfolders.

For now, you have solved this by explicitly providing one raw file.
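
The folder-based input described above can be pictured roughly as follows. This is only a minimal sketch, not Oktoberfest's actual implementation: the helper name `collect_pepxml_files` and the case-insensitive suffix match are assumptions made for illustration.

```python
from pathlib import Path


def collect_pepxml_files(search_results: str) -> list[Path]:
    """Return pepXML files for a path that is either one file or a folder.

    Hypothetical helper: a single file is returned as-is, a folder is
    searched recursively, including all of its subfolders.
    """
    path = Path(search_results)
    if path.is_file():
        return [path]
    # rglob("*") walks the folder and every subfolder below it.
    return sorted(
        p for p in path.rglob("*") if p.is_file() and p.suffix.lower() == ".pepxml"
    )
```

With this matching rule, the `interact-*.pep.xml` files in the listing above would be skipped, because their suffix is `.xml` rather than `.pepXML`.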

and get a different error:

The mzML file should contain a header that defines a list of instrument types for each MS level. Each spectrum then uses a reference, "instrumentConfigurationRef", that defines which instrument was used to measure it. It seems your mzML file does not have this. It may be that this is not part of the standard. We convert all our raw files with ThermoRawFileParser, and then it works, but you said you are using MSConvert. It seems the output isn't consistent between the tools, which is of course a shame :( If possible, please check the mzML file and help me find out the differences between the tools; then I would be able to add support for the MSConvert mzML format in the future.

I will add a check for whether the instrument reference is provided and, if not, rely on the user providing mass tolerance and unit.
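
Such a check could look roughly like the sketch below. It is not the actual fix in spectrum_io; the function name `get_instrument_configuration_ref` and the `default_ref` parameter are hypothetical, and the nested dictionary layout is simply the one visible in the traceback above (`spec["scanList"]["scan"][0]["instrumentConfigurationRef"]`).

```python
from typing import Any, Mapping, Optional


def get_instrument_configuration_ref(
    spec: Mapping[str, Any], default_ref: Optional[str] = None
) -> Optional[str]:
    """Look up the instrument reference of a spectrum without raising KeyError.

    Hypothetical helper: returns `default_ref` when the first scan of the
    spectrum carries no "instrumentConfigurationRef" entry.
    """
    scans = spec.get("scanList", {}).get("scan", [])
    if scans and "instrumentConfigurationRef" in scans[0]:
        return scans[0]["instrumentConfigurationRef"]
    # Reference missing: the caller can fall back to a user-provided
    # mass tolerance and unit instead of failing.
    return default_ref
```

A return value of `None` would then be the signal to use the mass tolerance and unit given by the user.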

picciama commented 11 months ago

Can you maybe send me an email with one of the pepXML + corresponding mzML file? Would be helpful to debug this for this particular case.

tobiasko commented 11 months ago

The data is available from our ftp server using this url.

tobiasko commented 11 months ago

I would honestly say that if anything in the mzML space defines a standard or reference implementation, it is msconvert/the ProteoWizard project, not the rawfileparser. There is now a containerized version that also works on Linux, see here (GitHub). This is what we are using on our HPC nodes to convert raw files. Would that also be a solution for oktoberfest?

tobiasko commented 11 months ago

mzML makes me 🤢🤬💀🤯 !

tobiasko commented 11 months ago

I am sorry, but I guess I can't really help with the mzML part and how different mzML "dialects" might differ. I am also not sure how to validate mzML files using tools. I guess Matt from pwiz would be the right person to talk to.

tobiasko commented 11 months ago

But do I get this right: in the end, all you want is to extract ~1000 MS2 scans from a raw file to do the spectral angle/similarity calculation, and because you can't request the peak lists of these scans selectively using some sort of API, you need to convert the complete file to mzML. Welcome to the future! ;-)

picciama commented 11 months ago

Ok, I think I found a fix, and I used that one mzML and performed CECalibration and Rescoring with it. The rescoring results on peptide level and spectral angle for the tested CEs are below, so it definitely works now. I have pushed the fix to the fix/mzml_instrumentConfigurationRef branch of spectrum-io, you can install it using pip install git+https://github.com/wilhelm-lab/spectrum_io.git#fix/mzml_instrumentConfigurationRef for now, until I release this.

[Images: rescoring results at peptide level and spectral angle for the tested CEs]

picciama commented 11 months ago

But do I get this right: in the end, all you want is to extract ~1000 MS2 scans from a raw file to do the spectral angle/similarity calculation, and because you can't request the peak lists of these scans selectively using some sort of API, you need to convert the complete file to mzML. Welcome to the future! ;-)

Yes, for CE calibration, we currently take the top 1000 scoring target PSMs, so unfortunately, we need to read all of them and then sort by score. If you know a better way of doing this, please let me know :)
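
As a rough illustration of the selection step, the sketch below picks the N best-scoring target PSMs with `heapq.nlargest`. All PSMs still have to be read, but no full sort is needed. The field names `score` and `is_decoy` are hypothetical, and the sketch assumes that a higher score is better; neither is taken from Oktoberfest.

```python
import heapq
from typing import Any, Dict, Iterable, List


def top_scoring_targets(psms: Iterable[Dict[str, Any]], n: int = 1000) -> List[Dict[str, Any]]:
    """Return the n highest-scoring target (non-decoy) PSMs, best first."""
    targets = (psm for psm in psms if not psm["is_decoy"])
    # nlargest keeps only n candidates in memory while streaming over the input
    return heapq.nlargest(n, targets, key=lambda psm: psm["score"])
```

The scan numbers of the returned PSMs would then be the only spectra needed for the calibration itself.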

tobiasko commented 11 months ago

My comment was more about the conversion of 99% of the scan data (the raw file) that is not needed afterwards anyway. If one could selectively request those ~1000 scans from the binary file without linear read access... it would save so much time and computation...

tobiasko commented 11 months ago

Nice plots! ok. will try to update spectrum-io!

picciama commented 11 months ago

My comment was more about the conversion of 99% of the scan data (the raw file) that is not needed afterwards anyway. If one could selectively request those ~1000 scans from the binary file without linear read access... it would save so much time and computation...

Yes, good idea, especially in situations with many raw files. This apparently works with ThermoRawFileParser by providing the scan numbers you want to extract, but it would require many changes. I created an issue for that: https://github.com/wilhelm-lab/oktoberfest/issues/135

tobiasko commented 11 months ago

I updated by

 pip install git+https://github.com/wilhelm-lab/spectrum_io.git#fix/mzml_instrumentConfigurationRef
Collecting git+https://github.com/wilhelm-lab/spectrum_io.git#fix/mzml_instrumentConfigurationRef
  Cloning https://github.com/wilhelm-lab/spectrum_io.git to /tmp/pip-req-build-s3u27zjj
  Running command git clone -q https://github.com/wilhelm-lab/spectrum_io.git /tmp/pip-req-build-s3u27zjj
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Requirement already satisfied: click>=8.0.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (8.1.7)
Requirement already satisfied: h5py<4.0.0,>=3.1.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (3.9.0)
Requirement already satisfied: tables<4.0.0,>=3.6.1 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (3.8.0)
Requirement already satisfied: pymzml<3.0.0,>=2.5.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (2.5.2)
Requirement already satisfied: PyYAML>=5.4.1 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (6.0.1)
Requirement already satisfied: pandas<2.0.0,>=1.3.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (1.5.3)
Requirement already satisfied: spectrum-fundamentals<0.5.0,>=0.4.3 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (0.4.3)
Requirement already satisfied: lxml<5.0.0,>=4.5.2 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (4.9.3)
Requirement already satisfied: pyteomics<5.0.0,>=4.3.3 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (4.6.2)
Requirement already satisfied: rich>=10.3.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (13.5.3)
Requirement already satisfied: numpy<2.0.0,>=1.18.1 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (1.24.4)
Requirement already satisfied: python-dateutil>=2.8.1 in ./oktoberfest-env/lib/python3.9/site-packages (from pandas<2.0.0,>=1.3.0->spectrum_io==0.3.3) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in ./oktoberfest-env/lib/python3.9/site-packages (from pandas<2.0.0,>=1.3.0->spectrum_io==0.3.3) (2023.3.post1)
Requirement already satisfied: regex in ./oktoberfest-env/lib/python3.9/site-packages (from pymzml<3.0.0,>=2.5.0->spectrum_io==0.3.3) (2023.8.8)
Requirement already satisfied: six>=1.5 in ./oktoberfest-env/lib/python3.9/site-packages (from python-dateutil>=2.8.1->pandas<2.0.0,>=1.3.0->spectrum_io==0.3.3) (1.16.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./oktoberfest-env/lib/python3.9/site-packages (from rich>=10.3.0->spectrum_io==0.3.3) (2.16.1)
Requirement already satisfied: markdown-it-py>=2.2.0 in ./oktoberfest-env/lib/python3.9/site-packages (from rich>=10.3.0->spectrum_io==0.3.3) (3.0.0)
Requirement already satisfied: mdurl~=0.1 in ./oktoberfest-env/lib/python3.9/site-packages (from markdown-it-py>=2.2.0->rich>=10.3.0->spectrum_io==0.3.3) (0.1.2)
Requirement already satisfied: scikit-learn<2.0,>=1.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.3.1)
Requirement already satisfied: moepy<2.0.0,>=1.1.4 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.1.4)
Requirement already satisfied: joblib<2.0.0,>=1.0.1 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.3.2)
Requirement already satisfied: matplotlib>=3.3.3 in ./oktoberfest-env/lib/python3.9/site-packages (from moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (3.8.0)
Requirement already satisfied: scipy>=1.6.0 in ./oktoberfest-env/lib/python3.9/site-packages (from moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.11.2)
Requirement already satisfied: tqdm>=4.59.0 in ./oktoberfest-env/lib/python3.9/site-packages (from moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (4.66.1)
Requirement already satisfied: cycler>=0.10 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (0.11.0)
Requirement already satisfied: pyparsing>=2.3.1 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (3.1.1)
Requirement already satisfied: fonttools>=4.22.0 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (4.42.1)
Requirement already satisfied: importlib-resources>=3.2.0 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (6.1.0)
Requirement already satisfied: kiwisolver>=1.0.1 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.4.5)
Requirement already satisfied: contourpy>=1.0.1 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.1.1)
Requirement already satisfied: pillow>=6.2.0 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (10.0.1)
Requirement already satisfied: packaging>=20.0 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (23.1)
Requirement already satisfied: zipp>=3.1.0 in ./oktoberfest-env/lib/python3.9/site-packages (from importlib-resources>=3.2.0->matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (3.17.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in ./oktoberfest-env/lib/python3.9/site-packages (from scikit-learn<2.0,>=1.0->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (3.2.0)
Requirement already satisfied: py-cpuinfo in ./oktoberfest-env/lib/python3.9/site-packages (from tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (9.0.0)
Requirement already satisfied: blosc2~=2.0.0 in ./oktoberfest-env/lib/python3.9/site-packages (from tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (2.0.0)
Requirement already satisfied: numexpr>=2.6.2 in ./oktoberfest-env/lib/python3.9/site-packages (from tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (2.8.6)
Requirement already satisfied: cython>=0.29.21 in ./oktoberfest-env/lib/python3.9/site-packages (from tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (3.0.2)
Requirement already satisfied: msgpack in ./oktoberfest-env/lib/python3.9/site-packages (from blosc2~=2.0.0->tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (1.0.6)

but the error does not disappear:

python3 -m oktoberfest --config_path ~/CEcalibration_config.json
2023-10-05 15:22:45,816 - INFO - oktoberfest::main Oktoberfest version 0.5.0
Copyright 2023, Wilhelmlab at Technical University of Munich
2023-10-05 15:22:45,816 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-05 15:22:45,817 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-05 15:22:45,817 - INFO - oktoberfest.runner::run_ce_calibration Found 1 files in the spectra directory.
2023-10-05 15:22:45,817 - INFO - oktoberfest.utils.process_step::is_done Skipping preprocessing_search step because /scratch/tobiasko/proc/preprocessing_search.done was found.
Waiting for tasks to complete:   0%|                                                                                                 | 0/1 [00:00<?, ?it/s]2023-10-05 15:22:45,833 - INFO - spectrum_io.raw.msraw::read_mzml Reading mzML file: /scratch/cpanse/PXD028735/dda/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML
Waiting for tasks to complete:   0%|                                                                                                 | 0/1 [00:03<?, ?it/s]
2023-10-05 15:22:48,872 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool Caught Unknown exception, terminating workers
2023-10-05 15:22:48,872 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool Caught Unknown exception, terminating workers
2023-10-05 15:22:48,874 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 199, in _ce_calib
    library = _annotate_and_get_library(spectra_file, config)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 67, in _annotate_and_get_library
    spectra = pp.load_spectra(mzml_file)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessing.py", line 372, in load_spectra
    return ThermoRaw.read_mzml(
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 154, in read_mzml
    instrument_configuration_ref = spec["scanList"]["scan"][0]["instrumentConfigurationRef"]
KeyError: 'instrumentConfigurationRef'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/utils/multiprocessing_pool.py", line 43, in check_pool
    outputs.append(res.get(timeout=10000))  # 10000 seconds = ~3 hours
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'instrumentConfigurationRef'

2023-10-05 15:22:48,874 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 199, in _ce_calib
    library = _annotate_and_get_library(spectra_file, config)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 67, in _annotate_and_get_library
    spectra = pp.load_spectra(mzml_file)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessing.py", line 372, in load_spectra
    return ThermoRaw.read_mzml(
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 154, in read_mzml
    instrument_configuration_ref = spec["scanList"]["scan"][0]["instrumentConfigurationRef"]
KeyError: 'instrumentConfigurationRef'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/utils/multiprocessing_pool.py", line 43, in check_pool
    outputs.append(res.get(timeout=10000))  # 10000 seconds = ~3 hours
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'instrumentConfigurationRef'

2023-10-05 15:22:48,874 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool 'instrumentConfigurationRef'
2023-10-05 15:22:48,874 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool 'instrumentConfigurationRef'

What am I doing wrong?

picciama commented 11 months ago

Try again but instead of "#" write "@", i.e. pip install git+https://github.com/wilhelm-lab/spectrum_io.git@fix/mzml_instrumentConfigurationRef

Check while installing, that the log output specifically states it is checking out that branch.

Sometimes one also needs to first uninstall the package for whatever reason...

tobiasko commented 11 months ago

no difference.

pip install git+https://github.com/wilhelm-lab/spectrum_io.git@fix/mzml_instrumentConfigurationRef
Collecting git+https://github.com/wilhelm-lab/spectrum_io.git@fix/mzml_instrumentConfigurationRef
  Cloning https://github.com/wilhelm-lab/spectrum_io.git (to revision fix/mzml_instrumentConfigurationRef) to /tmp/pip-req-build-8q68fh98
  Running command git clone -q https://github.com/wilhelm-lab/spectrum_io.git /tmp/pip-req-build-8q68fh98
  Running command git checkout -b fix/mzml_instrumentConfigurationRef --track origin/fix/mzml_instrumentConfigurationRef
  Switched to a new branch 'fix/mzml_instrumentConfigurationRef'
  Branch 'fix/mzml_instrumentConfigurationRef' set up to track remote branch 'fix/mzml_instrumentConfigurationRef' from 'origin'.
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Requirement already satisfied: spectrum-fundamentals<0.5.0,>=0.4.3 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (0.4.3)
Requirement already satisfied: click>=8.0.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (8.1.7)
Requirement already satisfied: lxml<5.0.0,>=4.5.2 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (4.9.3)
Requirement already satisfied: rich>=10.3.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (13.5.3)
Requirement already satisfied: numpy<2.0.0,>=1.18.1 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (1.24.4)
Requirement already satisfied: PyYAML>=5.4.1 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (6.0.1)
Requirement already satisfied: pyteomics<5.0.0,>=4.3.3 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (4.6.2)
Requirement already satisfied: tables<4.0.0,>=3.6.1 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (3.8.0)
Requirement already satisfied: pandas<2.0.0,>=1.3.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (1.5.3)
Requirement already satisfied: pymzml<3.0.0,>=2.5.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (2.5.2)
Requirement already satisfied: h5py<4.0.0,>=3.1.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum_io==0.3.3) (3.9.0)
Requirement already satisfied: python-dateutil>=2.8.1 in ./oktoberfest-env/lib/python3.9/site-packages (from pandas<2.0.0,>=1.3.0->spectrum_io==0.3.3) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in ./oktoberfest-env/lib/python3.9/site-packages (from pandas<2.0.0,>=1.3.0->spectrum_io==0.3.3) (2023.3.post1)
Requirement already satisfied: regex in ./oktoberfest-env/lib/python3.9/site-packages (from pymzml<3.0.0,>=2.5.0->spectrum_io==0.3.3) (2023.8.8)
Requirement already satisfied: six>=1.5 in ./oktoberfest-env/lib/python3.9/site-packages (from python-dateutil>=2.8.1->pandas<2.0.0,>=1.3.0->spectrum_io==0.3.3) (1.16.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./oktoberfest-env/lib/python3.9/site-packages (from rich>=10.3.0->spectrum_io==0.3.3) (2.16.1)
Requirement already satisfied: markdown-it-py>=2.2.0 in ./oktoberfest-env/lib/python3.9/site-packages (from rich>=10.3.0->spectrum_io==0.3.3) (3.0.0)
Requirement already satisfied: mdurl~=0.1 in ./oktoberfest-env/lib/python3.9/site-packages (from markdown-it-py>=2.2.0->rich>=10.3.0->spectrum_io==0.3.3) (0.1.2)
Requirement already satisfied: joblib<2.0.0,>=1.0.1 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.3.2)
Requirement already satisfied: scikit-learn<2.0,>=1.0 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.3.1)
Requirement already satisfied: moepy<2.0.0,>=1.1.4 in ./oktoberfest-env/lib/python3.9/site-packages (from spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.1.4)
Requirement already satisfied: tqdm>=4.59.0 in ./oktoberfest-env/lib/python3.9/site-packages (from moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (4.66.1)
Requirement already satisfied: matplotlib>=3.3.3 in ./oktoberfest-env/lib/python3.9/site-packages (from moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (3.8.0)
Requirement already satisfied: scipy>=1.6.0 in ./oktoberfest-env/lib/python3.9/site-packages (from moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.11.2)
Requirement already satisfied: cycler>=0.10 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (0.11.0)
Requirement already satisfied: packaging>=20.0 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (23.1)
Requirement already satisfied: contourpy>=1.0.1 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.1.1)
Requirement already satisfied: importlib-resources>=3.2.0 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (6.1.0)
Requirement already satisfied: fonttools>=4.22.0 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (4.42.1)
Requirement already satisfied: pyparsing>=2.3.1 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (3.1.1)
Requirement already satisfied: kiwisolver>=1.0.1 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (1.4.5)
Requirement already satisfied: pillow>=6.2.0 in ./oktoberfest-env/lib/python3.9/site-packages (from matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (10.0.1)
Requirement already satisfied: zipp>=3.1.0 in ./oktoberfest-env/lib/python3.9/site-packages (from importlib-resources>=3.2.0->matplotlib>=3.3.3->moepy<2.0.0,>=1.1.4->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (3.17.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in ./oktoberfest-env/lib/python3.9/site-packages (from scikit-learn<2.0,>=1.0->spectrum-fundamentals<0.5.0,>=0.4.3->spectrum_io==0.3.3) (3.2.0)
Requirement already satisfied: py-cpuinfo in ./oktoberfest-env/lib/python3.9/site-packages (from tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (9.0.0)
Requirement already satisfied: cython>=0.29.21 in ./oktoberfest-env/lib/python3.9/site-packages (from tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (3.0.2)
Requirement already satisfied: numexpr>=2.6.2 in ./oktoberfest-env/lib/python3.9/site-packages (from tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (2.8.6)
Requirement already satisfied: blosc2~=2.0.0 in ./oktoberfest-env/lib/python3.9/site-packages (from tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (2.0.0)
Requirement already satisfied: msgpack in ./oktoberfest-env/lib/python3.9/site-packages (from blosc2~=2.0.0->tables<4.0.0,>=3.6.1->spectrum_io==0.3.3) (1.0.6)
python3 -m oktoberfest --config_path ~/CEcalibration_config.json
2023-10-05 16:15:39,182 - INFO - oktoberfest::main Oktoberfest version 0.5.0
Copyright 2023, Wilhelmlab at Technical University of Munich
2023-10-05 16:15:39,183 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-05 16:15:39,184 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-05 16:15:39,184 - INFO - oktoberfest.runner::run_ce_calibration Found 1 files in the spectra directory.
2023-10-05 16:15:39,185 - INFO - oktoberfest.utils.process_step::is_done Skipping preprocessing_search step because /scratch/tobiasko/proc/preprocessing_search.done was found.
Waiting for tasks to complete:   0%|                                                                                                 | 0/1 [00:00<?, ?it/s]2023-10-05 16:15:39,202 - INFO - spectrum_io.raw.msraw::read_mzml Reading mzML file: /scratch/cpanse/PXD028735/dda/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML
Waiting for tasks to complete:   0%|                                                                                                 | 0/1 [00:03<?, ?it/s]
2023-10-05 16:15:42,388 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool Caught Unknown exception, terminating workers
2023-10-05 16:15:42,388 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool Caught Unknown exception, terminating workers
2023-10-05 16:15:42,389 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 199, in _ce_calib
    library = _annotate_and_get_library(spectra_file, config)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 67, in _annotate_and_get_library
    spectra = pp.load_spectra(mzml_file)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessing.py", line 372, in load_spectra
    return ThermoRaw.read_mzml(
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 154, in read_mzml
    instrument_configuration_ref = spec["scanList"]["scan"][0]["instrumentConfigurationRef"]
KeyError: 'instrumentConfigurationRef'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/utils/multiprocessing_pool.py", line 43, in check_pool
    outputs.append(res.get(timeout=10000))  # 10000 seconds = ~3 hours
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'instrumentConfigurationRef'

2023-10-05 16:15:42,389 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 199, in _ce_calib
    library = _annotate_and_get_library(spectra_file, config)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 67, in _annotate_and_get_library
    spectra = pp.load_spectra(mzml_file)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessing.py", line 372, in load_spectra
    return ThermoRaw.read_mzml(
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 154, in read_mzml
    instrument_configuration_ref = spec["scanList"]["scan"][0]["instrumentConfigurationRef"]
KeyError: 'instrumentConfigurationRef'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/utils/multiprocessing_pool.py", line 43, in check_pool
    outputs.append(res.get(timeout=10000))  # 10000 seconds = ~3 hours
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'instrumentConfigurationRef'

2023-10-05 16:15:42,389 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool 'instrumentConfigurationRef'
2023-10-05 16:15:42,389 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool 'instrumentConfigurationRef'

tobiasko commented 11 months ago

Not sure why... I checked your commit 0ed85b8 in spectrum_io, and on my local system it is still the old code (I looked at line 154 in msraw.py).

picciama commented 11 months ago

Gotta love pip... Ok, try pip uninstall spectrum-io, then repeat the git install from this branch. I had trouble with this before and I don't know why that is but it is an issue with pip, because the change is definitely there in this branch.
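
To check which copy of a module Python actually picks up, and whether it still contains a given line, something like the following sketch can help. It uses only the standard library; the helper names are made up for illustration. One plausible explanation for the stale install (an assumption, not confirmed in this thread) is that pip treated spectrum_io 0.3.3 as already installed because the branch carries the same version number, in which case `pip install --force-reinstall` would be another way to rule it out.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def installed_source_path(module_name: str) -> Optional[Path]:
    """Return the file Python would load for module_name, or None if it cannot be found."""
    try:
        # For dotted names this imports the parent package to locate the submodule.
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def source_contains(module_name: str, needle: str) -> bool:
    """Report whether the installed .py source of module_name contains the text needle."""
    path = installed_source_path(module_name)
    if path is None or path.suffix != ".py":
        return False
    return needle in path.read_text(encoding="utf-8", errors="replace")
```

For the case in this thread, `source_contains("spectrum_io.raw.msraw", 'spec["scanList"]["scan"][0]["instrumentConfigurationRef"]')` returning `True` would indicate that the line from the traceback is still present in the installed file.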

tobiasko commented 11 months ago

BAAM! Looks like it works.

tobiasko commented 11 months ago

And the answer to life, the universe, and everything is

cat /scratch/tobiasko/results/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01_ce.txt
31

and not 42! 😂

tobiasko commented 11 months ago

Very cool. Thanks a lot for your fast help. Now I can do this for all the files. Does multithreading help for this workflow?

picciama commented 11 months ago

Yes, it does. Parallel processing is realised at the file level: the shared msms.prosit is split by raw file, and the entire annotation and prediction is then performed in parallel. I.e., use as many processes as you have files, via the numThreads option in the config.
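
The file-level scheme described above can be sketched as follows. This is only an illustration of the idea: the field name `raw_file` and the `process_file` placeholder are hypothetical, and a thread pool stands in for the worker processes to keep the example short.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def split_by_raw_file(psms: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group the shared PSM table by the raw file each PSM came from."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for psm in psms:
        groups[psm["raw_file"]].append(psm)
    return dict(groups)


def process_file(raw_file: str, psms: List[Dict[str, Any]]) -> int:
    """Placeholder for per-file annotation and prediction; here it only counts PSMs."""
    return len(psms)


def run_per_file(psms: List[Dict[str, Any]], num_threads: int) -> Dict[str, int]:
    """Run process_file once per raw file, with at most one worker per file."""
    groups = split_by_raw_file(psms)
    # More workers than files would sit idle, so cap the pool size at the file count.
    workers = max(1, min(num_threads, len(groups)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(process_file, name, rows) for name, rows in groups.items()}
        return {name: future.result() for name, future in futures.items()}
```

Because the unit of work is a whole file, setting the worker count higher than the number of files gives no further speed-up in this scheme.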

tobiasko commented 11 months ago

Very cool, 46 files done!

nl /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/results/*.txt
     1  31
     2  31
     3  30
     4  31
     5  31
     6  31
     7  31
     8  30
     9  31
    10  31
    11  31
    12  30
    13  31
    14  31
    15  31
    16  31
    17  31
    18  31
    19  31
    20  31
    21  31
    22  31
    23  31
    24  31
    25  31
    26  31
    27  31
    28  30
    29  31
    30  31
    31  31
    32  31
    33  31
    34  31
    35  31
    36  31
    37  31
    38  30
    39  31
    40  31
    41  31
    42  31
    43  30
    44  30
    45  30
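
In the listing above, 30 appears 8 times and 31 appears 37 times across the 45 lines. For larger batches, a small tally like the sketch below may be easier to read than the raw listing. It assumes one integer per result file and the `_ce.txt` suffix seen earlier in this thread; the function name is made up.

```python
from collections import Counter
from pathlib import Path


def tally_ce_values(results_dir: str, pattern: str = "*_ce.txt") -> Counter:
    """Count how often each calibrated CE value occurs across the result files."""
    counts: Counter = Counter()
    for path in sorted(Path(results_dir).glob(pattern)):
        text = path.read_text().strip()
        if text:  # skip empty files instead of failing on int("")
            counts[int(text)] += 1
    return counts
```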

tobiasko commented 11 months ago

Sorry... the problems continue! The output below is from a CE calibration of FragPipe results. This time the mzML was written by FragPipe (not MSConvert) and the raw data is ddaPASEF style (so .tdf or .d):

python3 -m oktoberfest --config_path ~/CEcalibration_config.json
2023-10-09 10:49:07,230 - INFO - oktoberfest::main Oktoberfest version 0.5.0
Copyright 2023, Wilhelmlab at Technical University of Munich
2023-10-09 10:49:07,230 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-09 10:49:07,231 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-09 10:49:07,232 - INFO - oktoberfest.runner::run_ce_calibration Found 36 files in the spectra directory.
2023-10-09 10:49:07,232 - INFO - oktoberfest.runner::_preprocess Converting search results from /scratch/cpanse/PXD028735/ddaPASEF/FragPipeOutput/20230714_0922 to internal search result.
 78%|███████████████████████████████████████████████████████████████████████████████████████████▊                          | 28/36 [37:15<09:15, 69.49s/it]100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36/36 [55:31<00:00, 92.55s/it]
2023-10-09 11:46:33,629 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences before filtering for valid prosit sequences: 8785672
2023-10-09 11:47:03,017 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences after filtering for valid prosit sequences: 8229990
2023-10-09 11:49:47,459 - INFO - oktoberfest.runner::_preprocess Read 8229990 PSMs from /scratch/cpanse/PXD028735/ddaPASEF/FragPipeOutput/20230714_0922/msms/msms.prosit
2023-10-09 11:50:05,883 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/ddaPASEF/FragPipeOutput/20230714_0922/msms/LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_01_uncalibrated.rescore
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /usr/lib/python3.9/runpy.py:197 in _run_module_as_main                                           │
│                                                                                                  │
│   194 │   main_globals = sys.modules["__main__"].__dict__                                        │
│   195 │   if alter_argv:                                                                         │
│   196 │   │   sys.argv[0] = mod_spec.origin                                                      │
│ ❱ 197 │   return _run_code(code, main_globals, None,                                             │
│   198 │   │   │   │   │    "__main__", mod_spec)                                                 │
│   199                                                                                            │
│   200 def run_module(mod_name, init_globals=None,                                                │
│                                                                                                  │
│ /usr/lib/python3.9/runpy.py:87 in _run_code                                                      │
│                                                                                                  │
│    84 │   │   │   │   │      __loader__ = loader,                                                │
│    85 │   │   │   │   │      __package__ = pkg_name,                                             │
│    86 │   │   │   │   │      __spec__ = mod_spec)                                                │
│ ❱  87 │   exec(code, run_globals)                                                                │
│    88 │   return run_globals                                                                     │
│    89                                                                                            │
│    90 def _run_module_code(code, init_globals=None,                                              │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/__main__.py:39 in         │
│ <module>                                                                                         │
│                                                                                                  │
│   36                                                                                             │
│   37 if __name__ == "__main__":                                                                  │
│   38 │   traceback.install()                                                                     │
│ ❱ 39 │   main()  # pragma: no cover                                                              │
│   40                                                                                             │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/__main__.py:34 in main    │
│                                                                                                  │
│   31 │   logger.info(f"Oktoberfest version {__version__}\n{__copyright__}")                      │
│   32 │                                                                                           │
│   33 │   args = _parse_args()                                                                    │
│ ❱ 34 │   runner.run_job(args.config_path)                                                        │
│   35                                                                                             │
│   36                                                                                             │
│   37 if __name__ == "__main__":                                                                  │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py:366 in run_job  │
│                                                                                                  │
│   363 │   if job_type == "SpectralLibraryGeneration":                                            │
│   364 │   │   generate_spectral_lib(config_path)                                                 │
│   365 │   elif job_type == "CollisionEnergyCalibration":                                         │
│ ❱ 366 │   │   run_ce_calibration(config_path)                                                    │
│   367 │   elif job_type == "Rescoring":                                                          │
│   368 │   │   run_rescoring(config_path)                                                         │
│   369 │   else:                                                                                  │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py:229 in          │
│ run_ce_calibration                                                                               │
│                                                                                                  │
│   226 │   proc_dir = config.output / "proc"                                                      │
│   227 │   proc_dir.mkdir(parents=True, exist_ok=True)                                            │
│   228 │                                                                                          │
│ ❱ 229 │   _preprocess(spectra_files, config)                                                     │
│   230 │                                                                                          │
│   231 │   processing_pool = JobPool(processes=config.num_threads)                                │
│   232                                                                                            │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py:45 in           │
│ _preprocess                                                                                      │
│                                                                                                  │
│    42 │   │   search_results = pp.filter_peptides_for_model(peptides=search_results, model=con   │
│    43 │   │                                                                                      │
│    44 │   │   # split search results                                                             │
│ ❱  45 │   │   pp.split_search(                                                                   │
│    46 │   │   │   search_results=search_results,                                                 │
│    47 │   │   │   output_dir=config.output / "msms",                                             │
│    48 │   │   │   filenames=[spectra_file.stem for spectra_file in spectra_files],               │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessi │
│ ng.py:314 in split_search                                                                        │
│                                                                                                  │
│   311 │   for filename in filenames:                                                             │
│   312 │   │   output_file = (output_dir / filename).with_suffix(".rescore")                      │
│   313 │   │   logger.info(f"Creating split msms.txt file {output_file}")                         │
│ ❱ 314 │   │   grouped_search_results.get_group(filename).to_csv(output_file)                     │
│   315                                                                                            │
│   316                                                                                            │
│   317 def merge_spectra_and_peptides(spectra: pd.DataFrame, search: pd.DataFrame) -> Spectra:    │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/pandas/core/groupby/groupby.py:817 in │
│ get_group                                                                                        │
│                                                                                                  │
│    814 │   │                                                                                     │
│    815 │   │   inds = self._get_index(name)                                                      │
│    816 │   │   if not len(inds):                                                                 │
│ ❱  817 │   │   │   raise KeyError(name)                                                          │
│    818 │   │                                                                                     │
│    819 │   │   return obj._take_with_is_copy(inds, axis=self.axis)                               │
│    820                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_01_uncalibrated'

Not sure whether Oktoberfest expects .raw and .mzML files to have exactly the same base file name (apart from the final extension). FragPipe adds this _uncalibrated tag:

ls -lh /scratch/cpanse/PXD028735/ddaPASEF/
total 150G
-rw-rw-r--+ 1 tobiasko SG_Employees 2.9K Jul 11 10:08 ddaPASEF.fp-manifest
-rw-rw-r--+ 1 tobiasko SG_Employees  11K Jul 14 09:21 Default_zero_Oktoberfest.workflow
-rw-rw-r--+ 1 tobiasko SG_Employees  11K Jul 12 11:12 Default_zero.workflow
drwxrwxr-x+ 1 tobiasko SG_Employees   36 Jul 18 09:55 FragPipeOutput
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:42 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 09:41 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 09:42 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:50 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 10:14 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 10:15 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:51 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 10:51 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 10:52 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_03_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:51 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 11:47 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 11:48 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:52 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 12:04 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 12:04 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 16:08 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 12:19 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 12:20 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_03_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:53 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Gamma_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 12:39 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Gamma_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 12:40 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Gamma_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:54 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Gamma_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 12:58 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Gamma_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 12:58 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Gamma_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 16:10 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Gamma_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 13:31 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Gamma_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 13:33 LFQ_timsTOFPro_PASEF_Condition_A_Sample_Gamma_03_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:54 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Alpha_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 15:05 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Alpha_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 15:08 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Alpha_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:55 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Alpha_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 16:12 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Alpha_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 16:13 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Alpha_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:55 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Alpha_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 16:37 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Alpha_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 16:38 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Alpha_03_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 16:13 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Beta_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 16:54 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Beta_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 16:55 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Beta_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:57 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Beta_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 17:08 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Beta_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 17:08 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Beta_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:57 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Beta_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 17:22 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Beta_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 17:22 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Beta_03_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:58 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Gamma_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 17:36 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Gamma_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 17:37 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Gamma_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:58 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Gamma_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 17:49 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Gamma_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 17:49 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Gamma_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:59 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Gamma_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 18:01 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Gamma_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 18:02 LFQ_timsTOFPro_PASEF_Condition_B_Sample_Gamma_03_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 09:59 LFQ_timsTOFPro_PASEF_Ecoli_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 874M Jul 14 18:11 LFQ_timsTOFPro_PASEF_Ecoli_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 2.0G Jul 14 18:11 LFQ_timsTOFPro_PASEF_Ecoli_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:00 LFQ_timsTOFPro_PASEF_Ecoli_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 909M Jul 14 18:20 LFQ_timsTOFPro_PASEF_Ecoli_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 2.1G Jul 14 18:21 LFQ_timsTOFPro_PASEF_Ecoli_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:00 LFQ_timsTOFPro_PASEF_Ecoli_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 901M Jul 14 18:30 LFQ_timsTOFPro_PASEF_Ecoli_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 2.1G Jul 14 18:30 LFQ_timsTOFPro_PASEF_Ecoli_03_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:00 LFQ_timsTOFPro_PASEF_Human_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 18:43 LFQ_timsTOFPro_PASEF_Human_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 18:43 LFQ_timsTOFPro_PASEF_Human_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:01 LFQ_timsTOFPro_PASEF_Human_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 18:56 LFQ_timsTOFPro_PASEF_Human_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 18:57 LFQ_timsTOFPro_PASEF_Human_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 16:15 LFQ_timsTOFPro_PASEF_Human_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 19:10 LFQ_timsTOFPro_PASEF_Human_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 19:11 LFQ_timsTOFPro_PASEF_Human_03_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:02 LFQ_timsTOFPro_PASEF_QC_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 19:24 LFQ_timsTOFPro_PASEF_QC_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 19:25 LFQ_timsTOFPro_PASEF_QC_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:02 LFQ_timsTOFPro_PASEF_QC_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 19:38 LFQ_timsTOFPro_PASEF_QC_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 19:39 LFQ_timsTOFPro_PASEF_QC_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:03 LFQ_timsTOFPro_PASEF_QC_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 19:52 LFQ_timsTOFPro_PASEF_QC_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 19:53 LFQ_timsTOFPro_PASEF_QC_03_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:03 LFQ_timsTOFPro_PASEF_QC_04.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 20:06 LFQ_timsTOFPro_PASEF_QC_04.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.1G Jul 14 20:07 LFQ_timsTOFPro_PASEF_QC_04_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:04 LFQ_timsTOFPro_PASEF_QC_05.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 20:20 LFQ_timsTOFPro_PASEF_QC_05.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 20:20 LFQ_timsTOFPro_PASEF_QC_05_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:05 LFQ_timsTOFPro_PASEF_QC_06.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 20:33 LFQ_timsTOFPro_PASEF_QC_06.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 20:33 LFQ_timsTOFPro_PASEF_QC_06_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:05 LFQ_timsTOFPro_PASEF_QC_07.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.5G Jul 14 20:46 LFQ_timsTOFPro_PASEF_QC_07.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 3.0G Jul 14 20:46 LFQ_timsTOFPro_PASEF_QC_07_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 16:16 LFQ_timsTOFPro_PASEF_QC_08.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.3G Jul 14 20:59 LFQ_timsTOFPro_PASEF_QC_08.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 2.9G Jul 14 20:59 LFQ_timsTOFPro_PASEF_QC_08_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:06 LFQ_timsTOFPro_PASEF_QC_09.d
-rw-rw-r--+ 1 tobiasko SG_Employees 200M Jul 14 21:04 LFQ_timsTOFPro_PASEF_QC_09.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 692M Jul 14 21:04 LFQ_timsTOFPro_PASEF_QC_09_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:06 LFQ_timsTOFPro_PASEF_Yeast_01.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.3G Jul 14 21:16 LFQ_timsTOFPro_PASEF_Yeast_01.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 2.9G Jul 14 21:17 LFQ_timsTOFPro_PASEF_Yeast_01_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:07 LFQ_timsTOFPro_PASEF_Yeast_02.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.3G Jul 14 21:29 LFQ_timsTOFPro_PASEF_Yeast_02.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 2.9G Jul 14 21:30 LFQ_timsTOFPro_PASEF_Yeast_02_uncalibrated.mzML
drwxrwxr-x+ 1 tobiasko SG_Employees   56 Jul  7 10:07 LFQ_timsTOFPro_PASEF_Yeast_03.d
-rw-rw-r--+ 1 tobiasko SG_Employees 1.3G Jul 14 21:43 LFQ_timsTOFPro_PASEF_Yeast_03.mzBIN
-rw-rw-r--+ 1 tobiasko SG_Employees 2.8G Jul 14 21:44 LFQ_timsTOFPro_PASEF_Yeast_03_uncalibrated.mzML
-rwxrw-r--+ 1 tobiasko SG_Employees  233 Jul 14 09:26 runfragpipe.bash
picciama commented 11 months ago

The filename without the extension has to match. I don't know why they add _uncalibrated, but if they do, they should also add this suffix to the search results. We cannot possibly know how to match arbitrary filename manipulations, and I suggest that FragPipe fixes this on their side. For now, you would need to correct the filenames. I could maybe add a check for "_uncalibrated", as long as this is always the suffix used. But it gets difficult if every tool changes the filename somehow.
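To see which files are affected before starting a run, the matching rule described above (file stem equals the raw file name in the search results) can be checked up front. The following is a minimal sketch, not Oktoberfest code: the helper name `unmatched_stems` and the example file names are made up for illustration, and the `_uncalibrated` suffix is assumed to be the only modification.

```python
from pathlib import Path


def unmatched_stems(spectra_files, search_names, suffix="_uncalibrated"):
    """Map each spectra stem missing from the search results to a rename hint.

    The hint is the stem with `suffix` stripped if that name is known to the
    search results, otherwise None.
    """
    known = set(search_names)
    report = {}
    for path in spectra_files:
        stem = Path(path).stem
        if stem in known:
            continue  # this file would be matched as-is
        trimmed = stem[: -len(suffix)] if stem.endswith(suffix) else stem
        report[stem] = trimmed if trimmed in known else None
    return report


# Hypothetical inputs: one file carries the suffix, one does not.
spectra = [
    "LFQ_timsTOFPro_PASEF_Yeast_01_uncalibrated.mzML",
    "LFQ_timsTOFPro_PASEF_Yeast_02.mzML",
]
names_in_search = ["LFQ_timsTOFPro_PASEF_Yeast_01", "LFQ_timsTOFPro_PASEF_Yeast_02"]
print(unmatched_stems(spectra, names_in_search))
```

Any stem that shows up in the report would hit the KeyError from get_group shown in the traceback above.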

tobiasko commented 11 months ago

Jip! I totally agree, not their best idea... could I also use a hard link for this purpose?

picciama commented 11 months ago

Jip! I totally agree, not their best idea... could I also use a hard link for this purpose?

Yes, Oktoberfest supports links. I would rather use a symlink though, e.g. ln -s LFQ_timsTOFPro_PASEF_Yeast_02_uncalibrated.mzML LFQ_timsTOFPro_PASEF_Yeast_02.mzML.
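With many files, creating the links by hand gets tedious. A small script can do it for a whole directory. This is only a sketch under assumptions: the function name `link_without_suffix` and the separate links directory are illustrative choices, and the demo runs on throwaway files in a temporary directory rather than on real data.

```python
import tempfile
from pathlib import Path


def link_without_suffix(spectra_dir, links_dir, suffix="_uncalibrated"):
    """Create symlinks in links_dir whose names drop `suffix` from each mzML stem."""
    spectra_dir, links_dir = Path(spectra_dir), Path(links_dir)
    links_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for target in sorted(spectra_dir.glob(f"*{suffix}.mzML")):
        link = links_dir / (target.stem[: -len(suffix)] + target.suffix)
        # Leave existing files or links alone instead of overwriting them.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(target.resolve())
        links.append(link)
    return links


# Demo on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    spectra_dir = Path(tmp) / "spectra"
    spectra_dir.mkdir()
    (spectra_dir / "LFQ_timsTOFPro_PASEF_Yeast_02_uncalibrated.mzML").touch()
    for link in link_without_suffix(spectra_dir, Path(tmp) / "links"):
        print(link.name)
```

The "spectra" entry of the config would then point at the links directory instead of the original one.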

tobiasko commented 11 months ago

hmmm, I think spectrum_io has some type of problem with the symlinks:

python3 -m oktoberfest --config_path ~/CEcalibration_config.json
2023-10-09 15:36:57,466 - INFO - oktoberfest::main Oktoberfest version 0.5.0
Copyright 2023, Wilhelmlab at Technical University of Munich
2023-10-09 15:36:57,468 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-09 15:36:57,469 - INFO - oktoberfest.utils.config::read Reading configuration from /home/tobiasko/CEcalibration_config.json
2023-10-09 15:36:57,470 - INFO - oktoberfest.runner::run_ce_calibration Found 36 files in the spectra directory.
2023-10-09 15:36:57,470 - INFO - oktoberfest.utils.process_step::is_done Skipping preprocessing_search step because /scratch/cpanse/PXD028735/ddaPASEF/FragPipeOutput/20230714_0922/proc/preprocessing_search.done was found.
Waiting for tasks to complete:   0%|                                                                                                | 0/36 [00:00<?, ?it/s]2023-10-09 15:36:57,871 - INFO - spectrum_io.raw.msraw::read_mzml Reading mzML file: /scratch/cpanse/PXD028735/ddaPASEF/links/LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_01.mzML
2023-10-09 15:36:57,873 - INFO - spectrum_io.raw.msraw::read_mzml Reading mzML file: /scratch/cpanse/PXD028735/ddaPASEF/links/LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_02.mzML
2023-10-09 15:36:57,877 - INFO - spectrum_io.raw.msraw::read_mzml Reading mzML file: /scratch/cpanse/PXD028735/ddaPASEF/links/LFQ_timsTOFPro_PASEF_Condition_A_Sample_Alpha_03.mzML
Waiting for tasks to complete:   0%|                                                                                                | 0/36 [00:00<?, ?it/s]
2023-10-09 15:36:57,881 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool Caught Unknown exception, terminating workers
2023-10-09 15:36:57,881 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool Caught Unknown exception, terminating workers
2023-10-09 15:36:57,881 - INFO - spectrum_io.raw.msraw::read_mzml Reading mzML file: /scratch/cpanse/PXD028735/ddaPASEF/links/LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_01.mzML
2023-10-09 15:36:57,882 - INFO - spectrum_io.raw.msraw::read_mzml Reading mzML file: /scratch/cpanse/PXD028735/ddaPASEF/links/LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_02.mzML
2023-10-09 15:36:57,882 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 199, in _ce_calib
    library = _annotate_and_get_library(spectra_file, config)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 67, in _annotate_and_get_library
    spectra = pp.load_spectra(mzml_file)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessing.py", line 372, in load_spectra
    return ThermoRaw.read_mzml(
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 155, in read_mzml
    fragmentation = spec["scanList"]["scan"][0]["filter string"].split("@")[1][:3].upper()
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/utils/multiprocessing_pool.py", line 43, in check_pool
    outputs.append(res.get(timeout=10000))  # 10000 seconds = ~3 hours
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range

2023-10-09 15:36:57,882 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 199, in _ce_calib
    library = _annotate_and_get_library(spectra_file, config)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/runner.py", line 67, in _annotate_and_get_library
    spectra = pp.load_spectra(mzml_file)
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/preprocessing/preprocessing.py", line 372, in load_spectra
    return ThermoRaw.read_mzml(
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 155, in read_mzml
    fragmentation = spec["scanList"]["scan"][0]["filter string"].split("@")[1][:3].upper()
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/oktoberfest/utils/multiprocessing_pool.py", line 43, in check_pool
    outputs.append(res.get(timeout=10000))  # 10000 seconds = ~3 hours
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range

2023-10-09 15:36:57,882 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool list index out of range
2023-10-09 15:36:57,882 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool list index out of range
2023-10-09 15:36:57,885 - INFO - spectrum_io.raw.msraw::read_mzml Reading mzML file: /scratch/cpanse/PXD028735/ddaPASEF/links/LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_03.mzML

but

head /scratch/cpanse/PXD028735/ddaPASEF/links/LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_03.mzML
<?xml version='1.0' encoding='UTF-8'?>
<indexedmzML xmlns="http://psi.hupo.org/ms/mzml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.1.2_idx.xsd">
  <mzML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="LFQ_timsTOFPro_PASEF_Condition_A_Sample_Beta_03.d" xsi:schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.1.2_idx.xsd">
    <cvList count="2">
      <cv id="MS" fullName="Proteomics Standards Initiative Mass Spectrometry Ontology" version="4.1.103" URI="https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo"/>
      <cv id="UO" fullName="Unit Ontology" version="09:04:2014" URI="https://raw.githubusercontent.com/bio-ontology-research-group/unit-ontology/master/unit.obo"/>
    </cvList>
    <fileDescription>
      <fileContent>
        <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum" value=""/>
picciama commented 11 months ago

This is not because of symlinks but because the "filter string" accession is not present. spectrum-io uses it to determine the fragmentation type (HCD and CID are supported at the moment) as well as the m/z range of the spectrum. The problem is that this accession seems to be unique to Thermo instruments, i.e. it is not always there, but we need it for the annotation of fragment peaks.
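For illustration, the expression from the traceback takes element 1 of `split("@")`. If the string contains no "@", the split yields a single-element list and indexing it with 1 raises the IndexError seen in the log. A defensive variant could look like the sketch below; the function name is hypothetical, the example strings are illustrative rather than taken from the files in this issue, and this is not how spectrum_io handles it.

```python
def fragmentation_from_filter_string(filter_string):
    """Return e.g. 'HCD' from a Thermo-style filter string, or None if it cannot be read."""
    if not filter_string or "@" not in filter_string:
        return None  # no usable filter string, e.g. data not from a Thermo instrument
    # Same slicing as in the traceback: the three characters after '@'.
    return filter_string.split("@")[1][:3].upper()


# Illustrative inputs only.
thermo_like = "FTMS + c NSI d Full ms2 500.25@hcd28.00 [100.00-1500.00]"
print(fragmentation_from_filter_string(thermo_like))           # HCD
print(fragmentation_from_filter_string("no activation here"))  # None
print(fragmentation_from_filter_string(None))                  # None
```

Returning None pushes the decision to the caller, which can then fall back to other metadata or to a user-supplied setting.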

@WassimG do you have an idea how to do this better? It seems HUPO PSI made the terms HCD/CID obsolete and suggests a different accession, which to me doesn't even make sense: https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo (search for HCD). We need to get the information in a different way if it isn't Thermo data.

tobiasko commented 11 months ago

ok. I loooooove mzML! So it is this (searched for HCD):

[Term]
id: MS:1000422
name: beam-type collision-induced dissociation
def: "A collision-induced dissociation process that occurs in a beam-type collision cell." [PSI:MS]
synonym: "HCD" EXACT []
is_a: MS:1000133 ! collision-induced dissociation

But why obsolete?

picciama commented 11 months ago

I don't know, this is just something they say in the mzML documentation. I will have to check the mzML to see which accessions are used to define the scan window and fragmentation type, and will come back to you once I know how to solve this.

tobiasko commented 10 months ago

But why are you so keen on checking that the scans are actually of the HCD/CID fragmentation type? I kind of understood this when the code was still sitting behind Prosit, which only had HCD/CID models trained on Orbitrap data. But now all kinds of models could become available through Koina, or maybe someone wants to score EThcD data against a model trained on CID data. So in essence, is this check really needed? Maybe just check whether the scan is a fragment ion scan (I guess that is the ms level in mzML) and issue a warning if the fragmentation method indicated in the scan metadata does not match the selected model, but even that matching might be difficult to do.

picciama commented 10 months ago

I sort of agree. The question is now whether we should care about the fragmentation type and the mass analyzer (FTMS/ITMS/TOF) at all, or instead leave this to the user. We really only need the scan window; the user can provide the desired mass tolerance used for the search, which is already supported in the config. We realised that mzML converted using MSConvert does not work at all. I am still looking into this in hopes of finding a solution.

tobiasko commented 10 months ago

ok. crazy! Not at all? But the mzML files written by FragPipe work?

picciama commented 10 months ago

Yes, because you were able to read the information from the filter string attribute, which is not there all the time. What should be there are the scanWindowList and the activation element within the precursorList.
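As a rough illustration of where these two pieces of information sit in mzML, the sketch below reads them from a single spectrum element with the standard library. It is not the spectrum_io implementation: the XML fragment is hand-written with made-up values, the function name is hypothetical, and real files may carry additional cvParams (such as the collision energy) inside the activation element.

```python
import xml.etree.ElementTree as ET

NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hand-written fragment for illustration; values are made up.
SPECTRUM_XML = """
<spectrum xmlns="http://psi.hupo.org/ms/mzml" index="0" id="scan=1" defaultArrayLength="0">
  <scanList count="1">
    <scan>
      <scanWindowList count="1">
        <scanWindow>
          <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="100.0"/>
          <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1700.0"/>
        </scanWindow>
      </scanWindowList>
    </scan>
  </scanList>
  <precursorList count="1">
    <precursor>
      <activation>
        <cvParam cvRef="MS" accession="MS:1000422" name="beam-type collision-induced dissociation" value=""/>
      </activation>
    </precursor>
  </precursorList>
</spectrum>
""".strip()


def scan_window_and_activation(spectrum):
    """Return ((lower, upper), [activation term names]) for one spectrum element."""
    # Look the limits up by cvParam name; a missing limit comes back as None.
    limits = {
        param.get("name"): float(param.get("value"))
        for param in spectrum.findall(".//mz:scanWindow/mz:cvParam", NS)
    }
    window = (limits.get("scan window lower limit"), limits.get("scan window upper limit"))
    activation = [
        param.get("name") for param in spectrum.findall(".//mz:activation/mz:cvParam", NS)
    ]
    return window, activation


window, activation = scan_window_and_activation(ET.fromstring(SPECTRUM_XML))
print(window, activation)
```

For multi-gigabyte files like the ones in this issue, a streaming parser would be preferable to loading whole documents into memory.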

I pushed to https://github.com/wilhelm-lab/spectrum_io/pull/75/commits/4a60a9ce8c31b5ecacc5a61b5c386765eff58d62 which should fix your problem. I removed the dependency on the filter string attribute. You should pip uninstall spectrum-io, then pip install git+https://github.com/wilhelm-lab/spectrum_io.git@fix/mzml_instrumentConfigurationRef

I added some unit tests and they work but please check this before I merge it.

tobiasko commented 8 months ago

Hi there,

not sure if this is a new problem, or still the old one. I started a CE calibration:

cat CollisionEnergyCalibration_2024_01_03-16_07_20.log
2024-01-03 16:07:20,052 - INFO - oktoberfest.runner::run_job Oktoberfest version 0.5.2
Copyright 2024, Wilhelmlab at Technical University of Munich
2024-01-03 16:07:20,052 - INFO - oktoberfest.runner::run_job Job executed with the following config:
2024-01-03 16:07:20,052 - INFO - oktoberfest.runner::run_job {
    "type": "CollisionEnergyCalibration",
    "tag": "",
    "output": "/scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/",
    "inputs": {
        "search_results_type": "Msfragger",
        "spectra": "/scratch/cpanse/PXD028735/dda/",
        "spectra_type": "mzml",
        "search_results": "/scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/"
    },
    "models": {
        "intensity": "Prosit_2020_intensity_HCD",
        "irt": "Prosit_2019_irt"
    },
    "prediction_server": "koina.proteomicsdb.org:443",
    "numThreads": 1,
    "regressionMethod": "spline",
    "ssl": true,
    "thermoExe": "ThermoRawFileParser.exe",
    "massTolerance": 20,
    "unitMassTolerance": "ppm",
    "ce_alignment_options": {
        "ce_range": [
            19,
            50
        ],
        "use_ransac_model": false
    }
}
2024-01-03 16:07:20,052 - INFO - oktoberfest.utils.config::read Reading configuration from CEcalibration_Prosit_2020_intensity_HCD.json
2024-01-03 16:07:20,053 - INFO - oktoberfest.runner::run_ce_calibration Found 45 files in the spectra directory.
2024-01-03 16:07:20,053 - INFO - oktoberfest.runner::_preprocess Converting search results from /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912 to internal search result.
2024-01-03 16:34:45,737 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences before filtering for valid prosit sequences: 4599403
2024-01-03 16:34:52,716 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences after filtering for valid prosit sequences: 4479334
2024-01-03 16:35:26,613 - INFO - oktoberfest.runner::_preprocess Read 4479334 PSMs from /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/msms.prosit
2024-01-03 16:35:31,779 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.rescore
2024-01-03 16:35:32,492 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_04.rescore
2024-01-03 16:35:32,913 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_04.rescore
2024-01-03 16:35:33,336 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_02.rescore
2024-01-03 16:35:33,762 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_02.rescore
2024-01-03 16:35:34,196 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.rescore
2024-01-03 16:35:34,596 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_01.rescore
2024-01-03 16:35:35,007 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.rescore
2024-01-03 16:35:35,404 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.rescore
2024-01-03 16:35:35,817 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_03.rescore
2024-01-03 16:35:36,227 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.rescore
2024-01-03 16:35:36,660 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_04.rescore
2024-01-03 16:35:37,089 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Gamma_03.rescore
2024-01-03 16:35:37,517 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_A_Sample_Beta_01.rescore
2024-01-03 16:35:37,940 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.rescore
2024-01-03 16:35:38,369 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_04.rescore
2024-01-03 16:35:38,790 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_01.rescore
2024-01-03 16:35:39,286 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_02.rescore
2024-01-03 16:35:39,713 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_03.rescore
2024-01-03 16:35:40,127 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Beta_04.rescore
2024-01-03 16:35:40,546 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_01.rescore
2024-01-03 16:35:40,953 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_02.rescore
2024-01-03 16:35:41,388 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_04.rescore
2024-01-03 16:35:41,806 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Condition_B_Sample_Gamma_03.rescore
2024-01-03 16:35:42,229 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Ecoli_02.rescore
2024-01-03 16:35:42,374 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Ecoli_01.rescore
2024-01-03 16:35:42,540 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Ecoli_03.rescore
2024-01-03 16:35:42,696 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Human_01.rescore
2024-01-03 16:35:43,100 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Human_02.rescore
2024-01-03 16:35:43,491 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Human_03.rescore
2024-01-03 16:35:43,880 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_01.rescore
2024-01-03 16:35:44,251 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_02.rescore
2024-01-03 16:35:44,640 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_03.rescore
2024-01-03 16:35:45,037 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_04.rescore
2024-01-03 16:35:45,422 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_05.rescore
2024-01-03 16:35:45,817 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_06.rescore
2024-01-03 16:35:46,217 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_07.rescore
2024-01-03 16:35:46,620 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_08.rescore
2024-01-03 16:35:47,024 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_09.rescore
2024-01-03 16:35:47,417 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_10.rescore
2024-01-03 16:35:47,815 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_11.rescore
2024-01-03 16:35:48,212 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_QC_12.rescore
2024-01-03 16:35:48,604 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Yeast_01.rescore
2024-01-03 16:35:48,947 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Yeast_02.rescore
2024-01-03 16:35:49,280 - INFO - oktoberfest.preprocessing.preprocessing::split_search Creating split msms.txt file /scratch/cpanse/PXD028735/dda/FragPipeOutput/20230714_0912/out/msms/LFQ_Orbitrap_DDA_Yeast_03.rescore
2024-01-03 16:35:49,774 - INFO - spectrum_io.raw.msraw::_read_mzml_pyteomics Reading mzML file: /scratch/cpanse/PXD028735/dda/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML

and it fails when Oktoberfest starts reading the first mzML file:


│ ❱   84 │   │   │   return func(self, *args, **kwargs)                                            │
│     85 │   │   finally:                                                                          │
│     86 │   │   │   self.seek(position)                                                           │
│     87 │   return wrapped                                                                        │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/pyteomics/xml.py:1150 in get_by_id    │
│                                                                                                  │
│   1147 │   │   │   │   id_key = self._indexed_tag_keys.get(element_type)                         │
│   1148 │   │   │   elem = self._find_by_id_no_reset(elem_id, id_key=id_key)                      │
│   1149 │   │   except (KeyError, AttributeError, etree.LxmlError):                               │
│ ❱ 1150 │   │   │   elem = self._find_by_id_reset(elem_id, id_key=id_key)                         │
│   1151 │   │   data = self._get_info_smart(elem, **kwargs)                                       │
│   1152 │   │   return data                                                                       │
│   1153                                                                                           │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/pyteomics/auxiliary/file_helpers.py:8 │
│ 4 in wrapped                                                                                     │
│                                                                                                  │
│     81 │   │   position = self.tell()                                                            │
│     82 │   │   self.seek(0)                                                                      │
│     83 │   │   try:                                                                              │
│ ❱   84 │   │   │   return func(self, *args, **kwargs)                                            │
│     85 │   │   finally:                                                                          │
│     86 │   │   │   self.seek(position)                                                           │
│     87 │   return wrapped                                                                        │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/pyteomics/xml.py:1117 in              │
│ _find_by_id_reset                                                                                │
│                                                                                                  │
│   1114 │                                                                                         │
│   1115 │   @_keepstate                                                                           │
│   1116 │   def _find_by_id_reset(self, elem_id, id_key=None):                                    │
│ ❱ 1117 │   │   return self._find_by_id_no_reset(elem_id, id_key=id_key)                          │
│   1118 │                                                                                         │
│   1119 │   @_keepstate                                                                           │
│   1120 │   def get_by_id(self, elem_id, id_key=None, element_type=None, **kwargs):               │
│                                                                                                  │
│ /home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/pyteomics/xml.py:661 in               │
│ _find_by_id_no_reset                                                                             │
│                                                                                                  │
│    658 │   │   │   │   │   return elem                                                           │
│    659 │   │   │   │   if not found:                                                             │
│    660 │   │   │   │   │   elem.clear()                                                          │
│ ❱  661 │   │   raise KeyError(elem_id)                                                           │
│    662 │                                                                                         │
│    663 │   @_keepstate                                                                           │
│    664 │   def get_by_id(self, elem_id, **kwargs):                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'commonInstrumentParams'

BUT, this has worked in a previous attempt.

picciama commented 8 months ago

It's a new one. To support different models in Koina that require the instrument type to be read from the mzML file, we introduced a new column in the internal format that contains this information. This is already implemented in the latest version and works with the mzML files we tested.

I checked the file LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML, which I still had, and found that MSConvert writes "CommonInstrumentParams", i.e. with a capital "C", whereas ThermoRawFileParser writes it with a lowercase "c". I will fix this asap.

If there aren't too many files, you can manually change the mzML files for now if you want...
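
For illustration, the casing difference can be tolerated without touching the files by matching the param group ID case-insensitively. The following is only a minimal sketch using the Python standard library, not the actual fix in spectrum_io: the helper names `find_param_group` and `instrument_names` are hypothetical, and the embedded mzML fragment is hand-written for the example rather than taken from the files in this issue.

```python
import xml.etree.ElementTree as ET

# Default XML namespace used by mzML documents.
MZML_NS = "{http://psi.hupo.org/ms/mzml}"

# Hand-written fragment for illustration only; values are not from the real files.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <referenceableParamGroupList count="1">
    <referenceableParamGroup id="CommonInstrumentParams">
      <cvParam cvRef="MS" accession="MS:1001911" name="Q Exactive" value=""/>
    </referenceableParamGroup>
  </referenceableParamGroupList>
</mzML>"""


def find_param_group(root, wanted_id):
    """Return the referenceableParamGroup whose id equals wanted_id, ignoring case."""
    for group in root.iter(f"{MZML_NS}referenceableParamGroup"):
        if group.get("id", "").lower() == wanted_id.lower():
            return group
    return None


def instrument_names(group):
    """Collect the 'name' attribute of every cvParam inside the group."""
    return [cv.get("name") for cv in group.iter(f"{MZML_NS}cvParam")]


root = ET.fromstring(SAMPLE)
# The lowercase spelling still finds the group written with a capital "C".
group = find_param_group(root, "commonInstrumentParams")
print(group.get("id"), instrument_names(group))
```

A lookup like this returns `None` for an ID that is genuinely absent instead of raising the `KeyError` seen in the traceback above, so the caller can decide how to handle files without that group.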

tobiasko commented 8 months ago

ok, thx for the fast reply. I will wait for the fix. The last thing we want is to create additional confusion by introducing manual changes in the .mzML files. I 😍 .mzML

picciama commented 8 months ago

@tobiasko Reading the instrumentConfiguration from mzML files that were converted with MSConvert now works with the current development branch of oktoberfest, but not with the stable release: the newest spectrum-io contains a breaking change and is therefore not supported by the current stable version of oktoberfest.

If something else doesn't work with regard to reading the mzML, please consider opening a new issue.