mandelbrot-project / spectral_lib_matcher

A script based on the matchms library that calculates spectral similarity measures between two MGF files (usually a query file and a library file).
GNU General Public License v3.0

Update spectral_lib_matcher.py #5

Closed Adafede closed 3 years ago

Adafede commented 3 years ago

Multiple changes here:

  1. (The initially wanted change:) The output now keeps the original 'feature_id' (this could also be the scan number if needed) instead of creating new ones. This allows using MGF files whose feature_ids are not renumbered (as for IIN, for example).
  2. Added a -c argument, which allows skipping the cleaning steps if wanted. The only step that must be kept is 'add_precursor_mz' (implemented as 'minimal_processing'). Sadly, it has to stay because of the conversion to float: I could not manage to save an MGF with precursor_mz added and then read it back as a float. This is a bit suboptimal, but it saves more than 1 min on each run. 👍🏼
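
To illustrate the second point, here is a minimal sketch of how such a switch could be wired up. It is not the code from this PR: the long option name `--skip-cleaning`, the polarity of the flag, the `full_cleaning` and `process` helpers, and the use of plain dicts instead of matchms spectra are all assumptions made for illustration. Only the names '-c', 'minimal_processing', 'precursor_mz' and 'feature_id' come from the description above.

```python
import argparse


def minimal_processing(metadata):
    """Step that always runs: ensure 'precursor_mz' exists and is a float.

    Stand-in for matchms' add_precursor_mz, applied to a plain metadata dict.
    """
    cleaned = dict(metadata)
    value = cleaned.get("precursor_mz", cleaned.get("pepmass"))
    # MGF parsers often return PEPMASS as a (mz, intensity) pair.
    if isinstance(value, (tuple, list)):
        value = value[0]
    if value is not None:
        cleaned["precursor_mz"] = float(value)
    return cleaned


def full_cleaning(metadata):
    """Hypothetical slower path: minimal step plus extra tidying."""
    cleaned = minimal_processing(metadata)
    # Placeholder for the real cleaning steps: strip whitespace from strings.
    return {k: v.strip() if isinstance(v, str) else v for k, v in cleaned.items()}


def build_parser():
    parser = argparse.ArgumentParser(description="Spectral matching (sketch)")
    # Assumption: passing -c skips the optional cleaning steps.
    parser.add_argument("-c", "--skip-cleaning", action="store_true",
                        help="only run the minimal processing step")
    return parser


def process(records, skip_cleaning):
    """Apply the chosen processing to every record, keeping feature_id as is."""
    step = minimal_processing if skip_cleaning else full_cleaning
    return [step(record) for record in records]
```

With `-c`, each record only goes through the float conversion, and the original 'feature_id' is passed through untouched in both modes.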
Adafede commented 3 years ago

Oh, forgot to mention, there are also:

  1. Removal of useless DB IDs and row indices from the output.
  2. Renaming of the 'inchikey' column to 'short_inchikey', since it actually contains the short InChIKey.
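
As a rough sketch of these two output changes (not the PR's actual code), assuming the results are held as a list of dicts: the dropped column names `db_id` and `row_index`, the `score` column and the `tidy_output` helper are hypothetical, while 'inchikey', 'short_inchikey' and 'feature_id' are the names mentioned above.

```python
def tidy_output(rows, drop=("db_id", "row_index")):
    """Drop unneeded columns and rename 'inchikey' to 'short_inchikey'.

    The column names in `drop` are placeholders for whatever the script
    considers useless in its output.
    """
    tidied = []
    for row in rows:
        # Keep every column that is not explicitly dropped.
        out = {key: value for key, value in row.items() if key not in drop}
        # The column holds the short InChIKey, so name it accordingly.
        if "inchikey" in out:
            out["short_inchikey"] = out.pop("inchikey")
        tidied.append(out)
    return tidied


example = [{"row_index": 0, "db_id": 42, "feature_id": "17",
            "inchikey": "BSYNRYMUTXBXSQ", "score": 0.91}]
result = tidy_output(example)
```

Here `result` keeps 'feature_id' and 'score', gains 'short_inchikey', and no longer contains the two dropped columns.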
bjonnh commented 3 years ago

Ideally, those changes should be in separate commits. That makes things such as git bisect much easier in case we need to find the source of a bug. I'll check the code now.