wfondrie / depthcharge

A deep learning toolkit for mass spectrometry
https://wfondrie.github.io/depthcharge/
Apache License 2.0
59 stars 18 forks source link

Don't crash on missing spectrum info #20

Closed bittremieux closed 1 year ago

bittremieux commented 1 year ago

Don't crash when some information for a spectrum (precursor charge) is missing. Fixes casanovo#50.

This PR fixes crashes while reading spectra in two fashions:

  1. Set a default of 0 as the precursor charge from MGF files.
  2. Catch exceptions that are thrown during reading and report how many spectra were invalid and thus skipped.

Probably only a single of these solutions suffices. Prior to looking at the code I thought the second solution would be the way to go. However, the code already allowed for a missing precursor charge from mzML and mzXML files, just not from MGF files; so I fixed that. However, what does Casanovo do with a spectrum that has 0 precursor charge? I don't actually know. The fact that it rarely crashed before indicates that this doesn't happen often. But because it hasn't encountered any spectra with 0 precursor charge during training, its behavior might be undefined. So maybe those spectra should actually be skipped instead, rather than use a default precursor charge?

Implementation note: I now add the precursor m/z and precursor charge to their respective lists at the end, to make sure that no partial entries incorrectly remain when spectrum reading fails for a subsequent attribute.

bittremieux commented 1 year ago

@wfondrie I've made the changes as discussed during the call, so the PR is ready for review.

codecov-commenter commented 1 year ago

Codecov Report

Merging #20 (e33cab5) into main (cc8b133) will increase coverage by 0.35%. The diff coverage is 96.72%.

@@            Coverage Diff             @@
##             main      #20      +/-   ##
==========================================
+ Coverage   86.64%   87.00%   +0.35%     
==========================================
  Files          19       19              
  Lines         831      854      +23     
==========================================
+ Hits          720      743      +23     
  Misses        111      111              
Impacted Files Coverage Δ
depthcharge/data/parsers.py 94.30% <96.22%> (+0.97%) :arrow_up:
depthcharge/data/hdf5.py 93.44% <100.00%> (+0.18%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more