sneumann / Rdisop

This is the git repository matching the Bioconductor package Rdisop: Decomposition of Isotopic Patterns
4 stars 9 forks source link

decomposeMass: element filter not applied #6

Closed cbielow closed 5 years ago

cbielow commented 8 years ago

take the following example:

decomposeMass(269.2431, ppm = 5, minElements = "C3", maxElements = "S0C6", elements = "CHNOP")$formula
[1] "H48NO5PS3"    "C6H34N6O3P"   "C3H43O8P2"    "C4H45O5S3"    "C3H39N7S3"    "C3H31N11OS"   "C4H37N4O6S"   "C5H162NS"    
 [9] "H40N5O6PS"    "C6H42N2O2PS2" "C5H49S5"      "C6H44N2P3S"   "C3H48NOP5"    "C4H47O3P2S2"  "C3H41O10S"    "C5H41N4OS3"  
[17] "H35N10O2P2"   "H50NO3P3S2"   "C5H38N2O7P"   "C4H32N9O2P"  

the following problems occur:

mjhelf commented 5 years ago

This is an old issue, but I am kind of interested in this ( I'm playing with some post-Rdisop filtering options in a small package here, but Rdisop/C++ based pre-filtering would be helpful)

The first part of this issue can be solved by supplying the elements in the expected format, without sulfur:

decomposeMass(269.2431, ppm = 5, minElements = "C3", maxElements = "C6", elements = initializeElements(c("C","H","N","O","P")))$formula

[1] "C6H34N6O3P" "C3H43O8P2"  "C3H48NOP5"  "H35N10O2P2" "C5H38N2O7P" "C4H32N9O2P"

I think what is a bit confusing is that Rdisop doesn't complain about misformated element input.. It would help to add a warning in this if block that the elements were not formatted correctly and now the default CHNOPS is used: https://github.com/mjhelf/Rdisop/blob/3d8a66ec4488600019e6635d4f1cf049060c923b/R/Rdisop.R#L145

Note that (unnecessarily) leaving S0 in the maxElements will now throw an error:

Error in decomposeIsotopes(c(mass), c(1), ppm = ppm, mzabs = mzabs, elements = elements,  : 
  S was not found in alphabet!

About the second part:

I think this loop iterates only over elements that exist in the predicted molecule, and so it does not catch the instances where there are no Carbons:

https://github.com/sneumann/Rdisop/blob/3d8a66ec4488600019e6635d4f1cf049060c923b/src/disop.cpp#L122

I guess one way of dealing with this is to split it into two loops, one of them iterating over molecule.getElements() as before, but only checking if (maxcount>0 && count > maxcount) {return false;} and then a second one iterating over minElements.getElements(), checking only if(count < mincount){return false;} .. not sure if this is a problem if there are no minElements defined.

bool isWithinElementRange(const ComposedElement& molecule, const ComposedElement& minElements, const ComposedElement& maxElements) {
// {{{

  for (ComposedElement::container::const_iterator it = molecule.getElements().begin(); 
       it != molecule.getElements().end(); ++it) {

    int mincount = static_cast<int>(minElements.getElementAbundance((it->first).getName()));
    int maxcount = static_cast<int>(maxElements.getElementAbundance((it->first).getName()));

    int count = static_cast<int>(molecule.getElementAbundance((it->first).getName()));

      if (count < mincount) {
    return false;
      }

      // TODO: Fails e.g. for "C2N0" 
      if (maxcount>0 && count > maxcount) {
    return false;
      }
  }

  return true;
}

This maxcount > 0 is probably also why setting 'S0' as a limit doesn't work, but not sure if bad things will happen if that gets removed: https://github.com/sneumann/Rdisop/blob/3d8a66ec4488600019e6635d4f1cf049060c923b/src/disop.cpp#L135

@sneumann let me know if I should make a PR for this..

sneumann commented 5 years ago

Hi, PR would be highly welcome. Can I ask that it includes a unit test ? Yours, Steffen

sneumann commented 5 years ago

Should be fixed in 1.44.1 , Thanks Max for the PR. Yours, Steffen