phenomecentre / nPYc-Toolbox

The nPYc-Toolbox defines objects for representing, and implements functions to manipulate and display, metabolic profiling datasets.
MIT License

Split NMR Targeted from LC-QqQ Targeted objects #34

Open Gscorreia89 opened 5 years ago

Gscorreia89 commented 5 years ago

Import and QC of both targeted NMR (Bruker IVDr methods) and LC-QqQ MS assays currently use the same general TargetedDataset object. However, NMR targeted methods are conceptually very simple, while LC-QqQ methods require a set of specific extra attributes. Keeping both in a single object makes the targeted QC features much harder to modify, debug and improve, so they should be split into separate, specific TargetedDataset objects (which might or might not inherit from an abstract TargetedDataset object).

Gscorreia89 commented 3 years ago

The first part of this is being handled in this branch: https://github.com/phenomecentre/nPYc-Toolbox/tree/feature/abstractTargetedDataset

A new "abstract" class, AbstractTargetedDataset, will be added first. It inherits from Dataset and will be renamed TargetedDataset once the refactor is complete. The future classes NMRTargetedDataset and MSTargetedDatasets will inherit from it. The current TargetedDataset object will be kept until the end of the process to ensure we don't lose functionality during the refactor.
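
As a rough illustration of the planned hierarchy (not the code in the branch), the sketch below uses a simplified stand-in for Dataset and placeholder class bodies. The `_load` hook, the `source` attribute and the empty `calibration` dict are assumptions made only for this example, and `MSTargetedDataset` stands in for the planned MS class.

```python
from abc import ABC, abstractmethod


class Dataset:
    """Simplified stand-in for nPYc's Dataset base class."""

    def __init__(self, **kwargs):
        self.name = kwargs.get('name', '')


class AbstractTargetedDataset(Dataset, ABC):
    """Behaviour shared by all targeted assays; cannot be instantiated directly."""

    def __init__(self, datapath, **kwargs):
        super().__init__(**kwargs)
        self.datapath = datapath
        self._load(datapath, **kwargs)

    @abstractmethod
    def _load(self, datapath, **kwargs):
        """Hypothetical hook: each subclass supplies its own import routine."""


class NMRTargetedDataset(AbstractTargetedDataset):
    def _load(self, datapath, **kwargs):
        # NMR targeted import is simple: no extra attributes in this sketch
        self.source = 'NMR'


class MSTargetedDataset(AbstractTargetedDataset):
    def _load(self, datapath, **kwargs):
        # LC-QqQ data needs extra attributes; an empty dict is a placeholder
        self.source = 'MS'
        self.calibration = {}
```

With this layout, attributes that only LC-QqQ data needs live on the MS subclass, so the NMR class never has to carry or validate them.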

gordondavies commented 1 year ago

Hi @Gscorreia89 - I've fixed some failing unit tests in develop, and merged that into this branch, but there are still some failing unit tests. One of them in particular is a bit strange so I was wondering if you had any thoughts on it?

I see in the branch you have added these lines to Dataset.validateObject:

    ## self.VariableType is an enum VariableType
    condition = isinstance(self.VariableType, VariableType)
    success = 'Check self.VariableType is an enum \'VariableType\':\tOK'
    failure = 'Check self.VariableType is an enum \'VariableType\':\tFailure, \'self.VariableType\' is' + str(
            type(self.VariableType))
    failureListBasic = conditionTest(condition, success, failure,
                            failureList, verbose, raiseError,
                            raiseWarning,
                            exception=TypeError(failure))

However, a unit test (test_dataset.testValidateObject) initialises an empty Dataset object and then runs the validation subtest. An empty Dataset has self.VariableType = None set in the constructor, so the check added in this branch will always fail for it.
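
A minimal sketch of why the check fails on an empty object, and one possible way to tolerate an unset value. The enum below is a stand-in with illustrative member names, and `variable_type_is_valid` with its `allow_unset` flag is a hypothetical helper, not part of nPYc.

```python
from enum import Enum


class VariableType(Enum):
    """Stand-in for nPYc's VariableType enum; member names are illustrative."""
    Discrete = 'Discrete'
    Continuum = 'Continuum'


def variable_type_is_valid(value, allow_unset=False):
    """Return True for a VariableType member, or for None when allow_unset is set."""
    if value is None:
        return allow_unset
    return isinstance(value, VariableType)


# None is never an instance of the enum, so a strict isinstance check
# rejects a freshly constructed object whose VariableType is still None.
strict_result = isinstance(None, VariableType)
lenient_result = variable_type_is_valid(None, allow_unset=True)
```

Whether an unset VariableType should count as valid is a design decision for the refactor; the alternative is to keep the strict check and change the test's expectation for empty objects.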

Do you have any thoughts on this?

Cheers!

Gordon

gordondavies commented 1 year ago

Here's another strange one:

    def _loadBrukerNMRTargeted(self, datapath, unit=None, pdata=1, fileNamePattern=None, **kwargs):
        """
        Import a dataset from Bruker IvDr .xml files.
        :param datapath:
        :param unit:
        :param pdata:
        :param fileNamePattern:
        :param kwargs:
        :return:
        """

        if not isinstance(fileNamePattern, str):
            raise TypeError('\'fileNamePattern\' must be a string')

The fileNamePattern argument is optional, but if it's not set the function raises an error?

It fails in test_nmrTargetedDataset.test_loadBrukerXMLDataset():


        with self.subTest(msg='Basic import BrukerQuant-UR with implicit fileNamePattern from SOP'):
            expected = copy.deepcopy(self.expectedQuantUR)
            # Generate
            result = nPYc.NMRTargetedDataset(self.datapathQuantUR, fileType='Bruker IvDR', sop='BrukerQuant-UR',
                                          unit='mmol/mol Crea')

You can see it's not passing fileNamePattern, so it fails?
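
Since the subtest is named "implicit fileNamePattern from SOP", one plausible fix is to fall back to a pattern from the SOP before the type check runs. The sketch below assumes the SOP attributes are available as a dict with a 'fileNamePattern' key; `resolve_file_name_pattern` and `sop_attributes` are hypothetical names used only for illustration.

```python
import re


def resolve_file_name_pattern(fileNamePattern=None, sop_attributes=None):
    """Use the explicit pattern if given, otherwise fall back to the SOP default."""
    if fileNamePattern is None and sop_attributes is not None:
        # Assumed layout: the SOP stores its default pattern under this key
        fileNamePattern = sop_attributes.get('fileNamePattern')
    if not isinstance(fileNamePattern, str):
        raise TypeError("'fileNamePattern' must be a string")
    return re.compile(fileNamePattern)


# Explicit argument omitted: the pattern comes from the (assumed) SOP dict
pattern = resolve_file_name_pattern(
    sop_attributes={'fileNamePattern': r'.*?urine_quant_report_b\.xml$'})
```

The TypeError is still raised when neither the caller nor the SOP supplies a string, so the existing guard keeps its purpose.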

gordondavies commented 1 year ago

Commenting for reference:


======================================================================
ERROR: test_validateObject (test_dataset.test_dataset_synthetic) [self.VariableType does not exist]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_dataset.py", line 516, in test_validateObject
    self.assertFalse(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False))
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_dataset.py", line 556, in validateObject
    condition = isinstance(self.VariableType, VariableType)
AttributeError: 'Dataset' object has no attribute 'VariableType'

======================================================================
ERROR: test_loadBrukerXMLDataset (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [Basic import BrukerQuant-UR with implicit fileNamePattern from SOP]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1253, in test_loadBrukerXMLDataset
    result = nPYc.NMRTargetedDataset(self.datapathQuantUR, fileType='Bruker IvDR', sop='BrukerQuant-UR',
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_nmrTargetedDataset.py", line 28, in __init__
    self._loadBrukerNMRTargeted(datapath, sop=sop, **kwargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_nmrTargetedDataset.py", line 140, in _loadBrukerNMRTargeted
    raise TypeError('\'fileNamePattern\' must be a string')
TypeError: 'fileNamePattern' must be a string

======================================================================
ERROR: test_loadBrukerXMLDataset_warnDuplicates (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [Import duplicated features (BI-LISA), Raises warning if features are duplicated]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1319, in test_loadBrukerXMLDataset_warnDuplicates
    result.calibration['calibSampleMetadata'].drop(['Path'], axis=1, inplace=True)
AttributeError: 'NMRTargetedDataset' object has no attribute 'calibration'

======================================================================
ERROR: test_loadlims (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [UnitTest1]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1406, in test_loadlims
    dataset.intensityData = dataset.intensityData[sortIndex, :]
AttributeError: can't set attribute

======================================================================
ERROR: test_loadlims (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [UnitTest3]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1435, in test_loadlims
    dataset.intensityData = dataset.intensityData[sortIndex, :]
AttributeError: can't set attribute

======================================================================
ERROR: test_plotFeatureRanges (test_plotting.test_plotting)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/unittest/mock.py", line 1336, in patched
    return func(*newargs, **newkeywargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_plotting.py", line 411, in test_plotFeatureRanges
    testData = nPYc.TargetedDataset(datapath, fileType='Bruker Quantification', sop='BrukerBI-LISA', fileNamePattern='.*?results\.xml$')
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
    self._loadBrukerXMLDataset(datapath, **kwargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
    (self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)

======================================================================
ERROR: test_plotFeatureRanges_logscale (test_plotting.test_plotting)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/unittest/mock.py", line 1336, in patched
    return func(*newargs, **newkeywargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_plotting.py", line 442, in test_plotFeatureRanges_logscale
    testData = nPYc.TargetedDataset(datapath, fileType='Bruker Quantification', sop='BrukerQuant-UR', fileNamePattern='.*?urine_quant_report_b\.xml$', unit='mmol/L')
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
    self._loadBrukerXMLDataset(datapath, **kwargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
    (self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)

======================================================================
ERROR: test_loadBrukerXMLDataset (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [Basic import BrukerQuant-UR with matching fileNamePattern]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4835, in test_loadBrukerXMLDataset
    result = nPYc.TargetedDataset(self.datapathQuantUR, fileType='Bruker Quantification', sop='BrukerQuant-UR', fileNamePattern='.*?urine_quant_report_b\.xml$', unit='mmol/mol Crea')
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
    self._loadBrukerXMLDataset(datapath, **kwargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
    (self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)

======================================================================
ERROR: test_loadBrukerXMLDataset (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [Basic import BrukerQuant-UR with implicit fileNamePattern from SOP]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4866, in test_loadBrukerXMLDataset
    result = nPYc.TargetedDataset(self.datapathQuantUR, fileType='Bruker Quantification', sop='BrukerQuant-UR', unit='mmol/mol Crea')
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
    self._loadBrukerXMLDataset(datapath, **kwargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
    (self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)

======================================================================
ERROR: test_loadBrukerXMLDataset_warnDuplicates (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [Import duplicated features (BI-LISA), Raises warning if features are duplicated]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4907, in test_loadBrukerXMLDataset_warnDuplicates
    result = nPYc.TargetedDataset(self.datapathBILISA, fileType='Bruker Quantification', sop='BrukerBI-LISA', fileNamePattern='.*?results\.xml$')
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
    self._loadBrukerXMLDataset(datapath, **kwargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
    (self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)

======================================================================
ERROR: test_loadlims (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [UnitTest1]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4962, in test_loadlims
    dataset = nPYc.TargetedDataset(self.datapathQuantUR, fileType='Bruker Quantification', sop='BrukerQuant-UR', fileNamePattern='.*?urine_quant_report_b\.xml$', unit='mmol/mol Crea')
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
    self._loadBrukerXMLDataset(datapath, **kwargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
    (self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)

======================================================================
ERROR: test_loadlims (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [UnitTest3]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4989, in test_loadlims
    dataset = nPYc.TargetedDataset(self.datapathBILISA, fileType='Bruker Quantification', sop='BrukerBI-LISA', fileNamePattern='.*?results\.xml$')
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
    self._loadBrukerXMLDataset(datapath, **kwargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
    (self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)

======================================================================
ERROR: test_utilities_importBrukerXML (test_utilities.test_utilities_read_bruker_xml)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_utilities.py", line 758, in test_utilities_importBrukerXML
    (intensityData, sampleMetadata, featureMetadata) = importBrukerXML(paths)
ValueError: too many values to unpack (expected 3)

======================================================================
ERROR: test_utilities_importBrukerXML_fails (test_utilities.test_utilities_read_bruker_xml)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_utilities.py", line 781, in test_utilities_importBrukerXML_fails
    (intensityData, sampleMetadata, featureMetadata) = importBrukerXML(paths)
ValueError: too many values to unpack (expected 3)

======================================================================
ERROR: test_utilities_readBrukerXML_warns (test_utilities.test_utilities_read_bruker_xml)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_utilities.py", line 718, in test_utilities_readBrukerXML_warns
    self.assertWarnsRegex(UserWarning, 'Error parsing xml in .+?, skipping', importBrukerXML, [tmpfile])
  File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/unittest/case.py", line 1301, in assertWarnsRegex
    return context.handle('assertWarnsRegex', args, kwargs)
  File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/unittest/case.py", line 201, in handle
    callable_obj(*args, **kwargs)
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/utilities/_importBrukerXML.py", line 84, in importBrukerXML
    return intensityData, sampleMetadata, featureMetadata, lodData
UnboundLocalError: local variable 'lodData' referenced before assignment

======================================================================
FAIL: test_validateObject (test_dataset.test_dataset_synthetic) [validateObject successful on empty Dataset]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_dataset.py", line 297, in test_validateObject
    self.assertTrue(goodDataset.validateObject(verbose=False, raiseError=False, raiseWarning=True))
AssertionError: False is not true

======================================================================
FAIL: test_validateObject (test_msdataset.test_msdataset_synthetic) [BasicMSDataset fails on empty MSDataset]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_msdataset.py", line 818, in test_validateObject
    self.assertEqual(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False),
AssertionError: {'Dataset': False, 'BasicMSDataset': False, 'QC': Fal[23 chars]alse} != {'Dataset': True, 'BasicMSDataset': False, 'QC': Fals[22 chars]alse}
+ {'BasicMSDataset': False, 'Dataset': True, 'QC': False, 'sampleMetadata': False}
- {'BasicMSDataset': False,
-  'Dataset': False,
-  'QC': False,
-  'sampleMetadata': False}

======================================================================
FAIL: test_validateObject (test_msdataset.test_msdataset_synthetic) [if self.VariableType is not an enum VariableType]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_msdataset.py", line 1073, in test_validateObject
    self.assertEqual(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False), {'Dataset': True, 'BasicMSDataset': False, 'QC': False, 'sampleMetadata': False})
AssertionError: {'Dataset': False, 'BasicMSDataset': False, 'QC': Fal[23 chars]alse} != {'Dataset': True, 'BasicMSDataset': False, 'QC': Fals[22 chars]alse}
+ {'BasicMSDataset': False, 'Dataset': True, 'QC': False, 'sampleMetadata': False}
- {'BasicMSDataset': False,
-  'Dataset': False,
-  'QC': False,
-  'sampleMetadata': False}

======================================================================
FAIL: test_loadBrukerXMLDataset (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [Basic import BrukerQuant-UR with matching fileNamePattern]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1237, in test_loadBrukerXMLDataset
    pandas.testing.assert_frame_equal(
  File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/site-packages/pandas/_testing.py", line 1671, in assert_frame_equal
    assert_index_equal(
  File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/site-packages/pandas/_testing.py", line 825, in assert_index_equal
    _testing.assert_almost_equal(
  File "pandas/_libs/testing.pyx", line 46, in pandas._libs.testing.assert_almost_equal
  File "pandas/_libs/testing.pyx", line 161, in pandas._libs.testing.assert_almost_equal
  File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/site-packages/pandas/_testing.py", line 1073, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.columns are different

DataFrame.columns values are different (50.0 %)
[left]:  Index(['Feature Name', 'LLOQ', 'LOD', 'Lower Reference Percentile',
       'Lower Reference Value', 'ULOQ', 'Unit', 'Upper Reference Percentile',
       'Upper Reference Value', 'calibrationMethod', 'comment',
       'quantificationType'],
      dtype='object')
[right]: Index(['Feature Name', 'LLOQ', 'LOD', 'Lower Reference Percentile',
       'Lower Reference Value', 'Unit', 'Upper Reference Percentile',
       'Upper Reference Value', 'calibrationMethod', 'comment', 'lodMask',
       'quantificationType'],
      dtype='object')

======================================================================
FAIL: test_targeteddataset_validateObject (test_targeteddataset.test_targeteddataset_synthetic) [BasicTargetedDataset fails on empty TargetedDataset]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 367, in test_targeteddataset_validateObject
    self.assertEqual(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False), {'Dataset': True, 'BasicTargetedDataset':False ,'QC':False, 'sampleMetadata':False})
AssertionError: {'Dataset': False, 'BasicTargetedDataset': False, 'QC[29 chars]alse} != {'Dataset': True, 'BasicTargetedDataset': False, 'QC'[28 chars]alse}
  {'BasicTargetedDataset': False,
-  'Dataset': False,
?             ^^^^

+  'Dataset': True,
?             ^^^

   'QC': False,
   'sampleMetadata': False}

======================================================================
FAIL: test_targeteddataset_validateObject (test_targeteddataset.test_targeteddataset_synthetic) [if self.VariableType is not an enum VariableType]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 489, in test_targeteddataset_validateObject
    self.assertEqual(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False), {'Dataset': True, 'BasicTargetedDataset': False, 'QC': False, 'sampleMetadata': False})
AssertionError: {'Dataset': False, 'BasicTargetedDataset': False, 'QC[29 chars]alse} != {'Dataset': True, 'BasicTargetedDataset': False, 'QC'[28 chars]alse}
  {'BasicTargetedDataset': False,
-  'Dataset': False,
?             ^^^^

+  'Dataset': True,
?             ^^^

   'QC': False,
   'sampleMetadata': False}

----------------------------------------------------------------------
Ran 338 tests in 133.810s

FAILED (failures=6, errors=15, skipped=2)
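
Many of the errors in the log share one cause: importBrukerXML now returns four values (the traceback shows `return intensityData, sampleMetadata, featureMetadata, lodData`), while the older callers still unpack three. The UnboundLocalError suggests `lodData` is only bound on some code paths, presumably when at least one file parses. The toy function below is a stand-in for illustration only, not the real implementation.

```python
def import_bruker_xml_sketch(paths):
    """Toy stand-in for importBrukerXML that returns four values."""
    intensityData, sampleMetadata, featureMetadata = [], [], []
    lodData = []  # bound before the loop, so it exists even if every file is skipped
    for path in paths:
        if not path.endswith('.xml'):
            continue  # stands in for the real warn-and-skip on unparsable files
        intensityData.append(1.0)
        sampleMetadata.append(path)
        lodData.append(0.1)
    return intensityData, sampleMetadata, featureMetadata, lodData


# A caller written for the old three-value return raises ValueError
try:
    (intensityData, sampleMetadata, featureMetadata) = import_bruker_xml_sketch(['a.xml'])
except ValueError as err:
    unpack_error = str(err)

# A caller updated for the four-value return works
intensityData, sampleMetadata, featureMetadata, lodData = import_bruker_xml_sketch(['a.xml'])
```

Updating the callers in _targetedDataset.py and the tests in test_utilities.py to unpack four values, or keeping a three-value return for the legacy TargetedDataset path, would likely clear that group of errors.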