Open Gscorreia89 opened 5 years ago
The first part of this is being handled in this branch: https://github.com/phenomecentre/nPYc-Toolbox/tree/feature/abstractTargetedDataset
A new "abstract" class, AbstractTargetedDataset, will be added first. It inherits from Dataset and will be renamed TargetedDataset once the refactor is complete. The future classes NMRTargetedDataset and MSTargetedDataset will inherit from it. The current TargetedDataset object will be kept until the end of the process to ensure we don't lose functionality during the refactor.
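A rough sketch of the hierarchy described above, for orientation only. The `Dataset` stand-in, the `_load` hook and the `platform` attribute are assumptions made for illustration and are not taken from the branch:

```python
from abc import ABC, abstractmethod


class Dataset:
    """Heavily reduced, hypothetical stand-in for nPYc's Dataset base class."""

    def __init__(self, **kwargs):
        self.name = kwargs.get("name", "")


class AbstractTargetedDataset(Dataset, ABC):
    """Behaviour shared by every targeted dataset, whatever the platform."""

    def __init__(self, datapath, **kwargs):
        super().__init__(**kwargs)
        self.datapath = datapath
        # Each subclass supplies its own import logic (hypothetical hook name).
        self._load(datapath, **kwargs)

    @abstractmethod
    def _load(self, datapath, **kwargs):
        """Platform-specific import, implemented by subclasses."""


class NMRTargetedDataset(AbstractTargetedDataset):
    def _load(self, datapath, **kwargs):
        self.platform = "NMR"


class MSTargetedDataset(AbstractTargetedDataset):
    def _load(self, datapath, **kwargs):
        self.platform = "MS"
        # MS-only extras such as calibration data would live here.
        self.calibration = {}
```

With this layout the abstract class cannot be instantiated directly, and MS-only attributes stay out of the NMR object.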
Hi @Gscorreia89 - I've fixed some failing unit tests in develop, and merged that into this branch, but there are still some failing unit tests. One of them in particular is a bit strange so I was wondering if you had any thoughts on it?
I see in the branch you have added these lines to Dataset.validateObject:
## self.VariableType is a enum VariableType
condition = isinstance(self.VariableType, VariableType)
success = 'Check self.VariableType is an enum \'VariableType\':\tOK'
failure = 'Check self.VariableType is an enum \'VariableType\':\tFailure, \'self.VariableType\' is' + str(
    type(self.VariableType))
failureListBasic = conditionTest(condition, success, failure,
                                 failureList, verbose, raiseError,
                                 raiseWarning,
                                 exception=TypeError(failure))
However, the unit test test_dataset.test_validateObject initialises an empty Dataset object and then runs the validation subtest. An empty Dataset has self.VariableType = None set in the constructor, so the check you added in this branch will always fail for it.
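To make the conflict concrete, here is a small standalone sketch. The enum members, the `EmptyDataset` class and the `allow_unset` flag are illustrative assumptions, not the toolbox's code. Whether an unset VariableType should pass validation is a design decision for the branch:

```python
from enum import Enum


class VariableType(Enum):
    """Stand-in for nPYc's VariableType enum (members are illustrative)."""
    Discrete = 1
    Continuum = 2


class EmptyDataset:
    """Mimics an empty Dataset whose constructor leaves VariableType unset."""

    def __init__(self):
        self.VariableType = None


def strict_check(dataset):
    # Equivalent to the condition added in the branch: None is never an enum member.
    return isinstance(dataset.VariableType, VariableType)


def check_variable_type(dataset, allow_unset=True):
    """Hypothetical alternative: fail cleanly on a missing attribute and
    optionally accept the constructor default of None."""
    if not hasattr(dataset, "VariableType"):
        return False
    value = dataset.VariableType
    if value is None:
        return allow_unset
    return isinstance(value, VariableType)
```

The other option is to keep the strict check and give the constructor a real VariableType default, in which case the tests for empty objects would not need to change.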
Do you have any thoughts on this?
Cheers!
Gordon
Here's another strange one:
def _loadBrukerNMRTargeted(self, datapath, unit=None, pdata=1, fileNamePattern=None, **kwargs):
    """
    Import a dataset from Bruker IvDr .xml files.
    :param datapath:
    :param unit:
    :param pdata:
    :param fileNamePattern:
    :param kwargs:
    :return:
    """
    if not isinstance(fileNamePattern, str):
        raise TypeError('\'fileNamePattern\' must be a string')
fileNamePattern is an optional argument, but if it's not set the method raises a TypeError?
This is failing in test_nmrTargetedDataset.test_loadBrukerXMLDataset():
with self.subTest(msg='Basic import BrukerQuant-UR with implicit fileNamePattern from SOP'):
    expected = copy.deepcopy(self.expectedQuantUR)
    # Generate
    result = nPYc.NMRTargetedDataset(self.datapathQuantUR, fileType='Bruker IvDR', sop='BrukerQuant-UR',
                                     unit='mmol/mol Crea')
You can see it's not passing fileNamePattern, so it's failing?
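The subtest name suggests the pattern is meant to come from the SOP when it isn't passed explicitly. A minimal sketch of that fallback follows. The helper name and the idea that the SOP attributes expose a 'fileNamePattern' key are assumptions here, not confirmed from the code:

```python
import re


def resolve_file_name_pattern(fileNamePattern=None, sopAttributes=None):
    """Return a compiled pattern, falling back to the SOP default when none is given.

    Hypothetical helper: assumes the loaded SOP attributes are a dict that may
    carry a default under the key 'fileNamePattern'.
    """
    if fileNamePattern is None:
        fileNamePattern = (sopAttributes or {}).get("fileNamePattern")
    # Only complain once both the argument and the SOP default are exhausted.
    if not isinstance(fileNamePattern, str):
        raise TypeError("'fileNamePattern' must be a string")
    return re.compile(fileNamePattern)


# Usage: the explicit argument wins, otherwise the SOP default is used.
sop = {"fileNamePattern": r".*?urine_quant_report_b\.xml$"}
implicit = resolve_file_name_pattern(sopAttributes=sop)
explicit = resolve_file_name_pattern(r".*?results\.xml$", sopAttributes=sop)
```

The type check still fires when neither source provides a string, so the existing TypeError behaviour is kept for genuinely bad input.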
Commenting for reference:
======================================================================
ERROR: test_validateObject (test_dataset.test_dataset_synthetic) [self.VariableType does not exist]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_dataset.py", line 516, in test_validateObject
self.assertFalse(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False))
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_dataset.py", line 556, in validateObject
condition = isinstance(self.VariableType, VariableType)
AttributeError: 'Dataset' object has no attribute 'VariableType'
======================================================================
ERROR: test_loadBrukerXMLDataset (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [Basic import BrukerQuant-UR with implicit fileNamePattern from SOP]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1253, in test_loadBrukerXMLDataset
result = nPYc.NMRTargetedDataset(self.datapathQuantUR, fileType='Bruker IvDR', sop='BrukerQuant-UR',
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_nmrTargetedDataset.py", line 28, in __init__
self._loadBrukerNMRTargeted(datapath, sop=sop, **kwargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_nmrTargetedDataset.py", line 140, in _loadBrukerNMRTargeted
raise TypeError('\'fileNamePattern\' must be a string')
TypeError: 'fileNamePattern' must be a string
======================================================================
ERROR: test_loadBrukerXMLDataset_warnDuplicates (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [Import duplicated features (BI-LISA), Raises warning if features are duplicated]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1319, in test_loadBrukerXMLDataset_warnDuplicates
result.calibration['calibSampleMetadata'].drop(['Path'], axis=1, inplace=True)
AttributeError: 'NMRTargetedDataset' object has no attribute 'calibration'
======================================================================
ERROR: test_loadlims (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [UnitTest1]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1406, in test_loadlims
dataset.intensityData = dataset.intensityData[sortIndex, :]
AttributeError: can't set attribute
======================================================================
ERROR: test_loadlims (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [UnitTest3]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1435, in test_loadlims
dataset.intensityData = dataset.intensityData[sortIndex, :]
AttributeError: can't set attribute
======================================================================
ERROR: test_plotFeatureRanges (test_plotting.test_plotting)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/unittest/mock.py", line 1336, in patched
return func(*newargs, **newkeywargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_plotting.py", line 411, in test_plotFeatureRanges
testData = nPYc.TargetedDataset(datapath, fileType='Bruker Quantification', sop='BrukerBI-LISA', fileNamePattern='.*?results\.xml$')
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
self._loadBrukerXMLDataset(datapath, **kwargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
(self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)
======================================================================
ERROR: test_plotFeatureRanges_logscale (test_plotting.test_plotting)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/unittest/mock.py", line 1336, in patched
return func(*newargs, **newkeywargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_plotting.py", line 442, in test_plotFeatureRanges_logscale
testData = nPYc.TargetedDataset(datapath, fileType='Bruker Quantification', sop='BrukerQuant-UR', fileNamePattern='.*?urine_quant_report_b\.xml$', unit='mmol/L')
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
self._loadBrukerXMLDataset(datapath, **kwargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
(self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)
======================================================================
ERROR: test_loadBrukerXMLDataset (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [Basic import BrukerQuant-UR with matching fileNamePattern]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4835, in test_loadBrukerXMLDataset
result = nPYc.TargetedDataset(self.datapathQuantUR, fileType='Bruker Quantification', sop='BrukerQuant-UR', fileNamePattern='.*?urine_quant_report_b\.xml$', unit='mmol/mol Crea')
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
self._loadBrukerXMLDataset(datapath, **kwargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
(self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)
======================================================================
ERROR: test_loadBrukerXMLDataset (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [Basic import BrukerQuant-UR with implicit fileNamePattern from SOP]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4866, in test_loadBrukerXMLDataset
result = nPYc.TargetedDataset(self.datapathQuantUR, fileType='Bruker Quantification', sop='BrukerQuant-UR', unit='mmol/mol Crea')
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
self._loadBrukerXMLDataset(datapath, **kwargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
(self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)
======================================================================
ERROR: test_loadBrukerXMLDataset_warnDuplicates (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [Import duplicated features (BI-LISA), Raises warning if features are duplicated]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4907, in test_loadBrukerXMLDataset_warnDuplicates
result = nPYc.TargetedDataset(self.datapathBILISA, fileType='Bruker Quantification', sop='BrukerBI-LISA', fileNamePattern='.*?results\.xml$')
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
self._loadBrukerXMLDataset(datapath, **kwargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
(self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)
======================================================================
ERROR: test_loadlims (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [UnitTest1]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4962, in test_loadlims
dataset = nPYc.TargetedDataset(self.datapathQuantUR, fileType='Bruker Quantification', sop='BrukerQuant-UR', fileNamePattern='.*?urine_quant_report_b\.xml$', unit='mmol/mol Crea')
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
self._loadBrukerXMLDataset(datapath, **kwargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
(self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)
======================================================================
ERROR: test_loadlims (test_targeteddataset.test_targeteddataset_full_brukerxml_load) [UnitTest3]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 4989, in test_loadlims
dataset = nPYc.TargetedDataset(self.datapathBILISA, fileType='Bruker Quantification', sop='BrukerBI-LISA', fileNamePattern='.*?results\.xml$')
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 206, in __init__
self._loadBrukerXMLDataset(datapath, **kwargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/objects/_targetedDataset.py", line 1098, in _loadBrukerXMLDataset
(self.intensityData, self.sampleMetadata, self.featureMetadata) = importBrukerXML(filelist)
ValueError: too many values to unpack (expected 3)
======================================================================
ERROR: test_utilities_importBrukerXML (test_utilities.test_utilities_read_bruker_xml)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_utilities.py", line 758, in test_utilities_importBrukerXML
(intensityData, sampleMetadata, featureMetadata) = importBrukerXML(paths)
ValueError: too many values to unpack (expected 3)
======================================================================
ERROR: test_utilities_importBrukerXML_fails (test_utilities.test_utilities_read_bruker_xml)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_utilities.py", line 781, in test_utilities_importBrukerXML_fails
(intensityData, sampleMetadata, featureMetadata) = importBrukerXML(paths)
ValueError: too many values to unpack (expected 3)
======================================================================
ERROR: test_utilities_readBrukerXML_warns (test_utilities.test_utilities_read_bruker_xml)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_utilities.py", line 718, in test_utilities_readBrukerXML_warns
self.assertWarnsRegex(UserWarning, 'Error parsing xml in .+?, skipping', importBrukerXML, [tmpfile])
File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/unittest/case.py", line 1301, in assertWarnsRegex
return context.handle('assertWarnsRegex', args, kwargs)
File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/unittest/case.py", line 201, in handle
callable_obj(*args, **kwargs)
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/../nPYc/utilities/_importBrukerXML.py", line 84, in importBrukerXML
return intensityData, sampleMetadata, featureMetadata, lodData
UnboundLocalError: local variable 'lodData' referenced before assignment
======================================================================
FAIL: test_validateObject (test_dataset.test_dataset_synthetic) [validateObject successful on empty Dataset]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_dataset.py", line 297, in test_validateObject
self.assertTrue(goodDataset.validateObject(verbose=False, raiseError=False, raiseWarning=True))
AssertionError: False is not true
======================================================================
FAIL: test_validateObject (test_msdataset.test_msdataset_synthetic) [BasicMSDataset fails on empty MSDataset]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_msdataset.py", line 818, in test_validateObject
self.assertEqual(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False),
AssertionError: {'Dataset': False, 'BasicMSDataset': False, 'QC': Fal[23 chars]alse} != {'Dataset': True, 'BasicMSDataset': False, 'QC': Fals[22 chars]alse}
+ {'BasicMSDataset': False, 'Dataset': True, 'QC': False, 'sampleMetadata': False}
- {'BasicMSDataset': False,
- 'Dataset': False,
- 'QC': False,
- 'sampleMetadata': False}
======================================================================
FAIL: test_validateObject (test_msdataset.test_msdataset_synthetic) [if self.VariableType is not an enum VariableType]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_msdataset.py", line 1073, in test_validateObject
self.assertEqual(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False), {'Dataset': True, 'BasicMSDataset': False, 'QC': False, 'sampleMetadata': False})
AssertionError: {'Dataset': False, 'BasicMSDataset': False, 'QC': Fal[23 chars]alse} != {'Dataset': True, 'BasicMSDataset': False, 'QC': Fals[22 chars]alse}
+ {'BasicMSDataset': False, 'Dataset': True, 'QC': False, 'sampleMetadata': False}
- {'BasicMSDataset': False,
- 'Dataset': False,
- 'QC': False,
- 'sampleMetadata': False}
======================================================================
FAIL: test_loadBrukerXMLDataset (test_nmrTargetedDataset.test_nmrtargeteddataset_full_brukerxml_load) [Basic import BrukerQuant-UR with matching fileNamePattern]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_nmrTargetedDataset.py", line 1237, in test_loadBrukerXMLDataset
pandas.testing.assert_frame_equal(
File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/site-packages/pandas/_testing.py", line 1671, in assert_frame_equal
assert_index_equal(
File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/site-packages/pandas/_testing.py", line 825, in assert_index_equal
_testing.assert_almost_equal(
File "pandas/_libs/testing.pyx", line 46, in pandas._libs.testing.assert_almost_equal
File "pandas/_libs/testing.pyx", line 161, in pandas._libs.testing.assert_almost_equal
File "/Users/ghaggart/opt/anaconda3/envs/npyc-develop/lib/python3.9/site-packages/pandas/_testing.py", line 1073, in raise_assert_detail
raise AssertionError(msg)
AssertionError: DataFrame.columns are different
DataFrame.columns values are different (50.0 %)
[left]: Index(['Feature Name', 'LLOQ', 'LOD', 'Lower Reference Percentile',
'Lower Reference Value', 'ULOQ', 'Unit', 'Upper Reference Percentile',
'Upper Reference Value', 'calibrationMethod', 'comment',
'quantificationType'],
dtype='object')
[right]: Index(['Feature Name', 'LLOQ', 'LOD', 'Lower Reference Percentile',
'Lower Reference Value', 'Unit', 'Upper Reference Percentile',
'Upper Reference Value', 'calibrationMethod', 'comment', 'lodMask',
'quantificationType'],
dtype='object')
======================================================================
FAIL: test_targeteddataset_validateObject (test_targeteddataset.test_targeteddataset_synthetic) [BasicTargetedDataset fails on empty TargetedDataset]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 367, in test_targeteddataset_validateObject
self.assertEqual(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False), {'Dataset': True, 'BasicTargetedDataset':False ,'QC':False, 'sampleMetadata':False})
AssertionError: {'Dataset': False, 'BasicTargetedDataset': False, 'QC[29 chars]alse} != {'Dataset': True, 'BasicTargetedDataset': False, 'QC'[28 chars]alse}
{'BasicTargetedDataset': False,
- 'Dataset': False,
? ^^^^
+ 'Dataset': True,
? ^^^
'QC': False,
'sampleMetadata': False}
======================================================================
FAIL: test_targeteddataset_validateObject (test_targeteddataset.test_targeteddataset_synthetic) [if self.VariableType is not an enum VariableType]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ghaggart/workspace/nPYc-Toolbox/Tests/test_targeteddataset.py", line 489, in test_targeteddataset_validateObject
self.assertEqual(badDataset.validateObject(verbose=False, raiseError=False, raiseWarning=False), {'Dataset': True, 'BasicTargetedDataset': False, 'QC': False, 'sampleMetadata': False})
AssertionError: {'Dataset': False, 'BasicTargetedDataset': False, 'QC[29 chars]alse} != {'Dataset': True, 'BasicTargetedDataset': False, 'QC'[28 chars]alse}
{'BasicTargetedDataset': False,
- 'Dataset': False,
? ^^^^
+ 'Dataset': True,
? ^^^
'QC': False,
'sampleMetadata': False}
----------------------------------------------------------------------
Ran 338 tests in 133.810s
FAILED (failures=6, errors=15, skipped=2)
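Most of the errors above share one cause: importBrukerXML now returns a fourth value (lodData), while several call sites still unpack three, and lodData is not bound on every code path. The sketch below reproduces both symptoms with a hypothetical stand-in function. It is not the toolbox's implementation, and whether to update the callers or restore the three-value return is for the branch authors to decide:

```python
def import_bruker_like(paths):
    """Hypothetical stand-in for importBrukerXML returning four values."""
    # Bind every returned name up front, so an empty or fully skipped file
    # list cannot leave lodData unassigned at the return statement.
    intensityData, sampleMetadata, featureMetadata, lodData = [], [], [], []
    for path in paths:
        if not path.endswith(".xml"):
            continue  # stands in for "Error parsing xml in ..., skipping"
        intensityData.append(1.0)
        sampleMetadata.append(path)
        featureMetadata.append("feature")
        lodData.append(0.1)
    return intensityData, sampleMetadata, featureMetadata, lodData


# An old-style call site unpacks three names and fails on a four-value return.
unpack_error = None
try:
    intensity, samples, features = import_bruker_like(["a.xml"])
except ValueError as error:
    unpack_error = str(error)

# An updated call site unpacks all four values.
intensity, samples, features, lod = import_bruker_like(["a.xml"])

# A skipped file no longer leaves any returned name unbound.
skipped = import_bruker_like(["broken.txt"])
```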
The features for import and QC of both targeted NMR (Bruker IVDr methods) and LC-QqQ MS assays currently use the same general TargetedDataset. However, targeted NMR methods are conceptually very simple, while LC-QqQ assays require a set of specific extra attributes. Maintaining both in a single object makes the targeted QC features much harder to debug and improve, so they should be split into separate, specific TargetedDataset objects (which may or may not inherit from an abstract TargetedDataset object).