torfsen / barely_json

A Python parser for data that only looks like JSON
MIT License

JSON with no delimiter #6

Open lrq3000 opened 7 years ago

lrq3000 commented 7 years ago

Hey there,

It's a great library you made :-) And it is a good idea to base it on a grammar processor instead of a simple homemade state machine.

I would like to make the library work on malformed JSON in which the commas between items are missing entirely:

badjson = '''{"ghosting_calculation": {"process_types": ["T1"  "T2"  "FLAIR"  "SWI" ]    "mask_ap": "qc_templates/maskAP.nii.gz"  "scan_type_order": ["T1"  "FLAIR"  "T2" ]    "mask_side": "qc_templates/maskSide.nii.gz"}  "segmentation": {"scan_type_order": ["T1"  "FLAIR"  "T2" ]    "brain_mask": "qc_templates/wbrainmask.nii.gz"}  "dicom2nifti": {"process_types": ["T1"  "T2"  "FLAIR"  "SWI"  "DTI"  "b0"  "rs_fMRI" ]    "deface": true  "deface_types": ["T1"  "T2"  "FLAIR"  "SWI"  "T2*" ]    "overwrite": false}  "tbi_dti_qc": {"check_gradient_table": {"process_types": ["DTI" ]  }  "dti_tensor_residual_calculation": {"process_types": ["DTI" ]  }  "temporal_snr": {"process_types": ["DTI" ]    "brain_mask": "qc_templates/wbrainmask.nii.gz"}  "percentage_signal_change": {"process_types": ["DTI" ]    "brain_mask": "qc_templates/wbrainmask.nii.gz"  "mask_threshold": 0.9}  "input_data": {"process_types": ["DTI" ]    "data_types": ["NIFTI"  "BVAL"  "BVEC" ]  }  "fa_md_feature_estimation": {"process_types": ["DTI" ]  }}  "dicom_qc": {"process_types": ["T2"  "T1"  "FLAIR"  "DTI"  "SWI"  "rs_fMRI" ]    "center": "dicom_qc"  "server_name": "neuro-imaging.center-tbi.eu"}  "smoothed_image_statistics": {"process_types": ["T1"  "T2"  "FLAIR" ]  }  "cnr_calculation": {"process_types": ["T1"  "T2"  "FLAIR"  "DTI" ]    "noise_mask": "qc_templates/maskNoise.nii.gz"  "mask_threshold": 0.9}  "scan_type_translation": {"server_name": "neuro-imaging.center-tbi.eu"}  "input_data": {"proce ss_types": ["T1"  "T2"  "FLAIR"  "SWI"  "DTI"  "b0"  "rs_fMRI" ]    "data_types": ["DICOM"  "NIFTI"  "BVAL"  "BVEC" ]  }  "tbi_report": {}  "tbi_fmri_qc": {"fmri_motion_detection": {"process_types": [  "rs_fMRI" ]    "translation_threshold": 3  "rotation_threshold": 3}  "input_data": {"process_types": ["rs_fMRI" ]    "data_types": ["NIFTI" ]  }  "percentage_signal_change": {"process_types": ["rs_fMRI" ]    "brain_mask": "qc_templates/wbrainmask.nii.gz"  "mask_threshold": 0.9}  "temporal_snr": 
{"process_types": ["rs_fMRI" ]    "brain_mask": "qc_templates/wbrainmask.nii.gz"}  "fmri_snr_calculation": {"process_types": ["rs_fMRI" ]  }}  "head_coverage": {"process_types": ["T1"  "T2"  "FLAIR"  "SWI"  "DTI"  "b0"  "rs_FMRI" ]    "atlas_mask": "NormalSharpAtlas/mni_icbm152_nlin_asym_09a/mni_icbm152_t1_tal_nlin_asym_09a_mask.nii.gz"}  "registration": {"process_types": ["T1"  "T2"  "FLAIR"  "SWI"  "DTI"  "b0"  "rs_fMRI" ]    "scan_type_order": ["T1"  "FLAIR"  "T2" ]  }  "gm_wm_statistics": {"process_types": ["T1"  "T2"  "FLAIR" ]  }  "snr_calculation": {"process_types": ["T1"  "T2"  "FLAIR"  "DTI" ]    "noise_mask": "qc_templates/maskNoise.nii.gz"  "mask_types": {"SWI": "CSF"  "DTI": "CSF"  "T2": "CSF"  "T1": "WM"  "B0": "CSF"  "FLAIR": "GM"}  "mask_threshold": 0.9}  "upload_results": {}  "protocol_check": {"validate_order": true  "sequence": ["T2"  "T1"  "FLAIR"  "SWI"  "DTI"  "DTI"  "rs_fMRI" ]  }}'''

This peculiar bug can happen when the host software that generates the JSON exports it as one field of a CSV file with the default comma delimiter. In that case, all commas inside the JSON field are stripped out.
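To make the failure mode concrete, here is a minimal sketch of how such a string could come about. The `config` dictionary and the comma-stripping step are assumptions made for illustration, not the actual behaviour of the exporting software. For contrast, it also shows that Python's standard `csv` writer quotes a field that contains commas, so the JSON survives a round trip.

```python
import csv
import io
import json

# Hypothetical, shortened stand-in for the real configuration.
config = {"process_types": ["T1", "T2"], "deface": True}
encoded = json.dumps(config)

# Assumed lossy export: every comma in the field is replaced by a space
# so that it cannot be mistaken for a CSV delimiter.
stripped = encoded.replace(",", " ")

# A CSV writer that quotes fields keeps the commas intact.
buffer = io.StringIO()
csv.writer(buffer).writerow(["subject-01", encoded])
buffer.seek(0)
recovered = next(csv.reader(buffer))[1]

print(stripped)
print(recovered == encoded)
```

If the export can be changed to quote its fields, the problem does not arise in the first place; otherwise the commas have to be restored after the fact.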

Do you have any idea how to fix that?

Thanks a lot!

torfsen commented 7 years ago

That is an interesting case, @lrq3000, and I'll think about adding support for it to barely_json. In the meantime you can add the missing commas as follows:

import re
badjson2 = re.sub(r'(?<!:)\s+"', ', "', badjson)

(This replaces each run of whitespace that comes directly before a " with a comma and a single space, unless that whitespace directly follows a colon.)
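A small self-contained sketch of the same substitution on a shortened input, for anyone who wants to try it without the full string. The helper name `add_missing_commas` and the `sample` value are made up for this illustration, and the standard `json` module stands in for barely_json so that nothing needs to be installed.

```python
import json
import re


def add_missing_commas(text):
    """Insert ', ' in place of whitespace that precedes a double quote,
    unless that whitespace directly follows a colon."""
    return re.sub(r'(?<!:)\s+"', ', "', text)


# Shortened, hypothetical sample in the same shape as the input above.
sample = '{"a": ["T1"  "T2" ]    "b": true  "c": {"d": 0.9}  "e": "x"}'

repaired = add_missing_commas(sample)
data = json.loads(repaired)

print(repaired)
print(data)
```

Two caveats, as far as I can tell. A list that starts with whitespace, such as `[  "rs_fMRI" ]`, becomes `[, "rs_fMRI" ]`, which the strict `json` module rejects; this also appears to be where the empty first element of `fmri_motion_detection`'s `process_types` in the output below comes from. The pattern also cannot distinguish an opening quote from a closing one, so a string value that ends in whitespace, or more than one space after a colon, would get a stray comma.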

barely_json then decodes it just fine:

{'cnr_calculation': {'mask_threshold': 0.9,
                     'noise_mask': 'qc_templates/maskNoise.nii.gz',
                     'process_types': ['T1', 'T2', 'FLAIR', 'DTI']},
 'dicom2nifti': {'deface': True,
                 'deface_types': ['T1', 'T2', 'FLAIR', 'SWI', 'T2*'],
                 'overwrite': False,
                 'process_types': ['T1',
                                   'T2',
                                   'FLAIR',
                                   'SWI',
                                   'DTI',
                                   'b0',
                                   'rs_fMRI']},
 'dicom_qc': {'center': 'dicom_qc',
              'process_types': ['T2', 'T1', 'FLAIR', 'DTI', 'SWI', 'rs_fMRI'],
              'server_name': 'neuro-imaging.center-tbi.eu'},
 'ghosting_calculation': {'mask_ap': 'qc_templates/maskAP.nii.gz',
                          'mask_side': 'qc_templates/maskSide.nii.gz',
                          'process_types': ['T1', 'T2', 'FLAIR', 'SWI'],
                          'scan_type_order': ['T1', 'FLAIR', 'T2']},
 'gm_wm_statistics': {'process_types': ['T1', 'T2', 'FLAIR']},
 'head_coverage': {'atlas_mask': 'NormalSharpAtlas/mni_icbm152_nlin_asym_09a/mni_icbm152_t1_tal_nlin_asym_09a_mask.nii.gz',
                   'process_types': ['T1',
                                     'T2',
                                     'FLAIR',
                                     'SWI',
                                     'DTI',
                                     'b0',
                                     'rs_FMRI']},
 'input_data': {'data_types': ['DICOM', 'NIFTI', 'BVAL', 'BVEC'],
                'proce ss_types': ['T1',
                                   'T2',
                                   'FLAIR',
                                   'SWI',
                                   'DTI',
                                   'b0',
                                   'rs_fMRI']},
 'protocol_check': {'sequence': ['T2',
                                 'T1',
                                 'FLAIR',
                                 'SWI',
                                 'DTI',
                                 'DTI',
                                 'rs_fMRI'],
                    'validate_order': True},
 'registration': {'process_types': ['T1',
                                    'T2',
                                    'FLAIR',
                                    'SWI',
                                    'DTI',
                                    'b0',
                                    'rs_fMRI'],
                  'scan_type_order': ['T1', 'FLAIR', 'T2']},
 'scan_type_translation': {'server_name': 'neuro-imaging.center-tbi.eu'},
 'segmentation': {'brain_mask': 'qc_templates/wbrainmask.nii.gz',
                  'scan_type_order': ['T1', 'FLAIR', 'T2']},
 'smoothed_image_statistics': {'process_types': ['T1', 'T2', 'FLAIR']},
 'snr_calculation': {'mask_threshold': 0.9,
                     'mask_types': {'B0': 'CSF',
                                    'DTI': 'CSF',
                                    'FLAIR': 'GM',
                                    'SWI': 'CSF',
                                    'T1': 'WM',
                                    'T2': 'CSF'},
                     'noise_mask': 'qc_templates/maskNoise.nii.gz',
                     'process_types': ['T1', 'T2', 'FLAIR', 'DTI']},
 'tbi_dti_qc': {'check_gradient_table': {'process_types': ['DTI']},
                'dti_tensor_residual_calculation': {'process_types': ['DTI']},
                'fa_md_feature_estimation': {'process_types': ['DTI']},
                'input_data': {'data_types': ['NIFTI', 'BVAL', 'BVEC'],
                               'process_types': ['DTI']},
                'percentage_signal_change': {'brain_mask': 'qc_templates/wbrainmask.nii.gz',
                                             'mask_threshold': 0.9,
                                             'process_types': ['DTI']},
                'temporal_snr': {'brain_mask': 'qc_templates/wbrainmask.nii.gz',
                                 'process_types': ['DTI']}},
 'tbi_fmri_qc': {'fmri_motion_detection': {'process_types': [u'', 'rs_fMRI'],
                                           'rotation_threshold': 3.0,
                                           'translation_threshold': 3.0},
                 'fmri_snr_calculation': {'process_types': ['rs_fMRI']},
                 'input_data': {'data_types': ['NIFTI'],
                                'process_types': ['rs_fMRI']},
                 'percentage_signal_change': {'brain_mask': 'qc_templates/wbrainmask.nii.gz',
                                              'mask_threshold': 0.9,
                                              'process_types': ['rs_fMRI']},
                 'temporal_snr': {'brain_mask': 'qc_templates/wbrainmask.nii.gz',
                                  'process_types': ['rs_fMRI']}},
 'tbi_report': {},
 'upload_results': {}}