xtracthub / xtract-service

Globus Labs Xtract: Extract metadata from distributed data sets.

Validation metadata needs to be loaded with `json.loads()` twice #26

Closed rewong03 closed 3 years ago

rewong03 commented 4 years ago

Pulling metadata from a validation queue looks like:

"\"{\\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\", \\\"file_id\\\": \\\"eec13110-8648-4775-8957-d1f36be9702f\\\"}], \\\"families\\\": [{\\\"family_id\\\": \\\"594c3ef3-3869-46a4-becd-ff308ba89919\\\", \\\"headers\\\": {\\\"Authorization\\\": \\\"Bearer Ag5Vynw7bvV7Dlb2vO7z418yJ5N5N8kJKOpvyz5znP1ry7l97nSnC5JXON7YQmjpqNjovrEokN3kGotKB8Ylnswq0p\\\", \\\"Transfer\\\": \\\"Aga01mYnynY7OnQ5jx5PK5P7jJw00O5Kb3WWk2G13WJYyxEKaGU9C3kkXY8KY62yW4wY3OzM9wQJrxHPye6Gvi6W9N\\\", \\\"FuncX\\\": \\\"Agj6xVPEqpzPm8j0QK6E23vXnYN9Nr8V3EpBgelKPv7XQV99lDI9CpQ6gweM6nOkwogGb11Xpn1wzjUpkbNXgua2z4\\\", \\\"Petrel\\\": \\\"Ag5Vynw7bvV7Dlb2vO7z418yJ5N5N8kJKOpvyz5znP1ry7l97nSnC5JXON7YQmjpqNjovrEokN3kGotKB8Ylnswq0p\\\"}, \\\"metadata\\\": {}, \\\"download_type\\\": null, \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\", \\\"file_id\\\": \\\"eec13110-8648-4775-8957-d1f36be9702f\\\"}], \\\"groups\\\": [{\\\"group_id\\\": \\\"fb48120b-10f8-48a8-b0fe-c10856807c30\\\", \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": 
\\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\", \\\"file_id\\\": \\\"eec13110-8648-4775-8957-d1f36be9702f\\\"}], \\\"parser\\\": \\\"ase\\\", \\\"metadata\\\": {\\\"matio\\\": [[[\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"ase\\\", {\\\"cell\\\": {\\\"array\\\": [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]], \\\"pbc\\\": [true, true, true], \\\"__ase_objtype__\\\": \\\"cell\\\"}, \\\"ctime\\\": 20.695254505437873, \\\"mtime\\\": 20.695254505437873, \\\"numbers\\\": [8, 1, 1], \\\"pbc\\\": [true, true, true], \\\"positions\\\": [[0.0, 0.0, 0.0], [0.93, 0.0, 0.0], [0.0, 0.95, 0.0]], \\\"unique_id\\\": \\\"69b9c8cc05f7c62297353f5e1046949b\\\", \\\"user\\\": null, \\\"chemical_formula\\\": \\\"H2O\\\"}]], \\\"parser\\\": \\\"ase\\\", \\\"debug\\\": [\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"container_version\\\": \\\"15\\\", \\\"extract time\\\": 0.01892828941345215, \\\"extraction_time\\\": 0.018933773040771484}}, {\\\"group_id\\\": \\\"ea2cf26f-2e8c-4f74-88ce-d6bdb9179add\\\", \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\"}], \\\"parser\\\": \\\"ase\\\", \\\"metadata\\\": {\\\"matio\\\": [[[\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"ase\\\", {\\\"cell\\\": {\\\"array\\\": [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]], \\\"pbc\\\": [true, true, true], \\\"__ase_objtype__\\\": \\\"cell\\\"}, \\\"ctime\\\": 20.695254505995297, \\\"mtime\\\": 20.695254505995297, \\\"numbers\\\": [8, 1, 1], \\\"pbc\\\": [true, true, true], \\\"positions\\\": [[0.0, 0.0, 0.0], [0.93, 0.0, 
0.0], [0.0, 0.95, 0.0]], \\\"unique_id\\\": \\\"6fb8114cdf7cc7bfa98446f53f124bd4\\\", \\\"user\\\": null, \\\"chemical_formula\\\": \\\"H2O\\\"}]], \\\"parser\\\": \\\"ase\\\", \\\"debug\\\": [\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"container_version\\\": \\\"15\\\", \\\"extract time\\\": 0.017247438430786133, \\\"extraction_time\\\": 0.017252445220947266}}, {\\\"group_id\\\": \\\"840a96d7-5634-4bd5-806c-8795a7026a2a\\\", \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\"}], \\\"parser\\\": \\\"crystal\\\", \\\"metadata\\\": {\\\"matio\\\": [[[\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"crystal\\\", {\\\"material\\\": {\\\"composition\\\": \\\"H2O1\\\"}, \\\"crystal_structure\\\": {\\\"space_group_number\\\": 6, \\\"number_of_atoms\\\": 3.0, \\\"volume\\\": 8000.0, \\\"stoichiometry\\\": \\\"AB2\\\"}}]], \\\"parser\\\": \\\"crystal\\\", \\\"debug\\\": [\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"container_version\\\": \\\"15\\\", \\\"extract time\\\": 0.019762754440307617, \\\"extraction_time\\\": 0.01976776123046875}}, {\\\"group_id\\\": \\\"e6f98261-444a-4dba-8371-5693708c0f25\\\", \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\"}], \\\"parser\\\": 
\\\"csv\\\", \\\"metadata\\\": {\\\"matio\\\": {}, \\\"parser\\\": \\\"csv\\\", \\\"debug\\\": [\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"error1\\\": \\\"THE LIST ERROR WAS HERE: Format \\\\\\\"xyz\\\\\\\" is not supported\\\", \\\"container_version\\\": \\\"15\\\", \\\"extract time\\\": 0.015398740768432617, \\\"extraction_time\\\": 0.015403985977172852}}, {\\\"group_id\\\": \\\"4053a040-a6c7-4278-b2a3-b81282580e92\\\", \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\"}], \\\"parser\\\": \\\"filename\\\", \\\"metadata\\\": {\\\"matio\\\": {}, \\\"parser\\\": \\\"filename\\\", \\\"debug\\\": [\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"error1\\\": \\\"THE LIST ERROR WAS HERE: Mapping is required for the FilenameExtractor.\\\", \\\"container_version\\\": \\\"15\\\", \\\"extract time\\\": 0.016558408737182617, \\\"extraction_time\\\": 0.01656937599182129}}, {\\\"group_id\\\": \\\"0c11d7d3-7a70-423b-8010-af7672b77db7\\\", \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\"}], \\\"parser\\\": \\\"image\\\", \\\"metadata\\\": {\\\"matio\\\": {}, \\\"parser\\\": \\\"image\\\", \\\"debug\\\": [\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"error1\\\": \\\"THE LIST ERROR WAS 
HERE: cannot identify image file '594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'\\\", \\\"container_version\\\": \\\"15\\\", \\\"extract time\\\": 0.015868186950683594, \\\"extraction_time\\\": 0.015874147415161133}}, {\\\"group_id\\\": \\\"73efdf01-6a93-4af6-bcd2-c866b6589df5\\\", \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\"}], \\\"parser\\\": \\\"json\\\", \\\"metadata\\\": {\\\"matio\\\": {}, \\\"parser\\\": \\\"json\\\", \\\"debug\\\": [\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"error1\\\": \\\"THE LIST ERROR WAS HERE: Mapping is required for the JSONExtractor.\\\", \\\"container_version\\\": \\\"15\\\", \\\"extract time\\\": 0.014204978942871094, \\\"extraction_time\\\": 0.01421046257019043}}, {\\\"group_id\\\": \\\"26ccc3c6-d66f-45b9-8515-ec41fe6b105d\\\", \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\"}], \\\"parser\\\": \\\"xml\\\", \\\"metadata\\\": {\\\"matio\\\": {}, \\\"parser\\\": \\\"xml\\\", \\\"debug\\\": [\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"error1\\\": \\\"THE LIST ERROR WAS HERE: Mapping is required for the XMLExtractor.\\\", \\\"container_version\\\": \\\"15\\\", \\\"extract time\\\": 0.01410365104675293, \\\"extraction_time\\\": 
0.014109134674072266}}, {\\\"group_id\\\": \\\"9d3c9a63-ead3-414c-afc3-7b21e32cd995\\\", \\\"files\\\": [{\\\"path\\\": \\\"/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz\\\", \\\"metadata\\\": {\\\"physical\\\": {\\\"size\\\": 1152, \\\"extension\\\": \\\"xyz\\\", \\\"path_type\\\": \\\"globus\\\"}}, \\\"base_url\\\": \\\"https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org\\\"}], \\\"parser\\\": \\\"yaml\\\", \\\"metadata\\\": {\\\"matio\\\": {}, \\\"parser\\\": \\\"yaml\\\", \\\"debug\\\": [\\\"594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz\\\"], \\\"error1\\\": \\\"THE LIST ERROR WAS HERE: Mapping is required for the YAMLExtractor.\\\", \\\"container_version\\\": \\\"15\\\", \\\"extract time\\\": 0.014160871505737305, \\\"extraction_time\\\": 0.01416635513305664}}]}]}\""

It looks cleaner after loading it with the json library:

{'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org', 'file_id': 'eec13110-8648-4775-8957-d1f36be9702f'}], 'families': [{'family_id': '594c3ef3-3869-46a4-becd-ff308ba89919', 'headers': {'Authorization': 'Bearer Ag5Vynw7bvV7Dlb2vO7z418yJ5N5N8kJKOpvyz5znP1ry7l97nSnC5JXON7YQmjpqNjovrEokN3kGotKB8Ylnswq0p', 'Transfer': 'Aga01mYnynY7OnQ5jx5PK5P7jJw00O5Kb3WWk2G13WJYyxEKaGU9C3kkXY8KY62yW4wY3OzM9wQJrxHPye6Gvi6W9N', 'FuncX': 'Agj6xVPEqpzPm8j0QK6E23vXnYN9Nr8V3EpBgelKPv7XQV99lDI9CpQ6gweM6nOkwogGb11Xpn1wzjUpkbNXgua2z4', 'Petrel': 'Ag5Vynw7bvV7Dlb2vO7z418yJ5N5N8kJKOpvyz5znP1ry7l97nSnC5JXON7YQmjpqNjovrEokN3kGotKB8Ylnswq0p'}, 'metadata': {}, 'download_type': None, 'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org', 'file_id': 'eec13110-8648-4775-8957-d1f36be9702f'}], 'groups': [{'group_id': 'fb48120b-10f8-48a8-b0fe-c10856807c30', 'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org', 'file_id': 'eec13110-8648-4775-8957-d1f36be9702f'}], 'parser': 'ase', 'metadata': {'matio': [[['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'ase', {'cell': {'array': [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]], 'pbc': [True, True, True], 
'__ase_objtype__': 'cell'}, 'ctime': 20.695254505437873, 'mtime': 20.695254505437873, 'numbers': [8, 1, 1], 'pbc': [True, True, True], 'positions': [[0.0, 0.0, 0.0], [0.93, 0.0, 0.0], [0.0, 0.95, 0.0]], 'unique_id': '69b9c8cc05f7c62297353f5e1046949b', 'user': None, 'chemical_formula': 'H2O'}]], 'parser': 'ase', 'debug': ['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'container_version': '15', 'extract time': 0.01892828941345215, 'extraction_time': 0.018933773040771484}}, {'group_id': 'ea2cf26f-2e8c-4f74-88ce-d6bdb9179add', 'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org'}], 'parser': 'ase', 'metadata': {'matio': [[['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'ase', {'cell': {'array': [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]], 'pbc': [True, True, True], '__ase_objtype__': 'cell'}, 'ctime': 20.695254505995297, 'mtime': 20.695254505995297, 'numbers': [8, 1, 1], 'pbc': [True, True, True], 'positions': [[0.0, 0.0, 0.0], [0.93, 0.0, 0.0], [0.0, 0.95, 0.0]], 'unique_id': '6fb8114cdf7cc7bfa98446f53f124bd4', 'user': None, 'chemical_formula': 'H2O'}]], 'parser': 'ase', 'debug': ['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'container_version': '15', 'extract time': 0.017247438430786133, 'extraction_time': 0.017252445220947266}}, {'group_id': '840a96d7-5634-4bd5-806c-8795a7026a2a', 'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org'}], 
'parser': 'crystal', 'metadata': {'matio': [[['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'crystal', {'material': {'composition': 'H2O1'}, 'crystal_structure': {'space_group_number': 6, 'number_of_atoms': 3.0, 'volume': 8000.0, 'stoichiometry': 'AB2'}}]], 'parser': 'crystal', 'debug': ['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'container_version': '15', 'extract time': 0.019762754440307617, 'extraction_time': 0.01976776123046875}}, {'group_id': 'e6f98261-444a-4dba-8371-5693708c0f25', 'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org'}], 'parser': 'csv', 'metadata': {'matio': {}, 'parser': 'csv', 'debug': ['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'error1': 'THE LIST ERROR WAS HERE: Format "xyz" is not supported', 'container_version': '15', 'extract time': 0.015398740768432617, 'extraction_time': 0.015403985977172852}}, {'group_id': '4053a040-a6c7-4278-b2a3-b81282580e92', 'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org'}], 'parser': 'filename', 'metadata': {'matio': {}, 'parser': 'filename', 'debug': ['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'error1': 'THE LIST ERROR WAS HERE: Mapping is required for the FilenameExtractor.', 'container_version': '15', 'extract time': 0.016558408737182617, 'extraction_time': 0.01656937599182129}}, {'group_id': '0c11d7d3-7a70-423b-8010-af7672b77db7', 'files': [{'path': 
'/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org'}], 'parser': 'image', 'metadata': {'matio': {}, 'parser': 'image', 'debug': ['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'error1': "THE LIST ERROR WAS HERE: cannot identify image file '594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'", 'container_version': '15', 'extract time': 0.015868186950683594, 'extraction_time': 0.015874147415161133}}, {'group_id': '73efdf01-6a93-4af6-bcd2-c866b6589df5', 'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org'}], 'parser': 'json', 'metadata': {'matio': {}, 'parser': 'json', 'debug': ['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'error1': 'THE LIST ERROR WAS HERE: Mapping is required for the JSONExtractor.', 'container_version': '15', 'extract time': 0.014204978942871094, 'extraction_time': 0.01421046257019043}}, {'group_id': '26ccc3c6-d66f-45b9-8515-ec41fe6b105d', 'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org'}], 'parser': 'xml', 'metadata': {'matio': {}, 'parser': 'xml', 'debug': ['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'error1': 'THE LIST ERROR WAS HERE: Mapping is 
required for the XMLExtractor.', 'container_version': '15', 'extract time': 0.01410365104675293, 'extraction_time': 0.014109134674072266}}, {'group_id': '9d3c9a63-ead3-414c-afc3-7b21e32cd995', 'files': [{'path': '/MDF/mdf_connect/prod/data/h2o_13_v1-1/split_xyz_files/watergrid_60_HOH_180__0.7_rOH_1.8_vario_PBE0_AV5Z_delta_PS_data/watergrid_PBE0_record-3017.xyz', 'metadata': {'physical': {'size': 1152, 'extension': 'xyz', 'path_type': 'globus'}}, 'base_url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org'}], 'parser': 'yaml', 'metadata': {'matio': {}, 'parser': 'yaml', 'debug': ['594c3ef3-3869-46a4-becd-ff308ba89919/watergrid_PBE0_record-3017.xyz'], 'error1': 'THE LIST ERROR WAS HERE: Mapping is required for the YAMLExtractor.', 'container_version': '15', 'extract time': 0.014160871505737305, 'extraction_time': 0.01416635513305664}}]}]}

But it needs to be loaded twice to become a Python dictionary.
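A minimal sketch of how this symptom can arise, assuming the payload was passed through `json.dumps()` twice before being put on the queue. The `record` dict below is a made-up stand-in, not the actual validation message:

```python
import json

# Hypothetical, shortened stand-in for one validation record.
record = {
    "files": [
        {"path": "/example/file.xyz", "metadata": {"physical": {"size": 1152}}}
    ]
}

once = json.dumps(record)   # a JSON object, as text
twice = json.dumps(once)    # a JSON *string* whose content is JSON text

first = json.loads(twice)   # undoes only the outer layer: still a str
second = json.loads(first)  # second pass finally yields the dict

print(type(first).__name__)   # str
print(type(second).__name__)  # dict
```

Each extra `json.dumps()` adds one more level of quote escaping, which is why the raw message is so full of backslashes.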

tskluzac commented 3 years ago

Fixed! You should be able to crawl and Xtract, and everything on the SQS queue should be valid JSON (only `.dumps()`ed once!)
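For anyone who still has older, double-encoded messages sitting in a queue, a consumer-side helper along these lines might be useful. This is only a sketch: `decode_message` and `max_layers` are hypothetical names, not part of xtract-service, and it assumes the message body is available as a plain string.

```python
import json


def decode_message(body: str, max_layers: int = 3):
    """Decode a message body, unwrapping extra JSON string layers if present.

    Returns (value, layers), where layers is how many json.loads() calls
    were needed. A correctly encoded message needs exactly one.
    """
    value = json.loads(body)  # raises json.JSONDecodeError if body is not JSON
    layers = 1
    while isinstance(value, str) and layers < max_layers:
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # an ordinary string payload, not another JSON layer
        layers += 1
    return value, layers


good = json.dumps({"parser": "ase"})
bad = json.dumps(good)  # simulates the double-encoded case

print(decode_message(good))  # ({'parser': 'ase'}, 1)
print(decode_message(bad))   # ({'parser': 'ase'}, 2)
```

Logging whenever `layers` is greater than 1 would make it easy to spot any producer that is still encoding twice.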

rewong03 commented 3 years ago

@tskluzac Did the issue where matio metadata came out looking like a deformed list instead of a dictionary get fixed?

tskluzac commented 3 years ago

It should be. It seems to be passing the JSON linter I added along the way.