Closed: wkiri closed this issue 2 years ago
This error was rather interesting to track down:
FAIL: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/sentences.xml
Begin Content Validation: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/sentences.csv
ERROR [error.validation.invalid_field_value] table 1, record 1884, field 3: The field value 'For example, Rayleigh fractional crystallization of Adirondack magma steadily increases incompatible element concentrations (K; ! D Kbulk " 0) and rapidly decreases compatible element concentrations (Ni; ! D Ni bulk >>1).' that starts with double quote should not contain double quote(s)
Here we have a double-quote that appears by itself, so it is not caught by this regular expression, which assumes balanced quoting: https://github.com/wkiri/MTE/blob/c59b66a3895dd96afd1218f4d79c61e345d59719/src/deliver_sqlite.py#L154
I think we should just replace all double quotes, not just balanced ones. In that case, we can use sentences_df.replace(regex='"', value="''", inplace=True) prior to writing out the CSV. This also avoids having to read it back in and fix it in replace_internal_double_quote().
One outcome is that sentences (fields) that were previously enclosed in double quotes because they contained an internal double quote are now unquoted (because they contain only single quotes). I think this is fine, but wanted to hear from @stevenlujpl.
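As a minimal sketch of the replacement described above, the snippet below applies the same `replace` call to a small made-up DataFrame and writes it to an in-memory CSV. The column names, sample sentences, and the helper `strip_double_quotes` are hypothetical and not taken from deliver_sqlite.py.

```python
import io

import pandas as pd


def strip_double_quotes(df):
    """Return a copy of df with every double quote replaced by two single quotes."""
    return df.replace(regex='"', value="''")


# Hypothetical sample data: one lone double quote, one balanced pair.
sentences_df = pd.DataFrame({
    "doc_id": ["d1", "d2"],
    "sentence": ['K; D Kbulk " 0, and Ni', 'He said "hi" twice'],
})

cleaned = strip_double_quotes(sentences_df)

# Write to an in-memory buffer instead of a file for the demo.
buf = io.StringIO()
cleaned.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

In this sketch the first sentence is still enclosed in double quotes in the CSV because it contains a comma, while the second is written unquoted since it now contains only single quotes.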
Another outcome is that this function, which relies on converting the CSV lines into a numpy array, no longer behaves as expected, because numpy's array conversion isn't really smart enough to parse CSV content: https://github.com/wkiri/MTE/blob/c59b66a3895dd96afd1218f4d79c61e345d59719/src/generate_pds4_bundle.py#L182-L191
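To illustrate the parsing problem, here is a small sketch comparing a plain `split(',')` with Python's standard `csv` module on a made-up line whose quoted field contains a comma. The sample line is hypothetical and not from sentences.csv.

```python
import csv

# Hypothetical CSV line: the second field is quoted because it contains a comma.
line = 'doc1,"Rapidly decreases compatible elements, such as Ni",42'

# Splitting on every comma cuts the quoted field in two.
naive = line.split(",")

# A CSV-aware parser keeps the quoted field intact and drops the enclosing quotes.
parsed = next(csv.reader([line]))

print(len(naive), naive)
print(len(parsed), parsed)
```

Rows split this way can end up with different lengths, so they no longer form a regular 2D array.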
However, by using pandas, we can simplify this to:
import pandas as pd

# Compute maximum_field_length for all columns in the csv lines.
def get_max_field_len(csv_lines, field_index):
    # content_2d_array = np.array([line.split(',') for line in csv_lines])
    content_df = pd.DataFrame(csv_lines)
    if len(content_df) == 1:
        # Only one row (presumably the header), so there is no data to measure
        max_len = 0
    else:
        # Skip the first row and take the longest field in this column
        max_len = content_df[field_index][1:].astype(bytes).str.len().max()
    return max_len
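For readers on Python 3, the following is a hedged variant of the same idea that counts bytes by encoding each field explicitly as UTF-8. It assumes `rows` is a list of already-parsed rows (lists of strings) with the header first; the name `max_field_len_bytes` and the sample rows are hypothetical, and this is not the code in generate_pds4_bundle.py.

```python
import pandas as pd


def max_field_len_bytes(rows, field_index):
    """Longest UTF-8 byte length in one column, ignoring the first (header) row."""
    content_df = pd.DataFrame(rows)
    if len(content_df) <= 1:
        # No data rows beyond the header.
        return 0
    column = content_df[field_index][1:].astype(str)
    # Encode each field so multi-byte characters count by bytes, not characters.
    return int(column.map(lambda s: len(s.encode("utf-8"))).max())


# Hypothetical rows: "café" is 4 characters but 5 bytes in UTF-8.
rows = [["id", "sentence"], ["a", "café"], ["bb", "x"]]
print(max_field_len_bytes(rows, 0))
print(max_field_len_bytes(rows, 1))
```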
@stevenlujpl Please browse the relevant commits (in branch issue41-validate). If you agree with this change, I think we can also remove the function replace_internal_double_quote().
I meant to add, with this change the bundle no longer generates the error with validate 2.1.4.
Resolved by using validate 2.1.4:
WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:aliases::1.0' is not a member of any collection within the given target
This warning (and similar ones) occurred because it turns out that the collection inventory and associated .xml need to appear at the collection level (inside the data_mer directory) rather than nested in the data_mer/mer2 subdirectory. This is a little confusing, and I am not completely sure it will work to have two collection .xml files inside data_mer (for mer1 and for mer2). If not, we can rearrange our directory structure to have top-level data_mer1 and data_mer2.
For now, moving the mer2 collection inventory files up to data_mer resolves all remaining validate issues.
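The move could be scripted roughly as follows. This is only a sketch: the helper `move_inventory_up` is hypothetical, the glob pattern assumes the inventory files are named like collection_mer2_inventory.xml (the .csv companion name is an assumption), and the demo runs in a temporary directory with placeholder files rather than on the real bundle.

```python
import shutil
import tempfile
from pathlib import Path


def move_inventory_up(collection_dir, sub_name):
    """Move collection_<sub_name>_inventory.* from collection_dir/sub_name up to collection_dir."""
    collection_dir = Path(collection_dir)
    pattern = f"collection_{sub_name}_inventory.*"
    moved = []
    for src in sorted((collection_dir / sub_name).glob(pattern)):
        dst = collection_dir / src.name
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved


# Demo on a throwaway directory tree with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    data_mer = Path(tmp) / "data_mer"
    mer2 = data_mer / "mer2"
    mer2.mkdir(parents=True)
    (mer2 / "collection_mer2_inventory.xml").write_text("placeholder\n")
    (mer2 / "collection_mer2_inventory.csv").write_text("placeholder\n")  # assumed name
    (mer2 / "aliases.xml").write_text("placeholder\n")

    moved_names = [p.name for p in move_inventory_up(data_mer, "mer2")]
    remaining = sorted(p.name for p in mer2.iterdir())

print(moved_names)
print(remaining)
```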
Validate output (warnings about the manifest and md5 files are expected):
PDS Validate Tool Report
Configuration:
Version 2.1.4
Date 2022-02-01T01:16:28Z
Parameters:
Targets [file:/home/wkiri/Research/MTE/git/pds4_bundle/bundle_v1.3.1/mars_target_e\
ncyclopedia/]
Rule Type pds4.bundle
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, *.XML]
Data Content Validation on
Product Level Validation on
Allow Unlabeled Files false
Max Errors 100000
Registered Contexts File /proj/mte/pds4_validation_tool/v2.1.4/resources/registered_context_product\
s.json
[...]
PASS: file:/home/wkiri/Research/MTE/git/pds4_bundle/bundle_v1.3.1/mars_target_encyclopedia/urn-nasa-pds-m\
ars_target_encyclopedia.manifest
WARNING [warning.file.not_referenced_in_label] File is not referenced by any label
5 integrity check(s) completed
PASS: file:/home/wkiri/Research/MTE/git/pds4_bundle/bundle_v1.3.1/mars_target_encyclopedia/urn-nasa-pds-m\
ars_target_encyclopedia.md5
WARNING [warning.file.not_referenced_in_label] File is not referenced by any label
6 integrity check(s) completed
[...]
Summary:
0 error(s)
2 warning(s)
Product Validation Summary:
33 product(s) passed
0 product(s) failed
0 product(s) skipped
Referential Integrity Check Summary:
35 check(s) passed
0 check(s) failed
0 check(s) skipped
Message Types:
2 warning.file.not_referenced_in_label
End of Report
Completed execution in 7826 ms
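If we want to check reports like this one automatically, the error and warning counts could be pulled out of the Summary section with a small regular expression. This is a sketch that assumes the "N error(s)" and "N warning(s)" wording shown in the report; `parse_summary` is a hypothetical helper, not part of the validate tool.

```python
import re


def parse_summary(report_text):
    """Return the first 'N error(s)' and 'N warning(s)' counts found in a report."""
    counts = {}
    for kind in ("error", "warning"):
        match = re.search(rf"(\d+)\s+{kind}\(s\)", report_text)
        counts[kind] = int(match.group(1)) if match else None
    return counts


# Excerpt in the layout of the report above.
sample = "Max Errors 100000\nSummary:\n0 error(s)\n2 warning(s)\n"
summary = parse_summary(sample)
print(summary)
```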
Note: Remember to emphasize the benefits of PDS using standard CSV handling of quotes instead of a custom format when we reply with the updated bundle.
Scott VanBommel used version 2.1.0 of the validate tool and identified some issues that need investigation. The full output is included below.
PDS Validate Tool Report
Configuration:
Version 2.1.0
Date 2022-01-27T13:32:31Z
Parameters:
Targets [file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/]
Rule Type pds4.bundle
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, *.XML]
Data Content Validation on
Product Level Validation on
Allow Unlabeled Files false
Max Errors 100000
Registered Contexts File C:\PDS\Tools\Validate\bin..\resources\registered_context_products.json
Product Level Validation Results
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/bundle_mars_target_encyclopedia.xml 1 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/aliases.xml 2 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/collection_mer2_inventory.xml 3 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/components.xml 4 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/contains.xml 5 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/documents.xml 6 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/has_property.xml 7 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/mentions.xml 8 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/properties.xml 9 product validation(s) completed
FAIL: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/sentences.xml Begin Content Validation: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/sentences.csv
[x] ERROR [error.validation.invalid_field_value] table 1, record 1884, field 3: The field value 'For example, Rayleigh fractional crystallization of Adirondack magma steadily increases incompatible element concentrations (K; ! D Kbulk " 0) and rapidly decreases compatible element concentrations (Ni; ! D Ni bulk >>1).' that starts with double quote should not contain double quote(s) End Content Validation: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/sentences.csv 10 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/targets.xml 11 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/aliases.xml 12 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/collection_mpf_inventory.xml 13 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/components.xml 14 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/contains.xml 15 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/documents.xml 16 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/has_property.xml 17 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/mentions.xml 18 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/properties.xml 19 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/sentences.xml 20 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/targets.xml 21 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/aliases.xml 22 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/collection_phx_inventory.xml 23 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/components.xml 24 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/contains.xml 25 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/documents.xml 26 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/has_property.xml 27 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/mentions.xml 28 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/properties.xml 29 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/sentences.xml 30 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/targets.xml 31 product validation(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/document/collection_document_inventory.xml 32 product validation(s) completed
FAIL: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/document/readme.xml
PDS4 Bundle Level Validation Results
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/collection_mpf_inventory.xml 1 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/collection_phx_inventory.xml 2 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/document/collection_document_inventory.xml 3 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/bundle_mars_target_encyclopedia.xml
[x] WARNING [warning.integrity.missing_context_reference] The context reference 'urn:nasa:pds:context:instrument:chemcam_libs.msl' could not be found in this bundle but it was defined in urn:nasa:pds:mars_target_encyclopedia:document::1.3. (Disable with --skip-context-reference-check flag) 4 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/components.xml 5 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/sentences.xml 6 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/aliases.xml
[x] WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:aliases::1.0' is not a member of any collection within the given target 7 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/has_property.xml 8 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/components.xml 9 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/contains.xml 10 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/collection_mer2_inventory.xml 11 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/aliases.xml 12 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/contains.xml 13 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/sentences.xml
[x] WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:sentences::1.0' is not a member of any collection within the given target 14 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/documents.xml
[x] WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:documents::1.0' is not a member of any collection within the given target 15 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/mentions.xml 16 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/sentences.xml 17 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/components.xml
[x] WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:components::1.0' is not a member of any collection within the given target 18 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/targets.xml 19 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/targets.xml 20 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/properties.xml 21 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/contains.xml
[x] WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:contains::1.0' is not a member of any collection within the given target 22 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/has_property.xml
[x] WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:has_property::1.0' is not a member of any collection within the given target 23 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/mentions.xml
[x] WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:mentions::1.0' is not a member of any collection within the given target 24 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/properties.xml 25 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/mentions.xml 26 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/documents.xml 27 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/properties.xml
[x] WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:properties::1.0' is not a member of any collection within the given target 28 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/documents.xml 29 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_phx/has_property.xml 30 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/document/readme.xml 31 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mer/mer2/targets.xml
[x] WARNING [warning.integrity.unreferenced_member] Identifier 'urn:nasa:pds:mars_target_encyclopedia:data_mer2:targets::1.0' is not a member of any collection within the given target 32 integrity check(s) completed
PASS: file:/C:/Users/vanbommel/Desktop/mars_target_encyclopedia/data_mpf/aliases.xml 33 integrity check(s) completed
Summary:
2 error(s)
12 warning(s)
Product Validation Summary:
31 product(s) passed
2 product(s) failed
0 product(s) skipped
Referential Integrity Check Summary:
33 check(s) passed
0 check(s) failed
0 check(s) skipped
Message Types:
1 error.validation.internal_error
1 error.validation.invalid_field_value
9 warning.integrity.unreferenced_member
3 warning.integrity.missing_context_reference
End of Report