ncihtan / data-models

Schema.org Data Models for HTAN

Fix filename regex #328

Open adamjtaylor opened 8 months ago

adamjtaylor commented 8 months ago

Per FDS-1416 we were getting a DCA crash due to an invalid regex for the attribute Filename.

Investigation revealed that the backslash escape of the forward slash was not needed.

adamjtaylor commented 8 months ago

Reopening as I'm still seeing an error. There seems to be entanglement between the regex and JSON escaping, and Milen mentioned that there may be some legacy escaping functionality within schematic itself.

Intend to test behaviour in the refactor.
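To illustrate the kind of entanglement between regex and JSON escaping described above, here is a minimal sketch using only Python's standard `json` and `re` modules. The patterns are hypothetical stand-ins, not the actual Filename rule, and nothing here reproduces schematic's or DCA's own handling, which may add a further layer of escaping.

```python
import json
import re

# Hypothetical patterns for illustration only; not the real Filename regex.

# "\/" is a legal JSON escape for "/", so the JSON parser consumes the
# backslash before the regex engine ever sees the pattern.
decoded = json.loads(r'"^[a-z\/]+$"')
print(decoded)

# "\." is not a legal JSON escape, so a regex escape written with a
# single backslash makes the JSON itself fail to parse.
try:
    json.loads(r'"^.+\.csv$"')
    single_backslash_parses = True
except json.JSONDecodeError:
    single_backslash_parses = False

# Doubling the backslash in the JSON text delivers one backslash to the regex.
escaped = json.loads(r'"^.+\\.csv$"')
matches_csv = re.fullmatch(escaped, "data.csv") is not None
matches_other = re.fullmatch(escaped, "dataxcsv") is not None
```

If this is what is happening, any stage that decodes or re-encodes the pattern changes the number of backslashes, which would be consistent with the fix behaving differently at different stages.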

adamjtaylor commented 7 months ago

Re-opening as PR #333 was only a temporary fix.

aclayton555 commented 6 months ago

As of the 23-12 close-out, Adam cannot get this to work. There seem to be multiple limitations at different stages (possibly bugs related to escape characters).

Mitigation:

Approach:

Come back to this after DR5. Another question is which platforms' file naming rules we need to support: Synapse, AWS, GCP, DRS, generic S3 protocols, SD, and CDS.

adamjtaylor commented 6 months ago

Email from Amanda: File Naming Conventions “All projects are recommended to follow the guidelines provided in the link: https://docs.sevenbridges.com/docs/run-a-task#select-input-files File naming convention: allowed characters in input file names are a-z, A-Z, 0-9, dash (-), period (.), semicolon (;), tilde (~) and hash (#). If there are input files containing other characters in their file names, task execution will not be started.”

adamjtaylor commented 6 months ago

For Synapse

This looks like the entity name regex, not sure where it is in the docs. I think there's a nice error response (that includes the allowable characters) if you have an invalid character in your entity name

ModelConstants.java

    public static final String VALID_ENTITY_NAME_REGEX = "^[a-zA-Z0-9,_. \\-+()']+";

adamjtaylor commented 6 months ago

For AWS S3

The following character sets are generally safe for use in key names.

Alphanumeric characters: 0-9, a-z, A-Z

Special characters: exclamation point (!), hyphen (-), underscore (_), period (.), asterisk (*), single quote ('), open parenthesis (()

adamjtaylor commented 6 months ago

The intersection of the three character sets appears to be only:

Letters (a-z, A-Z), numbers (0-9), and the special characters dash (-), period (.), and underscore (_)

SB confirmed that underscore is permissible
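The intersection can be checked mechanically. The sketch below transcribes the three character sets exactly as quoted in this thread (the S3 list as pasted above, and underscore added for Seven Bridges per the confirmation) and builds a validator from the result. The dictionary keys and the `is_portable` helper are hypothetical names for illustration, and the check treats the file name as a single path component, which is an assumption about how the Filename attribute is used.

```python
import re
import string

ALNUM = set(string.ascii_letters + string.digits)

# Character sets as quoted in this thread (illustrative transcription).
allowed = {
    "seven_bridges": ALNUM | set("-.;~#") | {"_"},  # underscore confirmed by SB
    "synapse": ALNUM | set(",_. -+()'"),            # from VALID_ENTITY_NAME_REGEX
    "s3": ALNUM | set("!-_.*'("),                   # "generally safe" list as pasted
}

# Characters accepted by every platform.
common = set.intersection(*allowed.values())
common_specials = sorted(common - ALNUM)
print(common_specials)  # ['-', '.', '_']

# Build a character class from the intersection; re.escape keeps "-" literal.
FILENAME_RE = re.compile("[" + "".join(re.escape(c) for c in sorted(common)) + "]+")


def is_portable(name: str) -> bool:
    """Return True if every character of a non-empty name is in the intersection."""
    return FILENAME_RE.fullmatch(name) is not None
```

For example, `is_portable("sample_01.ome.tif")` returns `True`, while `is_portable("my file.txt")` returns `False` because a space is allowed by Synapse but not by the other two lists.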

aclayton555 commented 5 months ago

Through HTAN phase 1, take the approach of fixing any illegal file names (Clarisse is doing this already). Pick this back up as part of the ID/file naming guidance in the renewal.