wolfgangw / backports

Deep inspection of digital cinema packages

Special character in file name (subtitles) #74

Open · liloneum opened this issue 6 years ago

liloneum commented 6 years ago

A special character in a subtitle file name (ç in my case) causes a fatal ingest error on Doremi servers, and the server does not say what the error is.

wolfgangw commented 11 months ago

@liloneum Thanks, noted - shall be fixed.

matmat commented 10 months ago

As for special characters in filenames in general, I put some notes here: https://dcpomatic.com/mantis/view.php?id=2465

Repeating below:

The Interop (UDF) constraints are a bit messy; I think it would be easier to just enforce the SMPTE rules for Interop as well. Mainly:

References:

SMPTE:

ST 429-9:2014

7.1 Path

The Path element indicates the complete path for the Chunk, represented as a URI per [RFC 3986]. Its semantics and format are delivery-medium dependent, and constrained by each Map Profile (see Section 9). The value is encoded as an xs:anyURI. Note: Annex A presents a basic Map Profile.
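Because the Path value is an xs:anyURI interpreted under RFC 3986, a character such as ç cannot appear verbatim in a conforming URI; it has to be percent-encoded as its UTF-8 bytes. A minimal illustration using Python's standard `urllib.parse` (the file name is hypothetical):

```python
from urllib.parse import quote, unquote

# Hypothetical subtitle file name containing a non-ASCII character
name = "subtitle_ç.xml"

# quote() percent-encodes everything outside letters, digits and "_.-~"
# (plus "/"); ç (U+00E7) becomes its two UTF-8 bytes, C3 A7
encoded = quote(name)
print(encoded)  # subtitle_%C3%A7.xml

# unquote() reverses the encoding
assert unquote(encoded) == name
```

RFC 3986's unreserved set is A-Z, a-z, 0-9, "-", ".", "_" and "~". The Basic Map Profile character set is that set minus "~", so a conforming segment never needs percent-encoding at all.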

Annex A Basic Map Profile v2 (Normative)

A.2 Path

Each Path element value shall be a relative-path reference as specified in RFC 3986. No query or fragment component shall be present. Given a Path element in an Asset Map, the relative-path reference shall be resolved, as specified in RFC 3986, relative to a Base URI consisting of the location of the Asset Map. (...) Each path segment, as specified in IETF RFC 3986, shall consist of characters from the set a-z, A-Z, 0-9, “-” (dash), “_” (underscore) and “.” (period). No segment shall have more than 100 characters, and the value of the Path element shall not exceed 100 characters in length. A Path element value shall have no more than 10 segments. The Path element value shall preserve case (the path and the filename on the filesystem shall have identical case). No two paths in an Asset Map shall have identical value, regardless of case.
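The A.2 rules are mechanical enough to check directly. Below is a minimal Python sketch of such a validator; the function names are hypothetical, and it takes a conservative reading in one place: any empty segment (from a leading "/" or a "//") is rejected, which also catches absolute paths. Because "?", "#" and ":" fall outside the allowed character set, the same character check rejects query/fragment components and URI schemes.

```python
import re

# Characters allowed in each path segment (ST 429-9:2014, Annex A.2).
# "?", "#", ":" and non-ASCII letters such as "ç" fall outside this set,
# so query/fragment components and URI schemes are rejected here too.
SEGMENT_RE = re.compile(r"[A-Za-z0-9._-]+")
MAX_PATH_LEN = 100
MAX_SEGMENT_LEN = 100
MAX_SEGMENTS = 10


def check_path(path: str) -> list[str]:
    """Return Basic Map Profile v2 violations found in one Path value."""
    problems = []
    if len(path) > MAX_PATH_LEN:
        problems.append(f"path has {len(path)} characters (max {MAX_PATH_LEN})")
    segments = path.split("/")
    if len(segments) > MAX_SEGMENTS:
        problems.append(f"path has {len(segments)} segments (max {MAX_SEGMENTS})")
    for seg in segments:
        # An empty segment (e.g. from a leading "/" or "//") also fails
        # this match, so absolute paths are rejected as well
        if not SEGMENT_RE.fullmatch(seg):
            problems.append(f"invalid segment {seg!r}")
        elif len(seg) > MAX_SEGMENT_LEN:
            problems.append(f"segment has {len(seg)} characters (max {MAX_SEGMENT_LEN})")
    return problems


def check_asset_map_paths(paths: list[str]) -> dict[str, list[str]]:
    """Check every Path value, plus the rule that no two may match ignoring case."""
    report = {p: check_path(p) for p in paths}
    seen: dict[str, str] = {}
    for p in paths:
        key = p.casefold()
        if key in seen:
            report[p].append(f"same as {seen[key]!r} ignoring case")
        else:
            seen[key] = p
    return report


report = check_asset_map_paths(["subs/subtitle_ç.xml", "video.mxf", "VIDEO.mxf"])
for path, problems in report.items():
    print(path, problems)
```

With these inputs, the subtitle path is flagged for its ç, "video.mxf" passes, and "VIDEO.mxf" is flagged as a case-insensitive duplicate.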

INTEROP:

https://interop-docs.cinepedia.com/Document_Release_2.0/mpeg_ii_am_spec.pdf

6.4 Chunk Path Format

The path and filename shall conform to the UDF specification.

http://www.osta.org/specs/pdf/udf201.pdf

Basic Restrictions & Requirements

File Name Length: Maximum of 255 bytes
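The UDF limit is counted in encoded bytes, not characters. The sketch below estimates a name's size under two assumptions: UDF's OSTA CS0 compressed-Unicode encoding (one leading compression-ID byte, followed by 8-bit or 16-bit characters), and that the compression-ID byte counts toward the 255-byte limit. The function name is hypothetical. Under this estimate, the name from the original report stays well within the UDF length limit, since ç (U+00E7) fits the 8-bit form; it is the SMPTE character set that rules it out.

```python
UDF_MAX_NAME_BYTES = 255


def udf_identifier_bytes(name: str) -> int:
    """Estimate the recorded size of a UDF file identifier in OSTA CS0.

    Assumption: one compression-ID byte comes first; ID 8 stores one byte
    per character (only possible if every code point is below 256), and
    ID 16 stores two bytes per UTF-16 code unit.
    """
    if all(ord(ch) < 256 for ch in name):
        return 1 + len(name)  # compression ID 8
    return 1 + len(name.encode("utf-16-be"))  # compression ID 16


for name in ["subtitle_ç.xml", "字幕_" * 50 + ".xml"]:
    size = udf_identifier_bytes(name)
    print(name[:20], size, size <= UDF_MAX_NAME_BYTES)
```

A single character above U+00FF switches the whole name to the 16-bit form, roughly halving the number of characters that fit.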

4.2.2.1 char FileIdentifier

... [this section and its subsections contain quite involved algorithms for translating "illegal" names for use on specific OSes]

wolfgangw commented 10 months ago

@matmat thanks for the notes and reminders!

Added checks for characters outside the allowed set in AM asset paths. Depending on the AM type, a violation is reported as an Error (SMPTE) or as a Hint (Interop). What do you think? (4c3977cc2b14d55a024a375f133dbde347ede8eb)

Also added a length check for AM asset paths, which should have been in there 10 years ago -_-
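The severity split described above can be sketched roughly as follows in Python. This is not the code from the commit referenced above. It assumes the Asset Map flavour is detected from the root element's XML namespace, and it reports the 100-character length check with the same severity as the character check, which is a guess. The function name and messages are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Asset Map namespaces, assumed here as the way to tell SMPTE from Interop
SMPTE_AM_NS = "http://www.smpte-ra.org/schemas/429-9/2007/AM"
INTEROP_AM_NS = "http://www.digicine.com/PROTO-ASDCP-AM-20040311#"

ALLOWED = re.compile(r"[A-Za-z0-9._/-]*")  # "/" allowed as the segment separator
MAX_PATH_LEN = 100


def check_am_paths(xml_text: str) -> list[tuple[str, str]]:
    """Return (severity, message) pairs: Error for SMPTE, Hint for Interop."""
    root = ET.fromstring(xml_text)
    # root.tag looks like "{namespace}AssetMap" when a namespace is declared
    ns = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    if ns == SMPTE_AM_NS:
        severity = "Error"
    elif ns == INTEROP_AM_NS:
        severity = "Hint"
    else:
        return [("Error", f"unknown Asset Map namespace {ns!r}")]
    results = []
    for el in root.iter(f"{{{ns}}}Path"):
        path = (el.text or "").strip()
        if not ALLOWED.fullmatch(path):
            results.append((severity, f"outsider characters in {path!r}"))
        if len(path) > MAX_PATH_LEN:
            results.append((severity, f"{path!r} exceeds {MAX_PATH_LEN} characters"))
    return results


SAMPLE = """<AssetMap xmlns="http://www.digicine.com/PROTO-ASDCP-AM-20040311#">
  <AssetList><Asset><ChunkList><Chunk>
    <Path>subtitle_ç.xml</Path>
  </Chunk></ChunkList></Asset></AssetList>
</AssetMap>"""

print(check_am_paths(SAMPLE))
print(check_am_paths(SAMPLE.replace(INTEROP_AM_NS, SMPTE_AM_NS)))
```

The same offending path produces a Hint in the Interop Asset Map and an Error once the namespace is switched to SMPTE.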

matmat commented 10 months ago

Looks good, thank you!

In practice I think lots of DCPs will fail this but still play back without problems (in most cases). But that's how it is, I guess.

The festival is coming up and I will battle test this in the coming weeks! :)

If/when you have time, these additional checks would be nice to have (but some of them may be unnecessary):