ikhvorost / M3U8Decoder

M3U8 playlist decoder for Swift.
MIT License

Custom attributes #1

Closed by iDevelopper 3 months ago

iDevelopper commented 8 months ago

Thanks for this great decoder!

However, I'm looking for the best way to decode an m3u8 file like the one below, to recover all the attributes of EXTINF if they exist.

Thank you for your help.

#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:171115544
#EXTINF:10,title="The Ben Maller Show",artist="zc4732",url="song_spot=\"T\" spotInstanceId=\"-1\" length=\"04:00:00\" MediaBaseId=\"\" TAID=\"0\" TPID=\"0\" cartcutId=\"\" amgArtworkURL=\"https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/cff92185-5e92-11ec-9478-8bbc72f158cb/logo\" spEventID=\"01f47968-ccac-11ee-a9cf-f50937f44113\" "
https://n0ab-e2.revma.ihrhls.com/zc4732/10_sz0pexjzvq8g02/main/171115542.aac
#EXTINF:10,title="The Ben Maller Show",artist="zc4732",url="song_spot=\"T\" spotInstanceId=\"-1\" length=\"04:00:00\" MediaBaseId=\"\" TAID=\"0\" TPID=\"0\" cartcutId=\"\" amgArtworkURL=\"https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/cff92185-5e92-11ec-9478-8bbc72f158cb/logo\" spEventID=\"01f47968-ccac-11ee-a9cf-f50937f44113\" "
https://n0ab-e2.revma.ihrhls.com/zc4732/10_sz0pexjzvq8g02/main/171115543.aac
#EXTINF:10,title="The Ben Maller Show",artist="zc4732",url="song_spot=\"T\" spotInstanceId=\"-1\" length=\"04:00:00\" MediaBaseId=\"\" TAID=\"0\" TPID=\"0\" cartcutId=\"\" amgArtworkURL=\"https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/cff92185-5e92-11ec-9478-8bbc72f158cb/logo\" spEventID=\"01f47968-ccac-11ee-a9cf-f50937f44113\" "
https://n0ab-e2.revma.ihrhls.com/zc4732/10_sz0pexjzvq8g02/main/171115544.aac

ikhvorost commented 8 months ago

Hi @iDevelopper !

Yes, there is a bug in how #EXTINF attributes are parsed, e.g.:

struct EXTINF2: Decodable {
  public let duration: Double
  public let title: String
  public let artist: String
  public let url: String
}

struct Playlist: Decodable {
  let extinf: [EXTINF2]
}

let m3u8 = """
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:171115544
#EXTINF:10,title="The Ben Maller Show",artist="zc4732",url="song_spot=\"T\" spotInstanceId=\"-1\" length=\"04:00:00\" MediaBaseId=\"\" TAID=\"0\" TPID=\"0\" cartcutId=\"\" amgArtworkURL=\"https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/cff92185-5e92-11ec-9478-8bbc72f158cb/logo\" spEventID=\"01f47968-ccac-11ee-a9cf-f50937f44113\" "
https://n0ab-e2.revma.ihrhls.com/zc4732/10_sz0pexjzvq8g02/main/171115542.aac
"""

let playlist = try! M3U8Decoder().decode(Playlist.self, from: m3u8)
print(playlist.extinf[0]) 

// Prints: EXTINF2(duration: 10.0, title: "The Ben Maller Show", artist: "zc4732", url: "song_spot=")

But url should contain the full value:

song_spot=\"T\" spotInstanceId=\"-1\" length=\"04:00:00\" MediaBaseId=\"\" TAID=\"0\" TPID=\"0\" cartcutId=\"\" amgArtworkURL=\"https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/cff92185-5e92-11ec-9478-8bbc72f158cb/logo\" spEventID=\"01f47968-ccac-11ee-a9cf-f50937f44113\"

You can then parse this string value manually, because the library parses comma-separated attributes at the top level only.
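As a rough illustration of that manual step, here is a minimal sketch that splits the nested url value into key/value pairs. The helper name parseNestedAttributes is hypothetical and not part of M3U8Decoder. It assumes the inner values are wrapped in single or double quotes, optionally preceded by a backslash, and contain no quotes themselves.

```swift
import Foundation

/// Hypothetical helper (not part of M3U8Decoder): splits a nested attribute
/// string such as `song_spot=\"T\" length=\"04:00:00\"` into key/value pairs.
func parseNestedAttributes(_ text: String) -> [String: String] {
    // key, "=", optional backslash + opening quote, lazy value,
    // optional backslash + the same kind of quote (backreference \2).
    let pattern = #"(\w+)=\\?(["'])(.*?)\\?\2"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [:] }

    var result = [String: String]()
    let fullRange = NSRange(text.startIndex..., in: text)
    for match in regex.matches(in: text, range: fullRange) {
        guard let keyRange = Range(match.range(at: 1), in: text),
              let valueRange = Range(match.range(at: 3), in: text) else { continue }
        result[String(text[keyRange])] = String(text[valueRange])
    }
    return result
}

// Shortened sample of the url value, with literal backslashes before the quotes.
let url = #"song_spot=\"T\" spotInstanceId=\"-1\" length=\"04:00:00\" MediaBaseId=\"\" TAID=\"0\""#
let nested = parseNestedAttributes(url)
print(nested["length"] ?? "")
```

Keys are kept exactly as they appear in the input, so look them up with their original casing (for example spotInstanceId).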

Thanks for reporting the bug.

iDevelopper commented 8 months ago

Okay, that's exactly what I tested and found. I'm not getting the full text for the url key.

Thank you for your response.

So are you going to fix the bug?

ikhvorost commented 8 months ago

Yes, the fix will be available in the next release.

ikhvorost commented 8 months ago

Fixed in 1.2.0. Thanks.

iDevelopper commented 8 months ago

Hi @ikhvorost,

Awesome! Thanks!

Could you modify the attributes regex so that it also detects single quotes (char code 39, "\'"), perhaps like this:

    private static let regexAttributes = try! NSRegularExpression(pattern: "([^=,\\s]+)=((\'([^\']+)\')|(\"([^\"]+)\")|([^,]+))")

Add it to the charSetQuotes:

 private static let charSetQuotes = CharacterSet(charactersIn: "\', \"")

Take this into account in func convertType(text: String) -> Any:

        guard text.hasPrefix("\'") == false, text.hasPrefix("\"") == false, text.hasPrefix("0x") == false, text.hasPrefix("0X") == false else {
            return text.trimmingCharacters(in: Self.charSetQuotes)
        }

I ask because I sometimes receive the url with single quotes (') instead of double quotes ("): // song_spot=\'T\' spotInstanceId=\'-1\' length=\'03:00:00\' MediaBaseId=\'\' TAID=\'0\' TPID=\'0\' cartcutId=\'\' amgArtworkURL=\'https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/c2c3b528-59f9-11ec-b000-cb602abe1056/logo\' spEventID=\'c550c194-daff-11ee-acde-05a5249c06b1\'

And make the function parse(attributes: String, keyValues: inout [String : Any]) public.
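To show what the proposed pattern would match, here is a small standalone sketch that runs it over a made-up attribute list mixing single-quoted, double-quoted and bare values. The function parseAttributes below is a hypothetical stand-in for illustration only and is not the library's internal parse function.

```swift
import Foundation

// The pattern proposed above, written as a raw string literal.
let proposedPattern = #"([^=,\s]+)=(('([^']+)')|("([^"]+)")|([^,]+))"#

/// Hypothetical stand-in for illustration: collects key/value pairs
/// using the proposed pattern.
func parseAttributes(_ text: String) -> [String: String] {
    guard let regex = try? NSRegularExpression(pattern: proposedPattern) else { return [:] }

    var result = [String: String]()
    let fullRange = NSRange(text.startIndex..., in: text)
    for match in regex.matches(in: text, range: fullRange) {
        guard let keyRange = Range(match.range(at: 1), in: text) else { continue }
        // Groups 4, 6 and 7 hold the single-quoted, double-quoted and bare value.
        for group in [4, 6, 7] where match.range(at: group).location != NSNotFound {
            if let valueRange = Range(match.range(at: group), in: text) {
                result[String(text[keyRange])] = String(text[valueRange])
            }
            break
        }
    }
    return result
}

// Made-up sample input, not taken from a real playlist.
let attrs = parseAttributes(#"title='The Ben Maller Show',artist="zc4732",length=10"#)
print(attrs["title"] ?? "", attrs["artist"] ?? "", attrs["length"] ?? "")
```

Note that both quoted branches require at least one character, so an empty quoted value falls through to the bare branch with its quotes still attached. That is presumably why convertType trims the quote characters afterwards.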

ikhvorost commented 3 months ago

Hi @iDevelopper !

There is a new 2.0.0 version of the tool, and it now has a predefined segments property for dealing with Media Segments, e.g.:

let m3u8 = """
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:171115544
#EXTINF:10,title="The Ben Maller Show",artist="zc4732",url="song_spot=\"T\" spotInstanceId=\"-1\" length=\"04:00:00\" MediaBaseId=\"\" TAID=\"0\" TPID=\"0\" cartcutId=\"\" amgArtworkURL=\"https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/cff92185-5e92-11ec-9478-8bbc72f158cb/logo\" spEventID=\"01f47968-ccac-11ee-a9cf-f50937f44113\" "
https://n0ab-e2.revma.ihrhls.com/zc4732/10_sz0pexjzvq8g02/main/171115542.aac
#EXTINF:10,title="The Ben Maller Show",artist="zc4732",url="song_spot=\"T\" spotInstanceId=\"-1\" length=\"04:00:00\" MediaBaseId=\"\" TAID=\"0\" TPID=\"0\" cartcutId=\"\" amgArtworkURL=\"https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/cff92185-5e92-11ec-9478-8bbc72f158cb/logo\" spEventID=\"01f47968-ccac-11ee-a9cf-f50937f44113\" "
https://n0ab-e2.revma.ihrhls.com/zc4732/10_sz0pexjzvq8g02/main/171115543.aac
#EXTINF:10,title="The Ben Maller Show",artist="zc4732",url="song_spot=\"T\" spotInstanceId=\"-1\" length=\"04:00:00\" MediaBaseId=\"\" TAID=\"0\" TPID=\"0\" cartcutId=\"\" amgArtworkURL=\"https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/cff92185-5e92-11ec-9478-8bbc72f158cb/logo\" spEventID=\"01f47968-ccac-11ee-a9cf-f50937f44113\" "
https://n0ab-e2.revma.ihrhls.com/zc4732/10_sz0pexjzvq8g02/main/171115544.aac
"""

struct CustomExtInf: Decodable {
  let duration: Double
  let title: String
  let artist: String
  let url: String
  let spotinstanceid: String
  let length: String
  let mediabaseid: String
  let taid: String
  let tpid: String
  let cartcutid: String
  let amgartworkurl: String
  let speventid: String
}

struct CustomMediaSegment: Decodable {
  let extinf: CustomExtInf
  let uri: String
}

struct MediaPlaylist: Decodable {
  let segments: [CustomMediaSegment]
}

let playlist = try M3U8Decoder().decode(MediaPlaylist.self, from: m3u8)

print(playlist.segments[0].extinf)

Prints:

CustomExtInf(
  duration: 10.0,
  title: "The Ben Maller Show",
  artist: "zc4732",
  url: "song_spot=",
  spotinstanceid: "-1",
  length: "04:00:00",
  mediabaseid: "",
  taid: "0",
  tpid: "0",
  cartcutid: "",
  amgartworkurl: "https://storage.googleapis.com/portal-content.zettacloud.appspot.com/shows/cff92185-5e92-11ec-9478-8bbc72f158cb/logo",
  speventid: "01f47968-ccac-11ee-a9cf-f50937f44113"
)

As you can see, url and song_spot are still parsed incorrectly because the attributes have a complex nested structure. For such cases, you can use the new parseHandler feature for custom parsing, e.g.:

let decoder = M3U8Decoder()
decoder.parseHandler = { (tag: String, attributes: String) -> M3U8Decoder.ParseAction in
  if tag == "EXTINF" {
    // Custom parsing
    var keyValues = [String : Any]()
    keyValues["title"] = "The Ben Maller Show"
    keyValues["duration"] = 10
    //...

    return .apply(keyValues)
  }
  return .parse
}

let playlist = try decoder.decode(MediaPlaylist.self, from: m3u8)
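
The "Custom parsing" step could be filled in along these lines. This is only a sketch under assumptions: the helper extinfKeyValues is hypothetical, it assumes the attributes string passed to the handler is the text after #EXTINF:, that the playlist text contains literal backslashes before the inner quotes, and that lowercased keys are what the decoder expects.

```swift
import Foundation

/// Hypothetical helper: turns the text after `#EXTINF:` into a dictionary,
/// keeping backslash-escaped quotes inside a quoted value together.
func extinfKeyValues(from attributes: String) -> [String: Any] {
    var keyValues = [String: Any]()

    // The leading number before the first comma is the segment duration.
    if let first = attributes.split(separator: ",", maxSplits: 1).first,
       let duration = Double(first.trimmingCharacters(in: .whitespaces)) {
        keyValues["duration"] = duration
    }

    // key="value", where the value may contain escaped characters such as \".
    let pattern = #"(\w+)="((?:\\.|[^"\\])*)""#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return keyValues }

    let fullRange = NSRange(attributes.startIndex..., in: attributes)
    for match in regex.matches(in: attributes, range: fullRange) {
        guard let keyRange = Range(match.range(at: 1), in: attributes),
              let valueRange = Range(match.range(at: 2), in: attributes) else { continue }
        let key = attributes[keyRange].lowercased() // assumption: lowercased keys
        // Drop the escaping backslashes so nested quotes become plain quotes.
        let value = attributes[valueRange].replacingOccurrences(of: "\\\"", with: "\"")
        keyValues[key] = value
    }
    return keyValues
}

// Shortened sample of the attribute text, with literal backslashes.
let sample = #"10,title="The Ben Maller Show",artist="zc4732",url="song_spot=\"T\" length=\"04:00:00\" ""#
let keyValues = extinfKeyValues(from: sample)
print(keyValues["url"] ?? "")

// Possible use with the handler shown above (untested against the library):
// decoder.parseHandler = { tag, attributes in
//   tag == "EXTINF" ? .apply(extinfKeyValues(from: attributes)) : .parse
// }
```

The url value comes back as one string, so its inner pairs still need a second pass if you want them as separate fields.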

iDevelopper commented 3 months ago

Hi @ikhvorost !

Thank you very much for this update! I managed to modify my code and everything works perfectly. It's great!

I hope you are feeling better...