rails / marcel

Find the mime type of files, examining file, filename and declared type
Apache License 2.0

Incorrect HTML magic identification when preceded by a comment #102

Open markedmondson opened 7 months ago

markedmondson commented 7 months ago

If the HTML has a comment before the opening tag, it is incorrectly identified as XML.

Steps to reproduce

require "marcel"
require "stringio"

io = StringIO.new(<<~HTML)
  <!--/* Throwaway comment but it has to be over 64 characters to fail AND have a uppercase HTML tag */-->
  <HTML>
    <head>
    </head>
    <body>
      <h1>Magic!</h1>
    </body>
  </HTML>
HTML
Marcel::MimeType.for(io)
# => "application/xml"

io = StringIO.new(<<~HTML)
  <!--/* Throwaway comment but it has to be over 128 characters to fail AND have a lowercase HTML tag, we can pad this one out a bit to get it longer */-->
  <html>
    <head>
    </head>
    <body>
      <h1>Magic!</h1>
    </body>
  </html>
HTML
Marcel::MimeType.for(io)
# => "application/xml"

Updating the magic definitions is only a temporary workaround, since the comment could be any length. The underlying problem appears to be ordering: the broader HTML lookup at https://github.com/rails/marcel/blob/main/lib/marcel/tables.rb#L2761 falls below the XML comment-matching magic at https://github.com/rails/marcel/blob/main/lib/marcel/tables.rb#L2747, so the XML rule gets the first chance to match.
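
To make the ordering problem concrete, here is a minimal sketch of first-match-wins magic sniffing. It is not Marcel's implementation: the `RULES` table, the `sniff` helper, and the broad 0..4096 window are hypothetical names and values for illustration, and the 64 and 128 byte windows are only inferred from the reproduction above.

```ruby
# Hypothetical rule table (NOT Marcel's real data): [type, start offsets, pattern].
# Rules are tried top to bottom and the first match wins.
RULES = [
  ["text/html",       0..64,   "<HTML"], # narrow window, assumed from the repro
  ["text/html",       0..128,  "<html"], # narrow window, assumed from the repro
  ["application/xml", 0..0,    "<!--"],  # assumed comment-matching XML rule
  ["text/html",       0..4096, "<html"], # assumed broader lookup, listed too late
].freeze

# Returns the type of the first rule whose pattern starts within its offset range.
def sniff(bytes)
  RULES.each do |type, offsets, pattern|
    # Bytes the pattern could occupy when it starts anywhere in the range.
    window = bytes[offsets.begin, offsets.size + pattern.bytesize - 1].to_s
    return type if window.include?(pattern)
  end
  nil
end

short_doc = "<!-- hi -->\n<html></html>"
long_doc  = "<!--" + "x" * 200 + "-->\n<html></html>"

puts sniff(short_doc) # text/html
puts sniff(long_doc)  # application/xml
```

In this model the long comment pushes `<html` past the narrow window, so the comment rule matches before the broader HTML lookup is reached. Widening the narrow windows only moves the threshold, which is why it can be no more than a stopgap.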

Temporary workaround

Marcel::MimeType.extend "text/html", magic: [[0..256, "<HTML"]]
Marcel::MimeType.extend "text/html", magic: [[0..256, "<html"]]