hapijs / subtext

HTTP payload parser

Support for parsing `Content-Type` with non-UTF-8 `charset` directive #95

Open chalkpe opened 2 years ago

chalkpe commented 2 years ago

Context

What problem are you trying to solve?

The parser ignores the `charset` directive of the `Content-Type` header and always decodes the payload buffer as UTF-8, which is hardcoded. The only way to process a non-UTF-8 payload is to disable the internal parser via `route.options.payload.parse: false` and decode the raw buffer manually, which loses all the benefits of hapi's payload parsing.

import { Server } from '@hapi/hapi'
import { encode } from 'iconv-lite'

async function test(text, charset = 'euc-kr') {
  const server = new Server({})
  server.route({
    method: 'POST',
    path: '/',
    options: { handler: (req) => req.payload }
  })

  await server.start()
  const { payload } = await server.inject({
    method: 'POST',
    url: '/',
    payload: encode(text, charset),
    headers: { 'content-type': `application/json; charset=${charset}` }
  })

  await server.stop()
  return { expected: text, got: payload }
}

test('{"한글":"인코딩"}').then(console.log).catch(console.error)
// { expected: '{"한글":"인코딩"}', got: '{"�ѱ�":"���ڵ�"}' }
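
For reference, the manual workaround mentioned above could look roughly like the sketch below. It uses the `TextDecoder` global from Node.js rather than `iconv-lite`, which assumes a Node.js build with full ICU so that labels such as `euc-kr` are available. The name `decodeJsonPayload` is made up for illustration and is not part of hapi or subtext.

```javascript
// Hypothetical helper (not a hapi/subtext API): decode a raw body with an
// explicit charset and parse it as JSON.
// Assumes a Node.js build with full ICU; otherwise non-UTF-8 labels may throw.
function decodeJsonPayload(buffer, charset = 'utf-8') {
  // fatal: true makes malformed input throw instead of producing U+FFFD.
  const decoder = new TextDecoder(charset, { fatal: true })
  return JSON.parse(decoder.decode(buffer))
}

// EUC-KR bytes for {"a":"한"} ('한' is 0xC7 0xD1 in EUC-KR).
const body = Buffer.from([0x7b, 0x22, 0x61, 0x22, 0x3a, 0x22, 0xc7, 0xd1, 0x22, 0x7d])
const parsed = decodeJsonPayload(body, 'euc-kr')
console.log(parsed)
```

With `payload: { parse: false }` in the route options, `request.payload` should be the raw `Buffer`, so a handler could pass it to a helper like this. The charset would still have to be read from the `content-type` header by hand.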

Do you have a new or modified API suggestion to solve the problem?

kanongil commented 2 years ago

This sounds like a very sensible request, especially since you sometimes can't control which charsets are used to upload text.

I would probably limit it to 'utf8' by default, but allow other charsets through a boolean option, and possibly an allow list.
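
A rough sketch of how such a policy might be expressed, purely as an illustration. The option names `allowOtherCharsets` and `allowedCharsets` and the function `resolveCharset` are assumptions, not an existing or proposed subtext API.

```javascript
// Hypothetical policy check: UTF-8 is always accepted, other charsets only
// when explicitly enabled, optionally restricted to an allow list.
function resolveCharset(requested, options = {}) {
  const { allowOtherCharsets = false, allowedCharsets = null } = options
  const charset = (requested || 'utf-8').toLowerCase()

  // Both common spellings map to the same canonical label.
  if (charset === 'utf-8' || charset === 'utf8') {
    return 'utf-8'
  }

  if (!allowOtherCharsets) {
    throw new Error(`Unsupported charset: ${charset}`)
  }

  // An allow list, when given, narrows the accepted set further.
  if (allowedCharsets && !allowedCharsets.map((c) => c.toLowerCase()).includes(charset)) {
    throw new Error(`Charset not in allow list: ${charset}`)
  }

  return charset
}

console.log(resolveCharset('EUC-KR', { allowOtherCharsets: true, allowedCharsets: ['euc-kr'] }))
```

In a real parser the rejection would more plausibly surface as a 415 response than as a plain `Error`; the sketch only shows the decision logic.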

FYI, this will also require @hapi/content to be updated, to actually parse and return the charset parameter.
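
To illustrate the kind of parsing that would be needed, here is a minimal sketch that extracts the `charset` parameter from a `Content-Type` value. `parseContentType` is a hypothetical stand-alone function, not the @hapi/content API, and it deliberately ignores edge cases such as quoted values that contain `;`.

```javascript
// Hypothetical helper, not the @hapi/content API.
// Limitation: splits on ';' naively, so quoted values containing ';' break.
function parseContentType(header) {
  const [type, ...params] = header.split(';')
  const result = { mime: type.trim().toLowerCase(), charset: null }

  for (const param of params) {
    const eq = param.indexOf('=')
    if (eq === -1) {
      continue // Not a name=value pair; skip it.
    }

    const name = param.slice(0, eq).trim().toLowerCase()
    // Strip optional surrounding double quotes from the value.
    const value = param.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1')

    if (name === 'charset') {
      result.charset = value.toLowerCase()
    }
  }

  return result
}

console.log(parseContentType('application/json; charset=euc-kr'))
```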

I don't know if anyone is available to implement such a feature, but I would be happy to review a PR from you.