nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

add `path.mimeType()` and `path.charset()` to `node:path` #54595

Open aralroca opened 2 weeks ago

aralroca commented 2 weeks ago

What is the problem this feature will solve?

Adding path.mimeType() and path.charset() would remove the dependency on the external library mime-types for getting the content type of a file based on its extension. This can be done by Node.js itself, just as Bun does with Bun.file(file).type.

What is the feature you are proposing to solve the problem?

I am proposing to add two new methods to the path module:

path.mimeType('file.js'); // returns 'application/javascript'
// or:
path.mimeType(path.extname('file.js')); // returns 'application/javascript'

And:

path.charset('file.js'); // returns 'utf-8'
// or:
path.charset(path.mimeType('file.js')); // returns 'utf-8'

What alternatives have you considered?

Alternatively, instead of living in node:path, this could be part of fs.stat. However, the content type can be derived from the path alone, without having to parse the file, and keeping these as separate functions means the work is only done when they are actually called.

CC: @vdeturckheim

jasnell commented 2 weeks ago

For path.mimeType(...) a couple of questions...

  1. Would you expect the actual path to be resolved and for the mime type to be sniffed from the contents or based entirely on the file extension?
  2. Which master list of MIME types to file extensions do you propose we use, and what is its source? How is it maintained?

For path.charset('...') ... the only way to accomplish this is to actually read the file. I don't believe that belongs in the path module but possibly in fs. Even then it can be a bit problematic to reliably determine the encoding of a file. Would it be based entirely on the presence of a BOM? Would it be limited to just the charset encodings we already know of (e.g. utf8, utf16le, etc.)? Given cases where the charset is ambiguous (e.g. a UTF-8 file without a BOM that contains only ASCII characters could be utf8, ascii, or latin1), what would be the expected result?
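For illustration, a BOM check is roughly the best a content-based approach can do without heavier heuristics; it already requires opening the file, and it still leaves the no-BOM case ambiguous (sketch only; sniffCharsetFromBOM is a hypothetical name, not an existing or proposed API):

```js
const fs = require('node:fs');

// Sketch: look only at a possible byte-order mark at the start of the file.
function sniffCharsetFromBOM(filePath) {
  const fd = fs.openSync(filePath, 'r');
  const buf = Buffer.alloc(3);
  const n = fs.readSync(fd, buf, 0, 3, 0);
  fs.closeSync(fd);

  if (n >= 3 && buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF) return 'utf-8';
  if (n >= 2 && buf[0] === 0xFF && buf[1] === 0xFE) return 'utf-16le';
  if (n >= 2 && buf[0] === 0xFE && buf[1] === 0xFF) return 'utf-16be';
  // No BOM: an ASCII-only file could be utf8, ascii, or latin1; the bytes
  // alone cannot distinguish them.
  return undefined;
}
```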

RedYetiDev commented 2 weeks ago

Personally, I'm -1 on adding this to core.

Mimetypes can be easily done in userland using a Map<extension, mime-type>.

Charsets require reading the file, which is not something the path module does, and I don't think it is something the path module should do.
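For example, a minimal userland lookup of this kind might look as follows (sketch only; the names and the handful of entries are illustrative):

```js
const path = require('node:path');

// A small, application-specific table; real applications would list only the
// extensions they actually serve.
const MIME_BY_EXT = new Map([
  ['.html', 'text/html'],
  ['.css', 'text/css'],
  ['.js', 'text/javascript'],
  ['.json', 'application/json'],
  ['.png', 'image/png'],
]);

function mimeType(file) {
  return MIME_BY_EXT.get(path.extname(file).toLowerCase()) ?? 'application/octet-stream';
}

console.log(mimeType('index.html')); // 'text/html'
```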

aralroca commented 2 weeks ago
> 1. Would you expect the actual path to be resolved and for the mime type to be sniffed from the contents or based entirely on the file extension?
> 2. Which master list of MIME types to file extensions do you propose we use, and what is its source? How is it maintained?
>
> For path.charset('...') ... the only way to accomplish this is to actually read the file. I don't believe that belongs in the path module but possibly in fs. Even then it can be a bit problematic to reliably determine the encoding of a file. Would it be based entirely on the presence of a BOM? Would it be limited to just the charset encodings we already know of (e.g. utf8, utf16le, etc.)? Given cases where the charset is ambiguous (e.g. a UTF-8 file without a BOM that contains only ASCII characters could be utf8, ascii, or latin1), what would be the expected result?

@jasnell It is not clear to me that it is necessary to look at the contents of the file. The charset can be derived from the mimeType, and the mimeType from the file extension. The libraries that support this today use this JSON reference: https://github.com/jshttp/mime-db/blob/master/db.json
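For illustration, this is roughly how such libraries derive a charset from the MIME type alone, without touching the file (a sketch built on the mime-db data, not any library's actual code; assumes mime-db is installed):

```js
// mime-db's main export is the parsed db.json, keyed by lowercase MIME type.
const db = require('mime-db');

function charset(mimeType) {
  const entry = db[mimeType.toLowerCase()];
  if (entry && entry.charset) return entry.charset;
  // Popular libraries fall back to UTF-8 for text/* types.
  if (/^text\//i.test(mimeType)) return 'UTF-8';
  return undefined;
}

console.log(charset('text/html')); // 'UTF-8'
```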

> Personally, I'm -1 on adding this to core.
>
> Mimetypes can be easily done in userland using a Map<extension, mime-type>.
>
> Charsets require reading the file, which is not something the path module does, and I don't think it is something the path module should do.

@RedYetiDev as I commented, the current libraries that do this derive the charset from the mimeType without looking at the file's contents. This is very useful for frameworks that serve dynamic assets and need to set the Content-Type quickly. Having to read the file first would hurt efficiency when the file is served as a stream.
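As a sketch of that use case (illustration only, not production code): a server can pick the Content-Type from the extension and pipe the file to the response without ever reading it up front:

```js
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical tiny lookup; under this proposal it would be path.mimeType().
const MIME_BY_EXT = new Map([
  ['.html', 'text/html'],
  ['.css', 'text/css'],
  ['.js', 'text/javascript'],
]);

http.createServer((req, res) => {
  // No path-traversal hardening here; illustration only.
  const file = path.join('public', path.normalize(req.url));
  res.setHeader('Content-Type',
    MIME_BY_EXT.get(path.extname(file)) ?? 'application/octet-stream');
  // Only the extension is consulted for the header; the body is streamed.
  fs.createReadStream(file)
    .on('error', () => {
      res.statusCode = 404;
      res.end();
    })
    .pipe(res);
}).listen(3000);
```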

And about the mapping: yes, it can be done via a Map, but this Map is huge, and it doesn't make sense to depend on a library just to maintain this JSON. It is better that it is part of the runtime itself, IMO; otherwise we end up adding dependencies for basic things.
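For a sense of scale, the full extension-to-type table can be built from mime-db's db.json, which is what packages such as mime-types do under the hood (sketch; assumes mime-db is installed):

```js
// mime-db's main export is the parsed db.json, keyed by MIME type.
const db = require('mime-db');

const extToType = new Map();
for (const [type, entry] of Object.entries(db)) {
  for (const ext of entry.extensions ?? []) {
    // First type wins here; real libraries apply source-preference rules
    // to break ties between types that claim the same extension.
    if (!extToType.has(ext)) extToType.set(ext, type);
  }
}

console.log(extToType.size);      // on the order of a thousand entries
console.log(extToType.get('js'));
```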

RedYetiDev commented 2 weeks ago

(CC @nodejs/path)


(While I am not a core collaborator) I still strongly disagree with adding this API to Node.js core. I greatly appreciate this issue being opened, but I have some major concerns with it. Here are a few of them:

1. Maintainability

If Node.js were to implement this feature, the project would need to take on the significant responsibility of maintaining a constantly evolving and extensive map from file extensions to character sets and MIME types. Additionally, the project would have to handle edge cases, conflicts, and potential ambiguities that may arise due to overlapping file extensions or varying interpretations of MIME type specifications across different platforms and software.

2. Redundancy with Existing Ecosystem

The Node.js ecosystem already includes well-established modules like mime and mime-types that effectively manage MIME type and charset detection. These modules are actively maintained by the community and provide flexibility and extensibility beyond what a core implementation could offer. Introducing path.mimeType() and path.charset() would duplicate functionality that is already readily available and well-supported, leading to redundancy and possibly fragmenting the ecosystem.
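For reference, this is roughly how the lookup is done today with the mime-types package (sketch; assumes mime-types is installed, and the exact values depend on the bundled mime-db version):

```js
const mime = require('mime-types');

const type = mime.lookup('file.js');        // MIME type from the file extension
const header = mime.contentType('file.js'); // full Content-Type, with charset when known
const cs = mime.charset(type);              // charset derived from the type alone

console.log(type, header, cs);
```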

3. Performance Concerns

Implementing and maintaining a reliable MIME type and charset detection mechanism within the Node.js core could introduce performance overhead, particularly in cases where the detection process is complex or requires frequent updates to a large mapping table. Even if the performance impact is minimal, the added complexity in the core could have unintended side effects on the overall performance of the Node.js runtime.