Open aralroca opened 2 months ago
For path.mimeType(...)
a couple of questions...
For path.charset('...')
... the only way to accomplish this is to actually try reading the file. I don't believe that belongs in the path
module but possibly in fs
. Even then it can be a bit problematic to reliably determine the encoding of a file. Would it be based entirely on the presence of a BOM? Would it be limited to just the charset encodings we already know of (e.g. utf8, utf16le, etc)? Given cases where the charset is ambiguous (e.g. utf8 without a bom that only contains ascii characters could be utf8, ascii, or latin1), what would be the expected result?
Personally, I'm -1 on adding this to core.
Mimetypes can be easily done in userland using a Map<extension, mime-type>
Charsets require reading the file, which is not something the path module does, and I don't think it is something the path module should.
- Would you expect the actual path to be resolved and for the mime type to be sniffed from the contents or based entirely on the file extension?
- Which master list of mime types to file extensions do you propose we use and what is the source of that? How is it maintained.
For
path.charset('...')
... the only way to accomplish this is to actually try reading the file. I don't believe that belongs in thepath
module but possibly infs
. Even then it can be a bit problematic to reliably determine the encoding of a file. Would it be based entirely on the presence of a BOM? Would it be limited to just the charset encodings we already know of (e.g. utf8, utf16le, etc)? Given cases where the charset is ambiguous (e.g. utf8 without a bom that only contains ascii characters could be utf8, ascii, or latin1), what would be the expected result?
@jasnell It is not clear to me that it is necessary to look at the contents of the file. The chatset can be extracted from the mimeType, and the mimeType from the file extension. Nowadays the libraries that support this make use of this JSON reference: https://github.com/jshttp/mime-db/blob/master/db.json
Personally, I'm -1 on adding this to core.
Mimetypes can be easily done in userland using a
Map<extension, mime-type>
Charsets require reading the file, which is not something the path module does, and I don't think it is something the path module should.
@RedYetiDev as I commented the current libraries to do this extract the charset
depending on the mimeType
without looking the internal content. This is very useful for frameworks to serve dynamic assets and put the Content-Type
in a fast way. If the file has to be read it would lose efficiency if you want to serve the file in streaming.
And about the mapping, yes, can be done via a Map
, but this Map is huge, and it doesn't make sense to depend on a library to maintain this JSON, it is better that it is part of the runtime itself IMO, besides that in the end it is adding dependencies for basic things.
(CC @nodejs/path)
(While I am not a core collaborator) I still strongly disagree with the addition of this API into the core of Node.js. While I do greatly appreciate this issue being opened, I have some major concerns with this API. Here are a few of them:
If Node.js were to implement this feature, the project would need to take on the significant responsibility of maintaining a constantly evolving and extensive map from file extensions to character sets and MIME types. Additionally, the project would have to handle edge cases, conflicts, and potential ambiguities that may arise due to overlapping file extensions or varying interpretations of MIME type specifications across different platforms and software.
The Node.js ecosystem already includes well-established modules like mime
and mime-types
that effectively manage MIME type and charset detection. These modules are actively maintained by the community and provide flexibility and extensibility beyond what a core implementation could offer. Introducing path.mimeType()
and path.charset()
would duplicate functionality that is already readily available and well-supported, leading to redundancy and possibly fragmenting the ecosystem.
Implementing and maintaining a reliable MIME type and charset detection mechanism within the Node.js core could introduce performance overhead, particularly in cases where the detection process is complex or requires frequent updates to a large mapping table. Even if the performance impact is minimal, the added complexity in the core could have unintended side effects on the overall performance of the Node.js runtime.
What is the problem this feature will solve?
Adding
path.mimeType()
andpath.charset
will solve the dependency on an external library, mime-types, to get the content type of a file based on its extension. This can be done with Node.js itself, just as Bun.js does withBun.file(file).type
.What is the feature you are proposing to solve the problem?
I am proposing to add two new methods to the path module:
mimeType
- returns the MIME type of a file based on its extension.charset
- returns the charset of a MIME type.And:
What alternatives have you considered?
Alternatively, instead of being in
node:path
, it could be infs.stat
, but I think that in the end it can be extracted from the path without having to parse the file, besides it is better to use functions for this and the work is done only if they are executed individually.CC: @vdeturckheim