sindresorhus / file-type

Detect the file type of a file, stream, or data
MIT License

The file type is detected by checking the magic number of the buffer.
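
To make the idea of a magic number concrete, here is a minimal sketch that compares the first bytes of a buffer against two well-known signatures. The `signatures` table and the `sniffMagicNumber` function are hypothetical names for illustration only. They are not part of file-type, whose real detection covers many more formats and edge cases.

```js
// Hypothetical signature table (illustration only, not file-type's internal data).
const signatures = [
	{ext: 'png', mime: 'image/png', bytes: [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]},
	{ext: 'jpg', mime: 'image/jpeg', bytes: [0xFF, 0xD8, 0xFF]},
];

// Return the first entry whose bytes match the start of `buffer`, or undefined.
function sniffMagicNumber(buffer) {
	for (const {ext, mime, bytes} of signatures) {
		if (buffer.length >= bytes.length && bytes.every((byte, index) => buffer[index] === byte)) {
			return {ext, mime};
		}
	}

	return undefined;
}

const pngHeader = new Uint8Array([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00]);
console.log(sniffMagicNumber(pngHeader));
//=> {ext: 'png', mime: 'image/png'}

const plainText = new TextEncoder().encode('hello');
console.log(sniffMagicNumber(plainText));
//=> undefined
```

Plain text has no fixed leading bytes to match, which is why text-based formats are out of scope for this kind of check.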

This package is for detecting binary-based file formats, not text-based formats like .txt, .csv, .svg, etc.

We accept contributions for commonly used modern file formats, not historical or obscure ones. Open an issue first for discussion.

Install

npm install file-type

This package is an ESM package, so your project needs to be ESM too.

If you use it with Webpack, you need the latest Webpack version, and you must configure it correctly for ESM.

Usage

Node.js

Determine file type from a file:

import {fileTypeFromFile} from 'file-type';

console.log(await fileTypeFromFile('Unicorn.png'));
//=> {ext: 'png', mime: 'image/png'}

Determine file type from a Uint8Array/ArrayBuffer, which may be a portion of the beginning of a file:

import {fileTypeFromBuffer} from 'file-type';
import {readChunk} from 'read-chunk';

const buffer = await readChunk('Unicorn.png', {length: 4100});

console.log(await fileTypeFromBuffer(buffer));
//=> {ext: 'png', mime: 'image/png'}

Determine file type from a stream:

import fs from 'node:fs';
import {fileTypeFromStream} from 'file-type';

const stream = fs.createReadStream('Unicorn.mp4');

console.log(await fileTypeFromStream(stream));
//=> {ext: 'mp4', mime: 'video/mp4'}

The stream method can also be used to read from a remote location:

import got from 'got';
import {fileTypeFromStream} from 'file-type';

const url = 'https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';

const stream = got.stream(url);

console.log(await fileTypeFromStream(stream));
//=> {ext: 'jpg', mime: 'image/jpeg'}

Another stream example:

import fs from 'node:fs';
import crypto from 'node:crypto';
import {fileTypeStream} from 'file-type';

// `alg`, `key`, and `iv` are placeholders for the cipher algorithm, key, and initialization vector.
const read = fs.createReadStream('encrypted.enc');
const decipher = crypto.createDecipheriv(alg, key, iv);

// `read.pipe(decipher)` returns the decipher stream, which carries the decrypted data.
const streamWithFileType = await fileTypeStream(read.pipe(decipher));

console.log(streamWithFileType.fileType);
//=> {ext: 'mov', mime: 'video/quicktime'}

const write = fs.createWriteStream(`decrypted.${streamWithFileType.fileType.ext}`);
streamWithFileType.pipe(write);

Browser

import {fileTypeFromStream} from 'file-type';

const url = 'https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';

const response = await fetch(url);
const fileType = await fileTypeFromStream(response.body);

console.log(fileType);
//=> {ext: 'jpg', mime: 'image/jpeg'}

API

fileTypeFromBuffer(buffer)

Detect the file type of a Uint8Array or ArrayBuffer.

The file type is detected by checking the magic number of the buffer.

If file access is available, it is recommended to use fileTypeFromFile() instead.

Returns a Promise for an object with the detected file type, with an ext (extension) and a mime (MIME type) property, or undefined when there is no match.

buffer

Type: Uint8Array | ArrayBuffer

A buffer representing file data. It works best if the buffer contains the entire file. It may work with a smaller portion as well.

fileTypeFromFile(filePath)

Detect the file type of a file path.

This is for Node.js only.

To read from a File, see fileTypeFromBlob().

The file type is detected by checking the magic number of the buffer.

Returns a Promise for an object with the detected file type, with an ext (extension) and a mime (MIME type) property, or undefined when there is no match.

filePath

Type: string

The file path to parse.

fileTypeFromStream(stream)

Detect the file type of a web ReadableStream.

If the engine is Node.js, this may also be a Node.js stream.Readable.

Direct support for Node.js streams will be dropped in the future, when Node.js streams can be converted to Web streams (see toWeb()).

The file type is detected by checking the magic number of the buffer.

Returns a Promise for an object with the detected file type, with an ext (extension) and a mime (MIME type) property, or undefined when there is no match.

stream

Type: Web ReadableStream or Node.js stream.Readable

A readable stream representing file data.

fileTypeFromBlob(blob)

Detect the file type of a Blob.

Tip: A File object is a Blob and can be passed in here.

It streams the underlying Blob, which requires a ReadableStreamBYOBReader and therefore Node.js ≥ 20.

The file type is detected by checking the magic number of the blob.

Returns a Promise for an object with the detected file type, with an ext (extension) and a mime (MIME type) property, or undefined when there is no match.

import {fileTypeFromBlob} from 'file-type';

const blob = new Blob(['<?xml version="1.0" encoding="ISO-8859-1" ?>'], {
    type: 'text/plain',
    endings: 'native'
});

console.log(await fileTypeFromBlob(blob));
//=> {ext: 'xml', mime: 'application/xml'}

blob

Type: Blob

fileTypeFromTokenizer(tokenizer)

Detect the file type from an ITokenizer source.

This method is used internally, but can also be used for a special "tokenizer" reader.

A tokenizer exposes the internal read functions, so alternative transport mechanisms for accessing files can be implemented and used.

Returns a Promise for an object with the detected file type, with an ext (extension) and a mime (MIME type) property, or undefined when there is no match.

An example is @tokenizer/http, which requests data using HTTP range requests. Unlike a conventional stream, a tokenizer can skip ahead (seek, fast-forward) in the data. For example, you may only need to read the first 6 bytes and the last 128 bytes, which is an advantage when reading the entire file would take longer.

import {makeTokenizer} from '@tokenizer/http';
import {fileTypeFromTokenizer} from 'file-type';

const audioTrackUrl = 'https://test-audio.netlify.com/Various%20Artists%20-%202009%20-%20netBloc%20Vol%2024_%20tiuqottigeloot%20%5BMP3-V2%5D/01%20-%20Diablo%20Swing%20Orchestra%20-%20Heroines.mp3';

const httpTokenizer = await makeTokenizer(audioTrackUrl);
const fileType = await fileTypeFromTokenizer(httpTokenizer);

console.log(fileType);
//=> {ext: 'mp3', mime: 'audio/mpeg'}

Or use @tokenizer/s3 to determine the file type of a file stored on Amazon S3:

import S3 from 'aws-sdk/clients/s3';
import {makeTokenizer} from '@tokenizer/s3';
import {fileTypeFromTokenizer} from 'file-type';

// Initialize the S3 client
const s3 = new S3();

// Initialize the S3 tokenizer.
const s3Tokenizer = await makeTokenizer(s3, {
    Bucket: 'affectlab',
    Key: '1min_35sec.mp4'
});

// Figure out what kind of file it is.
const fileType = await fileTypeFromTokenizer(s3Tokenizer);
console.log(fileType);

Note that only the minimum amount of data required to determine the file type is read (okay, just a bit extra to prevent too many fragmented reads).
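
The benefit of seeking can be illustrated with a small in-memory sketch. The `RangeSource` class below is a hypothetical stand-in for a remote file with range access. It is not the ITokenizer interface and not how @tokenizer/http is implemented. It only counts how many bytes are fetched when reading the first 6 and the last 128 bytes of a larger file.

```js
// Hypothetical stand-in for a remote file with range access (not the ITokenizer interface).
class RangeSource {
	constructor(bytes) {
		this.bytes = bytes;
		this.bytesRead = 0; // Tracks how much data was actually fetched
	}

	// Read `length` bytes starting at `offset`, similar to what an HTTP range request returns.
	readRange(offset, length) {
		const chunk = this.bytes.subarray(offset, offset + length);
		this.bytesRead += chunk.length;
		return chunk;
	}
}

// A fake 10,000-byte file: a 6-byte header, filler, and a 128-byte trailer starting with 'TAG'.
const file = new Uint8Array(10_000);
file.set(new TextEncoder().encode('HEADER'), 0);
file.set(new TextEncoder().encode('TAG'), file.length - 128);

const source = new RangeSource(file);
const header = new TextDecoder().decode(source.readRange(0, 6));
const trailer = source.readRange(file.length - 128, 128);
const tag = new TextDecoder().decode(trailer.subarray(0, 3));

console.log(header, tag, source.bytesRead);
//=> HEADER TAG 134
```

Only 134 of the 10,000 bytes are touched, whereas a conventional stream would have to pass through everything before the trailer.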

tokenizer

Type: ITokenizer

A file source implementing the tokenizer interface.

fileTypeStream(webStream, options?)

Returns a Promise which resolves to the original readable stream argument, but with an added fileType property, which is an object like the one returned from fileTypeFromFile().

This method can be handy to insert into a stream pipeline, but it comes with a price. Internally, fileTypeStream() builds up a buffer of sampleSize bytes, which is used as a sample to determine the file type. The sample size impacts the file detection resolution: a smaller sample size lowers the probability of detecting the correct file type.

Note: When using Node.js, a stream.Readable may be provided as well.

webStream

Type: Web ReadableStream or Node.js stream.Readable

The input stream.

options

Type: object

sampleSize

Type: number
Default: 4100

The sample size in bytes.
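
Why the sample size matters can be shown with a format whose marker does not sit at the very start of the file. In a TAR archive, the 'ustar' marker is located at byte offset 257, so a sample shorter than that cannot contain it. The `looksLikeTar` helper below is a hypothetical illustration of this effect. It is an assumption made for this sketch and does not reflect how file-type actually detects TAR archives.

```js
// The 'ustar' marker of a TAR archive sits at byte offset 257.
const TAR_MAGIC_OFFSET = 257;
const TAR_MAGIC = new TextEncoder().encode('ustar');

// Hypothetical check (illustration only): is the marker present in this sample?
function looksLikeTar(sample) {
	if (sample.length < TAR_MAGIC_OFFSET + TAR_MAGIC.length) {
		return false; // Sample too small to reach the marker
	}

	return TAR_MAGIC.every((byte, index) => sample[TAR_MAGIC_OFFSET + index] === byte);
}

// Fake TAR header block: 512 bytes with the marker at its expected offset.
const tarBlock = new Uint8Array(512);
tarBlock.set(TAR_MAGIC, TAR_MAGIC_OFFSET);

console.log(looksLikeTar(tarBlock.subarray(0, 100)));
//=> false
console.log(looksLikeTar(tarBlock.subarray(0, 512)));
//=> true
```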

Example

import got from 'got';
import {fileTypeStream} from 'file-type';

const url = 'https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';

const stream1 = got.stream(url);
const stream2 = await fileTypeStream(stream1, {sampleSize: 1024});

if (stream2.fileType?.mime === 'image/jpeg') {
    // stream2 can be used to stream the JPEG image (from the very beginning of the stream)
}

supportedExtensions

Returns a Set<string> of supported file extensions.

supportedMimeTypes

Returns a Set<string> of supported MIME types.

Custom detectors

A custom detector is a function that allows specifying custom detection mechanisms.

An iterable of detectors can be provided via the fileTypeOptions argument for the FileTypeParser constructor. In Node.js, you should use NodeFileTypeParser, which extends FileTypeParser and provides access to Node.js specific functions.

The detectors are called before the default detections in the provided order.

Custom detectors can be used to add new FileTypeResults or to modify return behaviour of existing FileTypeResult detections.

If the detector returns undefined, there are 2 possible scenarios:

  1. The detector has not read from the tokenizer. Detection proceeds with the next available detector.
  2. The detector has read from the tokenizer (tokenizer.position has been increased). In that case, no further detectors are executed and file-type returns undefined. Note that this is an exceptional scenario, as the detector takes away the opportunity for any other detector to determine the file type.
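
The two scenarios above can be sketched as a simple loop. Everything in this block is hypothetical: `SketchTokenizer` exposes only a `position` plus `peek` and `read` helpers invented for this example, and `runDetectors` is not file-type's internal implementation. It only mirrors the rule that a detector returning undefined after advancing the position ends detection.

```js
// Hypothetical in-memory tokenizer exposing only what this sketch needs.
class SketchTokenizer {
	constructor(bytes) {
		this.bytes = bytes;
		this.position = 0;
	}

	// Look at upcoming bytes without advancing `position`.
	peek(length) {
		return this.bytes.subarray(this.position, this.position + length);
	}

	// Consume bytes, advancing `position`.
	read(length) {
		const chunk = this.peek(length);
		this.position += chunk.length;
		return chunk;
	}
}

// Run detectors in order, applying the two rules for an undefined result.
async function runDetectors(detectors, tokenizer) {
	for (const detector of detectors) {
		const startPosition = tokenizer.position;
		const result = await detector(tokenizer);

		if (result !== undefined) {
			return result;
		}

		if (tokenizer.position !== startPosition) {
			return undefined; // Scenario 2: data was consumed, so later detectors are skipped
		}
		// Scenario 1: nothing was consumed, continue with the next detector
	}

	return undefined;
}

// Peeks only, finds no match, and leaves the position untouched.
const peeking = async tokenizer => (tokenizer.peek(1)[0] === 0x00 ? {ext: 'zero', mime: 'application/x-zero'} : undefined);
// Consumes one byte and then gives up.
const consuming = async tokenizer => {
	tokenizer.read(1);
	return undefined;
};
// Always reports a match.
const always = async () => ({ext: 'any', mime: 'application/x-any'});

const data = new Uint8Array([0x01, 0x02]);
runDetectors([peeking, always], new SketchTokenizer(data)).then(result => console.log(result));
//=> {ext: 'any', mime: 'application/x-any'}
runDetectors([consuming, always], new SketchTokenizer(data)).then(result => console.log(result));
//=> undefined
```

Using peek-style reads in a detector keeps the position unchanged, so a non-matching detector does not block the ones after it.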

Example detector array which can be extended and provided to each public method via the fileTypeOptions argument:

import {FileTypeParser} from 'file-type'; // or `NodeFileTypeParser` in Node.js

const customDetectors = [
    async tokenizer => {
        const unicornHeader = [85, 78, 73, 67, 79, 82, 78]; // 'UNICORN' as decimal byte values

        const buffer = new Uint8Array(7);
        await tokenizer.peekBuffer(buffer, {length: unicornHeader.length, mayBeLess: true});

        if (unicornHeader.every((value, index) => value === buffer[index])) {
            return {ext: 'unicorn', mime: 'application/unicorn'};
        }

        return undefined;
    },
];

const buffer = new Uint8Array(new TextEncoder().encode('UNICORN'));
const parser = new FileTypeParser({customDetectors}); // `NodeFileTypeParser({customDetectors})` in Node.js
const fileType = await parser.fromBuffer(buffer);
console.log(fileType);

tokenizer

Type: ITokenizer

Usable as source of the examined file.

fileType

Type: FileTypeResult

An object having an ext (extension) and mime (MIME type) property.

Detected by the standard detections or a previous custom detection. Undefined if no matching fileTypeResult could be found.

Supported file types

Pull requests are welcome for additional commonly used file types.

The following file types will not be accepted:

Related

Maintainers