ipfs / helia

An implementation of IPFS in JavaScript
https://helia.io
Other
814 stars 81 forks source link

feat: customizeable content type parsing in @helia/verified-fetch #422

Closed SgtPooki closed 4 months ago

SgtPooki commented 4 months ago

Discussed with @achingbrain due to https://github.com/ipfs/helia/pull/416

Goals

Initial design idea

Remove dependency on mime-types, don't depend on file-type

some interface such as createVerifiedFetch(helia, { contentTypeParser: (bytes) => myFn}) and we provide a default contentTypeParser that determines content type for the below list only.

We would pass the contentTypeParser function the first block of bytes we receive; and because most of our blocks are 1MB or below, we can safely assume the majority of content types users need to recognize can be determined by looking at those first 1MB of bytes.

If content-type is not a recognized type from the below list, we do not set it (allows browser sniffing).

Supported content types

References

cc @achingbrain @aschmahmann @lidel @2color

SgtPooki commented 4 months ago

FYI that file-type is fairly small compared to the entirety of @helia/verified-fetch (currently totals 560.2kb) at only 26.7kb:

image

lidel commented 4 months ago

Providing a way to pass custom content type sniffer sounds sensible, but will be a very niche feature request if your default is something comprehenbsive like file-type with magic bytes sniffing.

I think the question we could ask is when is content-type relevant:

That is to say, I think it is sensible to either:

achingbrain commented 4 months ago

My feeling is that if we don't need to do content type sniffing then let's not do it.

If we need to do it, we should do the minimum required (e.g. just support detecting a small subset of content types) and provide an extension mechanism for more comprehensive detection.

Given that we're billing this as fetch-like, most people will just do .json(), .blob(), etc and get on with things which suggests that we don't need to detect content types - we just try to process the data as the requested type and fail loudly if we can't.

There are valid use-cases for content detection though (e.g. service worker gateway) so allowing users to configure a mime type sniffer if they need it seems like a good compromise.

2color commented 4 months ago

Mostly agree with @lidel and @achingbrain, though I don't have a strong inclination either way.

If we don't include magic-byte sniffing by default, it should be as easy as possible to configure so it works smoothly in service workers.

SgtPooki commented 4 months ago

Sounds good. Ill get a PR out today that will not do content-type unless passed a function for it