publicsuffix / list

The Public Suffix List
https://publicsuffix.org/
Mozilla Public License 2.0

Minified version of the PSL #2256

Open wdhdev opened 2 days ago

wdhdev commented 2 days ago

Would it be worth offering a minified version of the PSL? It would be good for companies that use the PSL but do not need any comments or anything and just want the rules.

So basically just offering a version of the PSL without any comments.

wdhdev commented 1 day ago

I've created a quick script which produces a minified version of the PSL here: https://github.com/wdhdev/psl-min

You can download the minified file from https://psl.hrsn.dev/public_suffix_list.min.dat.

If the maintainers want, they can make this an official version and we can move it to a URL like https://publicsuffix.org/list/public_suffix_list.min.dat.
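For illustration, a minimal sketch of what such a minification step could look like. This is not the linked script. It assumes only the documented PSL format: `//` starts a comment line, and each rule line is read up to the first whitespace. The function name `minify_psl` and the sample data are hypothetical.

```python
def minify_psl(text: str) -> str:
    """Return only the rules: drop // comment lines and blank lines."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        # A rule ends at the first whitespace; anything after it is ignored.
        rules.append(line.split()[0])
    return "\n".join(rules) + "\n"


# Hypothetical excerpt in PSL syntax, used only to exercise the function.
sample = """\
// ===BEGIN ICANN DOMAINS===

// com : example comment
com

// uk : example comment
co.uk
*.ck
!www.ck
// ===END ICANN DOMAINS===
"""

minified = minify_psl(sample)
print(minified, end="")
```

Note that the section markers are themselves comments, so a file minified this way no longer distinguishes ICANN entries from Private ones.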

weppos commented 1 day ago

> Would it be worth offering a minified version of the PSL? It would be good for companies that use the PSL but do not need any comments or anything and just want the rules.
>
> So basically just offering a version of the PSL without any comments.

It's one extra source to maintain, and I can't see a lot of value to produce it on our side. It's very easy to transform the list in whatever format you want.

Moreover, different languages/applications may use a different approach for minification or optimization. In many cases, an extra transformation is still needed to compile the list into something more optimized (many consumers concerned about performance have that intermediate step).

Adding one extra minified version will add extra maintenance overhead for very little gain.
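As a rough illustration of the intermediate compile step mentioned above, the sketch below turns the rules into three sets and looks up a public suffix from them. It is an assumption about how a consumer might do this, not code from any particular library. It assumes wildcards appear only as a leading `*.` label and skips IDN/punycode handling. The names `compile_rules` and `public_suffix` are hypothetical.

```python
def compile_rules(text):
    """Parse PSL text into (exact, wildcard, exception) sets."""
    exact, wildcard, exception = set(), set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        rule = line.split()[0].lower()
        if rule.startswith("!"):
            exception.add(rule[1:])
        elif rule.startswith("*."):
            wildcard.add(rule[2:])
        else:
            exact.add(rule)
    return exact, wildcard, exception


def public_suffix(domain, compiled):
    """Return the public suffix of domain using the compiled sets."""
    exact, wildcard, exception = compiled
    labels = domain.lower().rstrip(".").split(".")
    # Candidates from longest to shortest, e.g. a.b.ck, b.ck, ck.
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    # Exception rules win; the suffix is the rule minus its leftmost label.
    for candidate in candidates:
        if candidate in exception:
            return candidate.partition(".")[2]
    for candidate in candidates:
        if candidate in exact:
            return candidate
        # "*.parent" matches any single label in front of parent.
        if candidate.partition(".")[2] in wildcard:
            return candidate
    # No rule matched: fall back to the default rule "*".
    return labels[-1]


compiled = compile_rules("com\nco.uk\n*.ck\n!www.ck\n")
print(public_suffix("example.co.uk", compiled))
```

Set membership keeps each lookup proportional to the number of labels in the domain rather than the number of rules in the list.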

My 2 cents.

wdhdev commented 1 day ago

> It's one extra source to maintain, and I can't see a lot of value to produce it on our side. It's very easy to transform the list in whatever format you want.

That is true. However, I believe that if it were implemented it would see quite a bit of use, as most consumers neither use nor rely on the comments in the file. The comments increase the file size by over 55%.

> Moreover, different languages/applications may use a different approach for minification or optimization. In many cases, an extra transformation is still needed to compile the list into something more optimized (many consumers concerned about performance have that intermediate step).

This is true. However, for basic consumers that use a mostly unmodified version of the PSL, it could be helpful.

A bit unrelated: it could be worth publishing separate ICANN-only and Private-only files. I know this would benefit some consumers, as some only want the ICANN section entries while others only want the Private section domains.
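A consumer can already make this split locally, because the list delimits the two sections with `// ===BEGIN ... DOMAINS===` and `// ===END ... DOMAINS===` marker comments. The following is a minimal sketch under that assumption. The function name `split_sections` and the sample entries are hypothetical.

```python
import re

# Section markers as they appear in the list, e.g. "// ===BEGIN ICANN DOMAINS===".
MARKER = re.compile(r"// ===(BEGIN|END) (ICANN|PRIVATE) DOMAINS===")


def split_sections(text):
    """Return the rules grouped by section, dropping comments and blank lines."""
    sections = {"ICANN": [], "PRIVATE": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = MARKER.fullmatch(line)
        if match:
            kind, name = match.groups()
            current = name if kind == "BEGIN" else None
            continue
        if current and line and not line.startswith("//"):
            sections[current].append(line.split()[0])
    return sections


# Hypothetical excerpt in PSL syntax.
sample = """\
// ===BEGIN ICANN DOMAINS===
// com
com
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
// Example Org (hypothetical)
example-host.test
// ===END PRIVATE DOMAINS===
"""

sections = split_sections(sample)
print(sections["ICANN"])
print(sections["PRIVATE"])
```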

mozfreddyb commented 1 day ago

To be honest, I would actually prefer we go into another direction and move to a better, more expressive format that goes beyond comments where we have better metadata in the entries as dictionaries (json maybe?) so that we can extend the flags in the future. We could still provide the dat as a compile-artifact, but eventually I can see us opening this up the other way around.
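To make the idea of entries as dictionaries concrete, here is a small sketch that maps a rule in the current syntax to a JSON-serialisable entry. The field names (`suffix`, `kind`, `section`) are purely hypothetical and are not a schema anyone in this thread has proposed.

```python
import json


def rule_to_entry(rule, section):
    """Map a rule in .dat syntax to a dictionary (hypothetical schema)."""
    if rule.startswith("!"):
        kind, suffix = "exception", rule[1:]
    elif rule.startswith("*."):
        kind, suffix = "wildcard", rule[2:]
    else:
        kind, suffix = "normal", rule
    # Further flags could be added as extra keys without changing the syntax.
    return {"suffix": suffix, "kind": kind, "section": section}


entries = [rule_to_entry(r, "icann") for r in ["com", "*.ck", "!www.ck"]]
print(json.dumps(entries, indent=2))
```

With a structure like this, the `.dat` file could be generated from the entries rather than the other way around.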

wdhdev commented 1 day ago

If we were going to use JSON, it might be a good idea to have a separate JSON file for each company listed on the PSL (at least in the repo, for easy editing) and then combine them all into one large file on the web server.

I've deployed a JSON version of the PSL here (https://github.com/wdhdev/psl-json): https://psl.hrsn.dev/public_suffix_list.json
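A minimal sketch of the combine step for the per-company layout suggested above. It is not the linked psl-json code. The file names and fields are hypothetical, and the documents are passed in as strings so the example stays self-contained; a real build would read them from the repository.

```python
import json


def combine(documents):
    """Merge per-company JSON documents into one JSON array string.

    documents maps a (hypothetical) file name to its JSON text.
    """
    combined = []
    # Sort by file name so the output is stable between builds.
    for name in sorted(documents):
        combined.append(json.loads(documents[name]))
    return json.dumps(combined, sort_keys=True)


# Hypothetical per-company files.
documents = {
    "example-b.json": '{"organisation": "Example B", "rules": ["b.test"]}',
    "example-a.json": '{"organisation": "Example A", "rules": ["a.test", "*.a.test"]}',
}

merged = combine(documents)
print(merged)
```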

weppos commented 23 hours ago

> To be honest, I would actually prefer we go into another direction and move to a better, more expressive format that goes beyond comments where we have better metadata in the entries as dictionaries (json maybe?) so that we can extend the flags in the future. We could still provide the dat as a compile-artifact, but eventually I can see us opening this up the other way around.

100% agreed, we have discussed this in the past. If we have to add an additional format/version, I'd rather use something more expressive and more conveniently parsable.

wdhdev commented 21 hours ago

JSON would definitely be a better option; it's much easier to parse, as you say.