smola / galimatias

galimatias is a URL parsing and normalization library written in Java.
http://galimatias.mola.io
MIT License

ICU4J dependency #57

Open gsmet opened 9 years ago

gsmet commented 9 years ago

Hi Santiago,

We have been using galimatias for quite some time now without any issue.

You recently introduced a dependency on ICU4J in this commit https://github.com/smola/galimatias/commit/5ce2cb91f6c7aa9f1f8aabe95aae3cda0b03a939 and I was wondering whether there is a way to fix the underlying issue without adding this dependency.

You might have missed it, but ICU4J is a 10 MB jar, which is quite large for a single dependency.

Thanks for your work.

dadmin-admin commented 9 years ago

+1

smola commented 5 years ago

As far as I know, ICU4J contains the only full IDNA:2008 implementation in Java, which is required to fully support the URL standard. I will consider any other implementation if it is available. It might also be possible to extract a subset of ICU4J and vendor it here, but I didn't check how much code would be required.

PRs or suggestions are welcome.

sideshowbarker commented 5 years ago

> As far as I know, ICU4J contains the only full IDNA:2008 implementation in Java

Right. Specifically, java.net.IDN still doesn’t have IDNA:2008 support. Neither does the GNU Libidn Java port. (The GNU Libidn2 C library implements IDNA:2008, but there’s no Java port of it.)
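To make the gap concrete, here is a minimal sketch using only the JDK (it is not part of galimatias; the class name `IdnaDemo` and the helper `toAsciiJdk` are made up for this example). `java.net.IDN` follows the older IDNA2003 rules, under which `ß` is mapped to `ss`. Under IDNA:2008, `ß` is a valid character in its own right and the label would be Punycode-encoded instead.

```java
import java.net.IDN;

public class IdnaDemo {
    // Hypothetical helper: converts a host name to its ASCII form using
    // the JDK's built-in java.net.IDN, which implements IDNA2003.
    static String toAsciiJdk(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // "fa\u00DF.de" is "faß.de"; the escape avoids source-encoding issues.
        String host = "fa\u00DF.de";

        // IDNA2003 maps the sharp s to "ss", so no Punycode label is produced.
        System.out.println(toAsciiJdk(host)); // prints "fass.de"
    }
}
```

For comparison, an IDNA:2008-style conversion of the same host (for example via ICU4J's `com.ibm.icu.text.IDNA.getUTS46Instance` with non-transitional processing) should yield `xn--fa-hia.de`, a different host name. That is the kind of divergence that matters for conformance with the URL standard.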

> which is required to fully support the URL standard. I will consider any other implementation if it is available. It might also be possible to extract a subset of ICU4J and vendor it here, but I didn't check how much code would be required.

It seems to me that the level of effort required to extract what’s needed for galimatias’s purpose would be quite high. So while I’m not super happy either with the ICU4J dependency, it really does seem like we’re pretty much stuck with it — at least if the goal continues to be to provide an implementation that conforms to the URL standard (which I strongly agree it should be).