Open GoogleCodeExporter opened 8 years ago
Btw, is there an RFC document which describes which domain names are valid and
which are not? I think, if there are rules, we should include
them for the case where the domain part of the email address is a domain name.
As written above, "sub.domain.invalid" or "domain.invalid" are invalid hosts as
well as just "invalid".
Also we might think about IDN (= Internationalized Domain Names). You know that
there are already TLDs and domain names with Unicode characters? Actually, they
are converted from Unicode into ASCII-compatible Punycode. (I already did this
conversion in my Java branch to prevent the DNS check from crashing the
whole application.)
I do not know whether an IDN mail address should be invalid or not. Actually, it seems
that an IDN email address is NOT valid, BUT the client software (should)
convert the Unicode domain part into SMTP-compatible Punycode.
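To illustrate the client-side conversion described above, here is a minimal sketch in Python (not the Java or PHP code from this project). It assumes Python's built-in "idna" codec, which implements IDNA 2003, is an acceptable stand-in for whatever converter a real client would use; the function name domain_to_ascii is made up for this example.

```python
def domain_to_ascii(address: str) -> str:
    """Return the address with its domain part converted to ASCII (Punycode)."""
    # Split at the last "@" so an "@" inside a quoted local part is left alone.
    local_part, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address has no '@'")
    # The built-in "idna" codec (IDNA 2003) converts each label separately;
    # labels that are already ASCII pass through unchanged.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local_part}@{ascii_domain}"


print(domain_to_ascii("info@bücher.example"))
```

Only the domain part is touched here; what to do with a Unicode local part is a separate question.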
At the moment, is_email returns OK for IDNs. (You can see additional
test cases with Japanese, Hebrew etc. hostnames in Unicode in my tests_alt.xml in
my Java branch.) It is up to you whether you want to allow IDNs or not. I think
technically (RFC?) only ASCII domain names are allowed - therefore CLIENTS have
to convert Unicode domains into Punycode before the mail can be sent.
(PS: For your PHP branch you can use some freely available source code for
Unicode <-> Punycode conversion.)
Original comment by danielma...@googlemail.com
on 12 Oct 2010 at 8:38
I think rejecting "invalid" as a TLD might prevent other developers from using
this as test data in their own projects. In other words, a developer might want
to register test@invalid as an address to test another part of his code.
I would like to support IDNs properly in is_email(). How much work do you think
it would take?
I am happy for you to change the PHP code to do this if you have time (but
please don't check it in to Google Code before I've reviewed it).
Original comment by dominic....@gmail.com
on 12 Oct 2010 at 11:15
On the question of valid domain names - the syntax is defined adequately in RFC
5321. There is a canonical list of TLDs on the IANA site but it changes too
frequently to hard-code it into the function. And it changes too infrequently
for it to be worth doing a run-time lookup. The DNS check is better.
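As a rough illustration of the RFC 5321 syntax mentioned above (sub-domain = Let-dig [Ldh-str], labels joined by dots), here is a small Python sketch. It is not the is_email() implementation; the function name is made up, address literals such as [192.0.2.1] are not handled, and the 63-octet label and 255-octet domain limits are assumed as the usual DNS and SMTP limits.

```python
import re

# One label: starts and ends with a letter or digit, hyphens allowed in
# between, at most 63 characters in total.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc5321_domain(domain: str) -> bool:
    """Syntax-only check of a dot-separated domain; no DNS lookup is done."""
    if len(domain) > 255:
        return False
    # An empty label (leading, trailing or doubled dot) fails fullmatch.
    return all(LABEL.fullmatch(label) for label in domain.split("."))
```

Note that "sub.domain.invalid" passes this check, because the grammar alone says nothing about reserved TLDs, and a Unicode hostname fails it until it has been converted to Punycode.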
Original comment by dominic....@gmail.com
on 12 Oct 2010 at 11:20
Hello. Well, I am happy to work on that project, but I still need your help with
all these formal RFC things. My English skills are not good enough to read a
complete RFC and make a clear statement out of it.
1) So, for IDNs my question would be: What do the RFCs say about domain names?
Are there rules about domains? May they have special characters like Unicode
characters? Or do they have to be pure ASCII? If so, we just need to
reject every domain which is Unicode and not ASCII (the correct address would
then be the Punycoded one).
2) For the TLD "invalid" which is defined by the RFC (I am not 100% sure, but I
think an RFC and NOT only IANA has defined this TLD!) I don't understand your
statement. Why should a developer use the test data "test@invalid" and expect
that it is a VALID address??? Just because we are checking RFC-defined TLDs
like "invalid", it does not mean that we are checking against all kinds of TLDs
(this was discussed and declined in Issue #2 since the DNS check does all the
work). I am fine with rejecting "hello@world.xxx" with a "Domain not found"
warning, but "hello@world.invalid" should raise a failure "RFC-incompatible
domain name" since the RFC says "this domain name is against our rules".
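For reference, RFC 2606 reserves the TLDs "test", "example", "invalid" and "localhost". The distinction proposed in point 2 could be sketched like this in Python; the function name is hypothetical, and whether a match should be an error or only a warning is exactly what is being discussed in this issue.

```python
# TLDs reserved by RFC 2606 for testing and documentation.
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}


def has_reserved_tld(domain: str) -> bool:
    """Return True if the last label of the domain is an RFC 2606 reserved TLD."""
    # Ignore a trailing root dot, then take the text after the last ".".
    # For a single-label domain such as "invalid" this is the whole string.
    tld = domain.rstrip(".").rpartition(".")[2].lower()
    return tld in RESERVED_TLDS
```

With this check, "world.invalid" and plain "invalid" are flagged, while "world.xxx" is left to the DNS check.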
Sorry that I cannot give you exact information about where I read
these few things a long time ago in various RFC documents.
Regards, Daniel
Original comment by danielma...@googlemail.com
on 12 Oct 2010 at 2:38
About IDN emails:
I read in German Wikipedia (yes, I know it's not reliable) that IDN changes nothing
in the core mail communication. All characters above 127 are
strictly forbidden. Only ASCII is allowed (can you please verify those
statements in the RFCs?).
For IDN communication, the CLIENT has to encode Unicode domains into ASCII
Punycode.
I am not sure, but I think I have also read in some RFC that the client
"SHOULD" do the Punycoding. (Which means the server COULD technically also
accept Unicode addresses? I am not sure about this.)
Also, I am not sure what happens if the localPart becomes Unicode. I think the
destination will then find out the correct mailbox. (Somewhere described in
RFC?) But I am not sure if the mail-connection (SMTP, POP3) is RFC-technically
Unicode capable at all.
Currently, your validator accepts all kinds of addresses, Unicode or not. If
you can verify the statement, then you should disallow non-ASCII characters in at
least the domain part.
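A minimal Python sketch of such a check, assuming the rule really is "no characters above 127 in the domain part" (the function name is made up, and the local part is deliberately left alone):

```python
def domain_is_ascii(address: str) -> bool:
    """Return True if everything after the last "@" is plain ASCII."""
    # rpartition returns the whole string as the last element if there is no "@".
    domain = address.rpartition("@")[2]
    return domain.isascii()
```

An address that fails this check would either be rejected or first have its domain converted to Punycode.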
(Maybe in the future we could write an emailtrim() function which Punycodes the
email address into a USABLE ASCII email address, which also has no whitespace,
CFWS etc.) - I wrote something similar in Java already.
Original comment by danielma...@googlemail.com
on 18 Oct 2010 at 12:37
Original issue reported on code.google.com by
danielma...@googlemail.com
on 11 Oct 2010 at 12:05