thisismynewname / isemail

Automatically exported from code.google.com/p/isemail

TLD "invalid" + domainname check + IDNs? #16

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I remember reading in some RFC that the TLD "invalid" is reserved for the 
purpose of marking invalid hosts.

The email address "test@invalid" produces only a warning instead of a failure. 
I think it should be a failure, since the main purpose of your library is to 
comply with the RFC rules.

Original issue reported on code.google.com by danielma...@googlemail.com on 11 Oct 2010 at 12:05

GoogleCodeExporter commented 8 years ago
Btw, is there an RFC that describes which domain names are valid and which are 
not? I think, if there are such rules, we should apply them whenever the 
domain part of the email address is a domain name.

As written above, "sub.domain.invalid" or "domain.invalid" are invalid hosts as 
well as just "invalid".
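
A check along these lines could be sketched as follows. The list of names comes from RFC 2606, which reserves "test", "example", "invalid" and "localhost" as TLDs. The function name `has_reserved_tld` is hypothetical and not part of is_email, and Python is used only for illustration.

```python
# Hypothetical helper, not part of is_email: flags domains under a TLD
# that RFC 2606 reserves for testing and documentation.
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}


def has_reserved_tld(domain: str) -> bool:
    """Return True if the last label of `domain` is a reserved TLD."""
    # Drop a trailing root dot, then compare the last label case-insensitively.
    last_label = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return last_label in RESERVED_TLDS
```

Whether a match should count as a failure or only as a warning remains a policy decision for the validator.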

Also, we might think about IDNs (Internationalized Domain Names). You know that 
there are already TLDs and domain names with Unicode characters? Actually, they 
are converted from Unicode into ASCII-compatible Punycode. (I already do this 
conversion in my Java branch so that the DNS check does not crash the whole 
application.)

I do not know whether an IDN mail address should be invalid or not. Actually, 
it seems that an IDN email address is NOT valid as such, BUT the client 
software should convert the Unicode domain part into SMTP-compatible Punycode.

At the moment, is_email returns OK for IDNs. (You can see additional test cases 
with Japanese, Hebrew, etc. hostnames in Unicode in tests_alt.xml in my Java 
branch.) It is up to you whether you want to allow IDNs or not. I think 
technically (RFC?) only ASCII domain names are allowed, so CLIENTS have to 
convert Unicode domains into Punycode before the mail can be sent.
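
To illustrate the conversion, here is a minimal sketch in Python (not the code from the Java branch). It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules; the function name `domain_to_ascii` is an assumption made for this example.

```python
def domain_to_ascii(domain: str) -> str:
    """Convert a possibly internationalized domain to its ASCII (Punycode) form."""
    # The built-in "idna" codec implements IDNA 2003. It raises UnicodeError
    # for labels it cannot encode (for example empty or over-long labels).
    return domain.encode("idna").decode("ascii")


print(domain_to_ascii("bücher.example"))  # xn--bcher-kva.example
```

Domains that are already pure ASCII pass through unchanged, so a client could apply such a function to every domain part before sending.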

(PS: For your PHP branch you can use freely available source code for 
Unicode <-> Punycode conversion.)

Original comment by danielma...@googlemail.com on 12 Oct 2010 at 8:38

GoogleCodeExporter commented 8 years ago
I think rejecting "invalid" as a TLD might prevent other developers from using 
this as test data in their own projects. In other words, a developer might want 
to register test@invalid as an address to test another part of his code.

I would like to support IDNs properly in is_email(). How much work do you think 
it would take?

I am happy for you to change the PHP code to do this if you have time (but 
please don't check it in to Google Code before I've reviewed it).

Original comment by dominic....@gmail.com on 12 Oct 2010 at 11:15

GoogleCodeExporter commented 8 years ago
On the question of valid domain names - the syntax is defined adequately in RFC 
5321. There is a canonical list of TLDs on the IANA site but it changes too 
frequently to hard-code it into the function. And it changes too infrequently 
for it to be worth doing a run-time lookup. The DNS check is better.
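
As a rough illustration of that syntax, the Domain rule in RFC 5321 (labels of letters, digits and hyphens, joined by dots) can be sketched like this. The function name `is_rfc5321_domain` is hypothetical, Python is used only for illustration, and address literals such as "[192.0.2.1]" are deliberately left out.

```python
import re

# sub-domain = Let-dig [Ldh-str]: a label starts and ends with a letter or
# digit and may contain hyphens in between (RFC 5321, section 4.1.2).
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc5321_domain(domain: str) -> bool:
    """Return True if `domain` is a dot-separated sequence of valid labels."""
    # Length limits: 255 octets for the whole domain, 63 octets per label.
    if not domain or len(domain) > 255:
        return False
    return all(
        len(label) <= 63 and _LABEL.fullmatch(label) is not None
        for label in domain.split(".")
    )
```

Note that a name such as "domain.invalid" passes this purely syntactic check; whether the domain actually exists is a separate question for the DNS check.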

Original comment by dominic....@gmail.com on 12 Oct 2010 at 11:20

GoogleCodeExporter commented 8 years ago
Hello. Well, I am happy to work on that project, but I still need your help 
with all these formal RFC things. My English skills are not good enough to 
read a complete RFC and make a clear statement out of it.

1) So, for IDNs my question would be: What do the RFCs say about domain names? 
Are there rules about domains? May they contain special characters such as 
Unicode characters, or do they have to be pure ASCII? If they must be ASCII, 
we just need to reject every domain that is Unicode rather than ASCII (the 
correct address would then be the Punycoded one).

2) As for the TLD "invalid", which is defined by an RFC (I am not 100% sure, 
but I think an RFC and NOT only IANA has defined this TLD!), I don't understand 
your statement. Why should a developer use the test data "test@invalid" and 
expect it to be a VALID address??? Checking RFC-defined TLDs like "invalid" 
does not mean that we check against all kinds of TLDs (that was discussed and 
declined in Issue #2, since the DNS check does all the work). I am fine with 
"hello@world.xxx" only getting a "Domain not found" warning, but 
"hello@world.invalid" should raise a failure, "RFC-incompatible domain name", 
since the RFC says "this domain name is against our rules".

Sorry that I cannot tell you exactly where I read these few things; it was a 
long time ago, in various RFC documents.

Regards, Daniel

Original comment by danielma...@googlemail.com on 12 Oct 2010 at 2:38

GoogleCodeExporter commented 8 years ago
About IDN emails:

I read in the German Wikipedia (yes, I know it's not reliable) that IDN changes 
nothing in the underlying mail communication. All characters above 127 are 
strictly forbidden; only ASCII is allowed. (Can you please verify these 
statements in the RFCs?)

For IDN communication, the CLIENT has to encode Unicode domains into ASCII 
Punycode.

I am not sure, but I think I have also read in some RFC that the client 
"SHOULD" do the Punycoding. (Which means the server COULD technically also 
accept Unicode addresses? I am not sure about this.)

Also, I am not sure what happens if the local part contains Unicode. I think 
the destination will then work out the correct mailbox. (Is that described 
somewhere in an RFC?) But I am not sure whether the mail protocols (SMTP, POP3) 
are Unicode-capable at all according to the RFCs.

Currently, your validator accepts all kinds of addresses, Unicode or not. If 
you can verify the statement, then you should disallow non-ASCII characters in 
at least the domain part.
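
Such a restriction could look roughly like the sketch below. It is written in Python for illustration only; the function name `domain_is_ascii` is hypothetical and this is not how is_email works today.

```python
def domain_is_ascii(address: str) -> bool:
    """Return True if the address has a non-empty, ASCII-only part after the last "@"."""
    # Assumption: the domain part holds no "@", so the last "@" is the separator.
    _, at, domain = address.rpartition("@")
    return bool(at) and bool(domain) and domain.isascii()
```

The local part is deliberately ignored here, since the open question above is only about the domain part.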

(Maybe in future, write an emailtrim() function that Punycodes the email 
address into a USABLE ASCII email address, which also has no whitespace, CFWS, 
etc.) - I wrote something similar in Java already.

Original comment by danielma...@googlemail.com on 18 Oct 2010 at 12:37