validateLocalPart allows "@"

JaSpa commented 3 years ago

I came across another thing regarding local parts I don't think is correct: only quoted local parts should be allowed to contain @.

validateLocalPart "@"
-- Success (LocalPart "@")

The problem seems to be in validateLocalPart where allowedChar allows characters which may only occur in quoted local parts to occur unquoted.

pjones commented 3 years ago

Yep, that seems wrong ;)

If you have the time I'd happily accept a patch to fix this.

JaSpa commented 3 years ago

I had a look at it and realized that after decoding a mail address the resulting local part never has surrounding quotes:

decode "\"@\".example.org" ^?! _Right . localPart
-- LocalPart "@"

They are only added back—if necessary—by the functions in the module Addy.Internal.Render which comes in when encodeing and showing a mail address.

Why is this relevant? Because when the parser-phase during decode succeeded the result will pass through validateEmailAddr which in turn calls validateLocalPart. Now there is the reason for its current implementation: validateLocalPart doesn't know if the given local part is quoted or not and it only cares if the local part could be valid when quoting it.

I suggest changing this in the following way:

The parser will keep the surrounding quotes if they are presen't, even if they might not be necessary.
validateLocalPart—which already does unicode normalization—will now also do "quote normalization", meaning it will remove unnecessary quotes, e.g. validateLocalPart "\"test\"" == Success (LocalPart "test") but validateLocalPart "\"@\"" == Success (LocalPart "\"@\"")

I think this change would also lead to a better API: I as a user thought I could use the _LocalPart prism to implement something like database (de)serialization. I now realized this might lead to a problem in the following use case:

Decode address "@"@example.org
Something, something business logic, store only the local part in the database. It will now contain a single @ character.
A different service reads the database and constructs mail addresses from the stored localparts leading to @@example.org.

What do you think of my suggestion? I would be willing to do the implementation work.

JaSpa commented 3 years ago

I looked into this a bit further and realized that the parser removes not only surrounding quotes but also escaping backslashes ~(see also #6)~. Then I wondered if the validateLength function should count only the "meaningful" characters (meaning remove quotes and escaping backslashes, count then)—this is the current behaviour—or all characters needed to encode a syntactically correct mail address.

In other words does the local part a@b which has to be written as "a@b" have three or five bytes? I tried to look at RFC 3696 §3 but didn't find anything specific. Personally I think five bytes is more reasonable since this is the size of the local part you would need to encode it at when sending it in an email header.

^{I really do hope I am not a nuisance with all my comments and raised issues.}

pjones commented 3 years ago

@JaSpa I really appreciate you looking into this and coming up with some very good suggestions.

My intent was to make it impossible to construct illegal email addresses. It's definitely a problem that the parser strips off the quotes if they are necessary. In the case of having @ as a local part that should totally fail validation and trying to use it with the prism should also fail.

I agree with your suggestions about quote normalization and the changes that need to be made. Please let me know if you need any help while writing up a patch. I look forward to seeing your changes!

Oh, and as far as validateLength goes, I think you are correct, the quotes should be included in the byte count. If I were forced to use a fixed-size buffer I would need to know the maximum number of bytes on-the-wire (so to speak) of an email address. Therefore every byte must count.

pjones / addy

validateLocalPart allows "@" #5