pjones / addy

A full-featured library for parsing, validating, and rendering email addresses
BSD 2-Clause "Simplified" License
4 stars 1 forks source link

validateLocalPart allows "@" #5

Open JaSpa opened 3 years ago

JaSpa commented 3 years ago

I came across another thing regarding local parts I don't think is correct: only quoted local parts should be allowed to contain @.

validateLocalPart "@"
-- Success (LocalPart "@")

The problem seems to be in validateLocalPart where allowedChar allows characters which may only occur in quoted local parts to occur unquoted.

pjones commented 3 years ago

Yep, that seems wrong ;)

If you have the time I'd happily accept a patch to fix this.

JaSpa commented 3 years ago

I had a look at it and realized that after decoding a mail address the resulting local part never has surrounding quotes:

decode "\"@\".example.org" ^?! _Right . localPart
-- LocalPart "@"

They are only added back—if necessary—by the functions in the module Addy.Internal.Render which comes in when encodeing and showing a mail address.

Why is this relevant? Because when the parser-phase during decode succeeded the result will pass through validateEmailAddr which in turn calls validateLocalPart. Now there is the reason for its current implementation: validateLocalPart doesn't know if the given local part is quoted or not and it only cares if the local part could be valid when quoting it.

I suggest changing this in the following way:

I think this change would also lead to a better API: I as a user thought I could use the _LocalPart prism to implement something like database (de)serialization. I now realized this might lead to a problem in the following use case:

  1. Decode address "@"@example.org
  2. Something, something business logic, store only the local part in the database. It will now contain a single @ character.
  3. A different service reads the database and constructs mail addresses from the stored localparts leading to @@example.org.

What do you think of my suggestion? I would be willing to do the implementation work.

JaSpa commented 3 years ago

I looked into this a bit further and realized that the parser removes not only surrounding quotes but also escaping backslashes ~(see also #6)~. Then I wondered if the validateLength function should count only the "meaningful" characters (meaning remove quotes and escaping backslashes, count then)—this is the current behaviour—or all characters needed to encode a syntactically correct mail address.

In other words does the local part a@b which has to be written as "a@b" have three or five bytes? I tried to look at RFC 3696 §3 but didn't find anything specific. Personally I think five bytes is more reasonable since this is the size of the local part you would need to encode it at when sending it in an email header.


I really do hope I am not a nuisance with all my comments and raised issues.

pjones commented 3 years ago

@JaSpa I really appreciate you looking into this and coming up with some very good suggestions.

My intent was to make it impossible to construct illegal email addresses. It's definitely a problem that the parser strips off the quotes if they are necessary. In the case of having @ as a local part that should totally fail validation and trying to use it with the prism should also fail.

I agree with your suggestions about quote normalization and the changes that need to be made. Please let me know if you need any help while writing up a patch. I look forward to seeing your changes!

Oh, and as far as validateLength goes, I think you are correct, the quotes should be included in the byte count. If I were forced to use a fixed-size buffer I would need to know the maximum number of bytes on-the-wire (so to speak) of an email address. Therefore every byte must count.