rgl / MailBounceDetector

Detects whether a MailKit email Message is a bounce message
MIT License

Diagnostic code matching does not seem to be right #22

Open rklec opened 5 days ago

rklec commented 5 days ago

Take this diagnostic code in a mail:

Diagnostic-Code: smtp; 550 5.1.1 <MAILER-DAEMON@mail.localhost> User doesn't
    exist: MAILER-DAEMON@mail.localhost 

As per its regex defined in the source: https://github.com/rgl/MailBounceDetector/blob/413ea4bb64f68b3d4069bdd4470d2cae4b8ba0c4/MailBounceDetector/BounceDetectResult.cs#L22

https://regex101.com/r/pKmRdj/1

This results in two groups being matched and then also listed:

Here is the relevant spec part: https://www.rfc-editor.org/rfc/rfc3464#section-2.3.6

The diagnostic-type is `smtp` in this case, but could be something different. Is it just skipped for some reason? (Note that I also don't get what values it should contain. The spec talks about some IANA specification in https://www.rfc-editor.org/rfc/rfc3464#page-32 and https://www.rfc-editor.org/rfc/rfc3464#section-2.1.2, but maybe it also does not matter.)

As per spec, then a semicolon follows and some arbitrary text:

diagnostic-code-field = "Diagnostic-Code" ":" diagnostic-type ";" *text
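To illustrate the grammar above, here is a minimal Python sketch (not the library's C# code) that splits a field value at the first semicolon into the diagnostic-type and the free text. The function name `split_diagnostic_code` is hypothetical, and it assumes the value may still contain folded header lines, as in the example mail.

```python
import re


def split_diagnostic_code(value: str) -> tuple[str, str]:
    """Split a Diagnostic-Code field value into (diagnostic-type, text)."""
    # Unfold the header: a line break followed by whitespace is a continuation.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", value).strip()
    # Everything before the first semicolon is the diagnostic-type.
    diag_type, sep, text = unfolded.partition(";")
    if not sep:
        raise ValueError("missing ';' after diagnostic-type")
    # Lowercasing the type is a convenience of this sketch for comparisons.
    return diag_type.strip().lower(), text.strip()


raw = (
    "smtp; 550 5.1.1 <MAILER-DAEMON@mail.localhost> User doesn't\n"
    "    exist: MAILER-DAEMON@mail.localhost "
)
diag_type, text = split_diagnostic_code(raw)
print(diag_type)
print(text)
```

With this approach the text part stays one opaque string, which is all the grammar itself promises.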

So I don't quite see why the regex splits the text up again. In my case, the status is reported again, and I am not sure what 550 is supposed to mean. In any case, it is not a new type, is it?

So maybe just output the whole thing as a string instead? The parsing seems to make no sense. Or am I missing some spec here that clearly defines that there are numbers in there?

rklec commented 5 days ago

Okay, the `-type` things are usual stuff in the spec, so maybe it is not wrong. But is the value always dns? Apparently it is always dns: https://www.rfc-editor.org/rfc/rfc3461#section-9.3 Likewise smtp for the diagnostic code: https://www.rfc-editor.org/rfc/rfc3461#section-9.2

So maybe splitting that away is okay, but anyway the part afterwards is strange to me...
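
For the part after `smtp;`, one possible reading is that it is an SMTP server reply, as the `smtp` registration in RFC 3461 section 9.2 appears to describe: a three-digit reply code (here 550), optionally followed by an enhanced status code in the class.subject.detail form of RFC 3463 (here 5.1.1), then free text. The Python sketch below parses the text under that assumption. The names `SmtpDiagnostic` and `parse_smtp_diagnostic` are hypothetical, and this is not the regex used in the library.

```python
import re
from typing import NamedTuple, Optional


class SmtpDiagnostic(NamedTuple):
    reply_code: str                 # three-digit SMTP reply code, e.g. "550"
    enhanced_status: Optional[str]  # class.subject.detail code, if present
    message: str                    # remaining human-readable text


# Assumed layout: reply code, optional enhanced status code, free text.
_SMTP_TEXT = re.compile(
    r"^(?P<reply>[2-5]\d{2})[ -]"
    r"(?:(?P<status>[245]\.\d{1,3}\.\d{1,3})\s+)?"
    r"(?P<message>.*)$",
    re.DOTALL,
)


def parse_smtp_diagnostic(text: str) -> Optional[SmtpDiagnostic]:
    """Return the parsed parts, or None if the text does not fit the layout."""
    match = _SMTP_TEXT.match(text.strip())
    if match is None:
        return None
    return SmtpDiagnostic(
        match["reply"], match["status"], match["message"].strip()
    )


parsed = parse_smtp_diagnostic(
    "550 5.1.1 <MAILER-DAEMON@mail.localhost> User doesn't exist: "
    "MAILER-DAEMON@mail.localhost"
)
print(parsed)
```

Returning `None` for text that does not fit leaves room to fall back to the raw string, since nothing guarantees that every MTA follows this layout.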