ykaliuta / fidogate

FidoGate
GNU General Public License v2.0

Default charset for rfc2ftn and "passthru" charset #11

Open evs38 opened 4 years ago

evs38 commented 4 years ago

I suggest adding a fourth, optional parameter to the charset translation schemas that sets the default fallback charset for rfc2ftn gating. This charset should be assumed when the RFC article has no Content-Type header (or no charset in it). There are some groups where the default charset is UTF-8, but some old clients use legacy 8-bit charsets without a Content-Type header. In Russian groups it is koi8-r or windows-1251; in English groups it can be us-ascii, iso-8859-1 or windows-1252.

Example:

DefaultCharset cp866:cp866:utf-8:koi8-r

In this case:

ftn2rfc:

  1. An FTN message without @CHRS should be treated as written in CP866 and recoded to UTF-8
  2. An FTN message with @CHRS should be treated as written in the charset named in the @CHRS kludge and recoded to UTF-8

rfc2ftn:

  1. An RFC message without a charset in the Content-Type header should be treated as KOI8-R and recoded to CP866
  2. An RFC message with a charset in the Content-Type header should be treated as written in that charset and recoded to CP866
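
The two RFC-to-FTN rules above (an RFC message with or without a charset in Content-Type) can be sketched in a few lines of Python. This is only an illustration of the proposed behaviour, not FidoGate's code: the function name `rfc_body_to_ftn` and its parameters are hypothetical, and it assumes a single-part article.

```python
from email import message_from_bytes


def rfc_body_to_ftn(raw_article: bytes,
                    rfc_default: str = "koi8-r",
                    ftn_out: str = "cp866") -> bytes:
    """Recode the body of a single-part RFC article for the FTN side.

    rfc_default stands for the proposed fourth field, ftn_out for the
    FTN-side target charset (both names are hypothetical).
    """
    msg = message_from_bytes(raw_article)
    # Use the charset from Content-Type when the header names one,
    # otherwise fall back to the proposed default.
    charset = msg.get_content_charset() or rfc_default
    body = msg.get_payload(decode=True)  # body bytes after transfer decoding
    # An unrecognised charset name raises LookupError here; the proposal
    # does not say how that case should be handled, so it is not caught.
    text = body.decode(charset, errors="replace")
    return text.encode(ftn_out, errors="replace")
```

With the example schema, an article from an old client that sends KOI8-R text without any Content-Type header would be decoded as KOI8-R, while an article that declares windows-1251 would be decoded as windows-1251; both end up in CP866.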

Also, I suggest adding a special "passthru" charset for the rfc2ftn and ftn2rfc schemas to indicate that no transcoding is needed, only correct headers and kludges.

Examples:

cp866:utf-8:passthru:utf-8

An FTN message with a LATIN-1 CHRS kludge should not be recoded to UTF-8 when gating ftn2rfc, but the iso-8859-1 charset should be added to the Content-Type header. Likewise, ASCII -> us-ascii, LATIN-5 -> iso-8859-9, etc. Unknown charsets should be treated as written in CP866 and recoded as usual.

cp866:passthru:utf-8:utf-8

An RFC message with iso-8859-1 in the Content-Type header should not be recoded to CP866 when gating rfc2ftn, but a CHRS: LATIN-1 kludge should be added. Likewise, us-ascii -> ASCII, ibm866 or cp866 -> CP866, etc. Unknown charsets should be treated as written in UTF-8 and recoded as usual.
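
A rough Python sketch of the FTN-to-RFC passthru case may make the idea concrete. It is an assumption-laden illustration, not an existing FidoGate feature: the table `FTN_TO_MIME` contains only the three name pairs mentioned above, and the function `ftn2rfc_passthru` and its parameters are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical FTN -> MIME name table, limited to the pairs named above.
FTN_TO_MIME = {
    "ASCII": "us-ascii",
    "LATIN-1": "iso-8859-1",
    "LATIN-5": "iso-8859-9",
}


def ftn2rfc_passthru(body: bytes,
                     chrs: Optional[str],
                     ftn_default: str = "cp866",
                     rfc_out: str = "utf-8") -> Tuple[bytes, str]:
    """Return (body, charset for the Content-Type header)."""
    # A CHRS value may carry a level after the name, e.g. "LATIN-1 2";
    # only the name is used for the lookup.
    fields = chrs.split() if chrs else []
    name = fields[0].upper() if fields else None
    mime = FTN_TO_MIME.get(name)
    if mime is not None:
        # Known charset: leave the bytes alone, only label them.
        return body, mime
    # Missing or unknown CHRS: treat as the FTN default and recode as usual.
    text = body.decode(ftn_default, errors="replace")
    return text.encode(rfc_out), rfc_out
```

The lookup table is the part that has to be maintained by hand: every FTN charset identifier that should pass through needs a matching MIME name, and the reverse direction needs the inverse table.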

ykaliuta commented 4 years ago

passthru does not fit well, since it requires maintaining a charset name mapping (which is no longer mandatory now that the code has switched completely to iconv).

The default RFC charset is, IIRC, a compile-time constant at the moment, but it sounds like it can be implemented this way (keeping the parameter optional to preserve config compatibility). Thanks for the proposal!
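
To illustrate what "optional to keep config compatibility" could look like, here is a small Python sketch of parsing the schema value with three or four fields. It is not the actual implementation: the field names, the function `parse_charset_schema` and the placeholder `COMPILED_RFC_DEFAULT` (standing in for the compile-time constant, whose real value is not given in this thread) are all assumptions.

```python
from typing import NamedTuple

# Placeholder for the compile-time default; the real value is not stated here.
COMPILED_RFC_DEFAULT = "utf-8"


class CharsetSchema(NamedTuple):
    ftn_default: str  # assumed for FTN messages without a CHRS kludge
    ftn_out: str      # target charset when gating towards FTN
    rfc_out: str      # target charset when gating towards RFC
    rfc_default: str  # assumed for RFC articles without a charset


def parse_charset_schema(value: str) -> CharsetSchema:
    """Parse 'a:b:c' or 'a:b:c:d'; the fourth field is optional."""
    parts = [p.strip() for p in value.strip().split(":")]
    if len(parts) not in (3, 4) or not all(parts):
        raise ValueError(f"expected 3 or 4 non-empty fields: {value!r}")
    if len(parts) == 3:
        # Old-style value: keep working, use the built-in default.
        parts.append(COMPILED_RFC_DEFAULT)
    return CharsetSchema(*parts)
```

An existing three-field value such as cp866:cp866:utf-8 keeps its current meaning, and only configs that add the fourth field change behaviour.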

ykaliuta commented 3 years ago

https://github.com/ykaliuta/fidogate/commit/26b8fb21debf3af9b491c96b42598ead0b4141b7 should implement the first proposal, for DefaultCharset, NetmailCharset and the -C area option.