rjbs / Pod-Weaver

recombine pod documents into awesomer pod documents

UTF8 in # ABSTRACT: #23

Closed: dan-zeman closed this issue 9 years ago

dan-zeman commented 10 years ago

I use the Pod::Weaver plugin in Dist::Zilla when publishing a package on CPAN. Every .pm file in my distribution contains a comment line starting with

# ABSTRACT:

Typically this is the first line of the file. I noticed that when this line contained non-ASCII UTF-8 characters, the resulting abstract on CPAN (and also in the HTML generated from the POD) was badly encoded: the input byte sequence was interpreted as something other than UTF-8 when the Perl source was read. UTF-8 that I entered directly in my POD text was fine, so correctly and incorrectly encoded text appeared in the same document.

Example:

This was my abstract line in Lingua/Interset/Tagset/CS/Cnk.pm:

# ABSTRACT: Driver for the tagset of the Czech National Corpus (Český národní korpus).

And this is how the package got listed on CPAN:

Lingua::Interset::Tagset::CS::Cnk - Driver for the tagset of the Czech National Corpus (Český národní korpus).
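To make the symptom concrete, here is a small Python sketch (not Pod::Weaver or Dist::Zilla code) of what happens when the UTF-8 bytes of that abstract are decoded with the wrong codec. It assumes Latin-1 as the wrong codec, which is roughly how Perl treats bytes that were never decoded (one character per byte); the report itself only says the bytes were read as something other than UTF-8.

```python
# Illustration only: decode the UTF-8 bytes of the abstract with the wrong codec.
# Latin-1 is an assumed stand-in for "something other than UTF-8".
abstract = "Driver for the tagset of the Czech National Corpus (Český národní korpus)."

raw = abstract.encode("utf-8")   # bytes as stored in the .pm file
misread = raw.decode("latin-1")  # each byte becomes its own character
correct = raw.decode("utf-8")    # bytes read as UTF-8, as intended

# Each of the four accented letters takes two bytes in UTF-8, so the
# misread string is four characters longer than the original.
print(len(abstract), len(misread))

# The damage is reversible: re-encode as Latin-1 to recover the bytes,
# then decode them correctly.
repaired = misread.encode("latin-1").decode("utf-8")
print(correct == abstract, repaired == abstract)  # prints: True True
```

If a string like `misread` is then written out as UTF-8, every stray character is encoded separately, which produces the double-encoded garbage seen on the web page.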

I tried moving the abstract line after "use utf8;" in the source, but it did not help, so for now I have to avoid non-ASCII characters in the abstract line.
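The `use utf8;` pragma only tells perl how to parse the source it compiles; a tool that scans the file's text for the `# ABSTRACT:` comment on its own never sees it. One plausible reading of the behaviour above (not confirmed in this thread) is that the abstract is pulled from text that was never decoded as UTF-8. The sketch below, in Python with a hypothetical helper `extract_abstract`, shows where the decoding step belongs: on the raw bytes, before the comment is searched for.

```python
import re
from typing import Optional

# Hypothetical helper (not part of Pod::Weaver or Dist::Zilla): return the text
# of the first "# ABSTRACT:" comment in a module's source, or None.
ABSTRACT_RE = re.compile(r"^[ \t]*#+[ \t]*ABSTRACT:[ \t]*(.+?)[ \t\r]*$", re.MULTILINE)


def extract_abstract(source_bytes: bytes, encoding: str = "utf-8") -> Optional[str]:
    # Decode the whole file first; a "use utf8;" line inside the file
    # plays no part in this step.
    text = source_bytes.decode(encoding)
    match = ABSTRACT_RE.search(text)
    return match.group(1) if match else None


source = (
    "# ABSTRACT: Driver for the tagset of the Czech National Corpus (Český národní korpus).\n"
    "package Lingua::Interset::Tagset::CS::Cnk;\n"
    "use utf8;\n"
).encode("utf-8")

print(extract_abstract(source))             # decoded as UTF-8: accents intact
print(extract_abstract(source, "latin-1"))  # wrong codec: garbled accents
```

Passing `"latin-1"` as the encoding reproduces the kind of garbled abstract described in this issue, regardless of where `use utf8;` sits in the file.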

Dan

sschober commented 10 years ago

I'm seeing the same issue. Maybe this has to do with this recent endeavour to fix encoding issues in Dist::Zilla?

Some information on my setup:

$ cpanm Mixin::Linewise Data::Section Dist::Zilla Pod::Weaver Dist::Zilla::Plugin::PodWeaver
Mixin::Linewise is up to date. (0.106)
Data::Section is up to date. (0.200006)
Dist::Zilla is up to date. (5.020)
Pod::Weaver is up to date. (4.006)
Dist::Zilla::Plugin::PodWeaver is up to date. (4.005)

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

$ perl --version

This is perl 5, version 21, subversion 0 (v5.21.0) built for x86_64-linux-thread-multi

Please let me know if you need more information!