Closed atoomic closed 3 years ago
This will produce invalid UTF-8 output when given noncharacters (e.g. U+FDD0), surrogates (e.g. U+D800), or code points outside the Unicode range (e.g. U+110000). That was already true, though, since Encode was being used with 'utf8' and not 'UTF-8', so technically this is not a behavior change.
Personally, I don't see this as a win - the Encode usage can be improved in the future but utf8::encode cannot, and this is not a bottleneck.
The main win is in the memory used by the process. Here is a quick, simple check on Linux:
╰─> perl -e 'use utf8; print qx{grep RSS /proc/$$/status}'
VmRSS: 2300 kB
╰─> perl -e 'use Encode; print qx{grep RSS /proc/$$/status}'
VmRSS: 4256 kB
@Grinnz are you suggesting this is the change we should be making here?
- my $context = Encode::encode('utf8', $self->{context}, Encode::FB_DEFAULT);
+ my $context = Encode::encode('UTF-8', $self->{context}, Encode::FB_DEFAULT);
If anything, I would suggest removing the Encode::FB_DEFAULT argument, since that is already the default (as the name suggests) and specifying it explicitly in this way causes $self->{context} to get modified in place due to Encode's weird API.
I also think using the 'UTF-8' encoding in this case is a good idea though, as in most cases.
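For readers unfamiliar with the distinction, here is a minimal sketch of how the lax 'utf8' and strict 'UTF-8' encodings can differ. The sample string is a hypothetical input chosen for illustration, not something taken from the module under discussion.

```perl
use strict;
use warnings;
no warnings 'utf8';   # the sample string deliberately contains a surrogate
use Encode ();

# Hypothetical sample input: a lone surrogate between two ASCII letters.
my $text = "a\x{D800}b";

# Lax encoding: Perl's internal representation is passed through as-is.
my $lax = Encode::encode('utf8', $text);

# Strict encoding: the surrogate is not passed through unchanged.
my $strict = Encode::encode('UTF-8', $text);

# Show the raw bytes of each result as hex for comparison.
printf "utf8:  %s\n", unpack('H*', $lax);
printf "UTF-8: %s\n", unpack('H*', $strict);
print "outputs differ\n" if $lax ne $strict;
```

No check argument is passed here, so both calls use the default fallback behavior.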
@Grinnz: If I read all of your comments correctly, you seem to be saying that this change should have no regressions. You also seem to be saying that the code in question is wrong and we should fix it. Are you up for providing a pull request for it?
Sure, opened as #60
discussion moved to #60
utf8::encode can be used instead of the Encode::encode function in this case.
This avoids loading Encode where it is not needed and reduces the memory footprint.
Check to confirm that using utf8::encode provides the same behavior:
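One way such a check could look is sketched below. This is not the check from the original description. It is an illustration that compares the two approaches on a handful of hypothetical sample strings, and it assumes the lax 'utf8' encoding with no check argument.

```perl
use strict;
use warnings;
no warnings 'utf8';   # one sample deliberately contains a surrogate
use Encode ();

# Hypothetical sample inputs covering ASCII, Latin-1, wide, and invalid text.
my @samples = (
    "plain ascii",
    "caf\xE9",
    "smiley \x{263A}",
    "astral \x{1F600}",
    "surrogate \x{D800}",
);

my $mismatches = 0;
for my $sample (@samples) {
    # Encode::encode returns a new byte string.
    my $via_encode = Encode::encode('utf8', $sample);

    # utf8::encode works in place and returns nothing, so copy first.
    my $via_core = $sample;
    utf8::encode($via_core);

    next if $via_encode eq $via_core;
    $mismatches++;
    printf "mismatch: %s vs %s\n",
        unpack('H*', $via_encode), unpack('H*', $via_core);
}

print $mismatches ? "$mismatches mismatch(es)\n" : "all samples match\n";
```

Note that utf8::encode modifies its argument, so any replacement for an Encode::encode call that must leave the source untouched needs the explicit copy shown above.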