owncloud / files_texteditor

GNU Affero General Public License v3.0
18 stars 25 forks source link

"false" returned as content when opening some files with ANSI encoding #405

Closed fmkaiser closed 1 year ago

fmkaiser commented 1 year ago

One of our users reported that when opening some files saved under windows, "false" is returned as the content in the text editor:

Screenshot_20230915_132127

It appears that if the encoding was not detected correctly before, the iconv() call in controller/filehandlingcontroller.php can fail, returning false, which is currently passed as the file content to the text editor.

This is especially problematic when users then save the file, overwriting the original content.

With PHP 7.4.33 this can be reproduced using a file with content "Viele Grüße" in WINDOWS-1252 (aka ANSI) encoding generated like this:

echo "Viele Grüße" | iconv -f utf-8 -t WINDOWS-1252 > viele_gruesse_ansi.txt

viele_gruesse_ansi.txt

mb_detect_encoding wrongly classifies the encoding as EUC-CN, for which the iconv() call then fails with

[GET][/apps/files_texteditor/ajax/loadfile?filename=xxx] iconv(): Detected an illegal character in input string at /var/www/html/owncloud/apps/files_texteditor/controller/filehandlingcontroller.php#157

The wrongly detected encoding may be a PHP issue, for comparison under PHP 8.1.2 (Ubuntu 22.04) the encoding is detected correctly as Windows-1252.

But given that ownCloud still requires PHP 7 it might be prudent to introduce better error handling e.g.

$fileContents = \iconv(...);

if ($fileContents === false) {
        return new DataResponse(['message' => (string)$this->l->t('Cannot convert file encoding.')], Http::STATUS_BAD_REQUEST);
}

It might also be worth trying to decode again with the above defined ISO-8859-15 default encoding first.

If you think that makes sense, I am happy to submit a pull request.

Edit: Forgot to mention, this affects at least versions 2.5.2 and 2.6.0 of files_texteditor.
The problem was likely introduced or excaberated when the additional encodings were added in https://github.com/owncloud/files_texteditor/commit/d70f6c5d9d559b4f2cda8dbc95c74ecacaee8184

pako81 commented 1 year ago

@fmkaiser thanks for the report. Not sure if error handling is really needed, we can eventually just set the encoding back to default (after retrieving again the node content) in case \iconv returns false because of wrongly detected encoding.

So something like this after https://github.com/owncloud/files_texteditor/blob/master/controller/filehandlingcontroller.php#L158:

// safety net in case \iconv returns false because of wrongly detected encoding
if ($fileContents === false) {
          $fileContents = $node->getContent();
          $fileContents = \iconv("ISO-8859-15", "UTF-8", $fileContents);
}

This seems to make files_texteditor to correctly deal with viele_gruesse_ansi.txt

@jvillafanez @jnweiger what do you think?

jnweiger commented 1 year ago

As far as I can see, that is a bug in php7 mp_detect_encoding(). I've made a suggestion in https://github.com/owncloud/files_texteditor/pull/406#issuecomment-1803839122

pako81 commented 1 year ago

@fmkaiser https://github.com/owncloud/files_texteditor/pull/406 has been just merged. It will be included in next files_texteditor app version.