owncloud / files_texteditor

GNU Affero General Public License v3.0
18 stars 25 forks source link

Fix for iconv returning false #406

Closed pako81 closed 1 year ago

pako81 commented 1 year ago

In some cases, \iconv may return false because of wrongly detected encoding. This messes up with the file content. We now set the encoding back to default ISO-8859-15 (after retrieving again the node content) in case of encoding being wrongly detected.

Related issue: https://github.com/owncloud/files_texteditor/issues/405

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.

jvillafanez commented 1 year ago

I don't think this is the right approach. What if the file has a different encoding? The second iconv will still fail and we'll have the same problem.

Maybe we should send the raw content (without the iconv), consider the file as read-only (to prevent issues with saving the file with a different encoding) and show an error about an unexpected encoding.

jnweiger commented 1 year ago

I've tried sending the raw file: The texteditor window remains blank, and owncloud hangs. Multiple decode attempts would make more sense to me.

The situation is, that mb_detect_encoding() returns something wrong.

Thus the subsequent iconf is likely to fail then(). Instead of a hard fallback to ISO-8859-15, we could at least try all the encodings one by one, that we have in the list for mb_detect_encoding() the first one that does not return false, is probably right.

This reduces the chance of permanent failure. If all attempts return false, I'd run a final \iconv("ISO-8859-15", "UTF-8//TRANSLIT", $fileContents);

pako81 commented 1 year ago

Yes, passing the raw file also did not work for me.

Instead of a hard fallback to ISO-8859-15, we could at least try all the encodings one by one, that we have in the list for mb_detect_encoding() the first one that does not return false, is probably right.

Is \mb_detect_encoding not already going through the whole list of possible encodings passed at https://github.com/owncloud/files_texteditor/blob/master/controller/filehandlingcontroller.php#L153 ?

Quoting https://www.php.net/manual/en/function.mb-detect-encoding.php: A list of character encodings to try, in order. The list may be specified as an array of strings, or a single string separated by commas.

pako81 commented 1 year ago

Maybe rather setting strict to false so that the closest encoding will be returned? Not sure about any side effects however.

Or we just throw an error message when trying to open a file which could not be encoded, something like: return new DataResponse(['message' => (string)$this->l->t('File encoding could not be detected.')], Http::STATUS_BAD_REQUEST);

jvillafanez commented 1 year ago

From my point of view, the best option is:

pako81 commented 1 year ago

@jvillafanez implemented your suggestion. Not sure this is exactly what you meant.

jvillafanez commented 1 year ago

Yes, that's the idea.

This should do the same, and it's more compact.

if ($encoding !== 'UTF-8') {
  $writeable = false;
  $fileContents = \iconv($encoding, "UTF-8", $fileContents);
}

if ($fileContents !== false) {
  return new DataResponse([
    'filecontents' => $fileContents,
    'writeable' => $writable,
    'locked' => $activePersistentLock ? $activePersistentLock->getOwner() : null,
    'mime' => $mime,
    'mtime' => $mTime
  ], Http::STATUS_OK);
} else {
  return new DataResponse(['message' => (string)$this->l->t('Cannot convert the encoding to UTF-8.')], Http::STATUS_BAD_REQUEST);
}
pako81 commented 1 year ago

@jvillafanez thx, done.

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

50.0% 50.0% Coverage
0.0% 0.0% Duplication

jnweiger commented 12 months ago

Yes, passing the raw file also did not work for me.

Instead of a hard fallback to ISO-8859-15, we could at least try all the encodings one by one, that we have in the list for mb_detect_encoding() the first one that does not return false, is probably right.

Is \mb_detect_encoding not already going through the whole list of possible encodings passed at https://github.com/owncloud/files_texteditor/blob/master/controller/filehandlingcontroller.php#L153 ?

Quoting https://www.php.net/manual/en/function.mb-detect-encoding.php: A list of character encodings to try, in order. The list may be specified as an array of strings, or a single string separated by commas.

Yes, mb-detect-encoding() is documented to do that. But in 7.4 it is broken. That Is why I suggest to brute force all the encodings from the list in a loop with iconv().

The suggestion from @jvillafanez a) does not make any further attempts, when the first iconv fails. b) suffers from #411

DeepDiver1975 commented 12 months ago

you are missing unit tests .... people please .... https://www.reddit.com/r/learnprogramming/comments/tg9yog/why_write_unit_tests/