Unicode Path/Filename for imread and imwrite

opencv-pushbot commented 8 years ago

Transferred from http://code.opencv.org/issues/1268

|| Richard Steffen on 2011-07-29 13:28
|| Priority: Low
|| Affected: None
|| Category: highgui-images
|| Tracker: Feature
|| Difficulty: 
|| PR: 
|| Platform: None / None

Currently, the imread and imwrite methods only support std::string as input. This does not work with non-ASCII directories/paths. Therefore, software that depends on OpenCV cannot be guaranteed to work on all machines.

History

Alexander Shishkov on 2012-02-12 20:47

-   Description changed from Currently, the imread and imwrite method
    only supports std::string as an inpu... to Currently, the imread and
    imwrite method only supports std::string as an inpu...

Vadim Pisarevsky on 2012-04-04 11:58

std::string is still capable of storing any Unicode name via UTF-8 encoding, so it is fopen's responsibility to handle this UTF-8 name properly. On Mac and Linux I was able to store an image into a file with non-ASCII letters using the normal cv::imwrite(). I guess it will work on Windows too, if you save the source file as UTF-8.
-   Priority changed from High to Low
-   Assignee set to Vadim Pisarevsky
-   Status changed from Open to Cancelled

Andrey Kamaev on 2012-05-18 14:20

-   Target version set to 2.4.0

n n on 2014-03-18 13:11

AFAIK fopen does not support Unicode on Windows and can't be used to open a path with Unicode characters. The UTF-8 string must be converted to UTF-16 and given to _wfopen instead. See ImageMagick's fopen_utf8 wrapper for example code: http://www.imagemagick.org/api/MagickCore/utility-private_8h_source.html#l00103
-   Target version changed from 2.4.0 to 2.4.9
-   Status changed from Cancelled to Open
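
A minimal sketch of such a wrapper, in the spirit of the ImageMagick helper mentioned above. The names `fopen_utf8` and `utf8_to_wide` are hypothetical and not part of OpenCV; the sketch assumes the Win32 `MultiByteToWideChar` call for the UTF-8 to UTF-16 conversion and simply falls back to plain `fopen` on other platforms.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: convert a UTF-8 string to UTF-16 with the Win32 API.
static std::wstring utf8_to_wide(const std::string& s)
{
    if (s.empty())
        return std::wstring();
    // The first call only computes the required length in wchar_t units.
    const int len = MultiByteToWideChar(CP_UTF8, 0, s.data(),
                                        static_cast<int>(s.size()), nullptr, 0);
    std::wstring w(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s.data(),
                        static_cast<int>(s.size()), &w[0], len);
    return w;
}
#endif

// Hypothetical helper (not an OpenCV function): open a file whose name is
// given as UTF-8. Returns a null pointer on failure, like fopen.
std::FILE* fopen_utf8(const std::string& path, const std::string& mode)
{
#ifdef _WIN32
    // Windows: pass UTF-16 to _wfopen instead of relying on fopen.
    return _wfopen(utf8_to_wide(path).c_str(), utf8_to_wide(mode).c_str());
#else
    // Linux and macOS: the UTF-8 bytes are passed through unchanged.
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

The returned `FILE*` can then be read into a buffer and decoded with cv::imdecode, so the path never has to go through cv::imread.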

n n on 2014-03-19 10:48

One possible workaround for now using Boost and a memory mapped file:

    mapped_file map(path(L"filename"), ios::in);
    Mat file(1, numeric_cast<int>(map.size()), CV_8S, const_cast<char*>(map.const_data()), CV_AUTOSTEP);
    Mat image(imdecode(file, 1));

The downside is that I/O errors cause access violations instead of C++ exceptions. Also don't write to the "file" Mat. :)

n n on 2014-03-19 11:20

Unfortunately, the trick of avoiding storing the image file in memory doesn't work with imwrite, as imencode stores the output in a vector with the standard allocator specified. If memory is no issue, the contents can of course be written to a file using Boost afterwards.
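
As a rough illustration of the "write the contents afterwards" step, here is a sketch that swaps Boost for C++17 `std::filesystem`. The helper name `write_bytes_utf8` is hypothetical, and the byte vector stands in for the buffer that cv::imencode fills. `std::filesystem::u8path` is assumed to be available (it is deprecated, though still present, in C++20).

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write an already encoded image (for example the
// buffer filled by cv::imencode) to a path given as a UTF-8 string.
bool write_bytes_utf8(const std::string& utf8_path,
                      const std::vector<unsigned char>& bytes)
{
    // u8path interprets the string as UTF-8 and converts it to the native
    // path encoding (UTF-16 on Windows).
    const std::filesystem::path p = std::filesystem::u8path(utf8_path);

    std::ofstream out(p, std::ios::binary);
    if (!out)
        return false;  // the file could not be opened for writing

    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    out.flush();
    return out.good();  // false if the write or the flush failed
}
```

Unlike the memory-mapped approach, a failure here shows up as a `false` return value rather than an access violation.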

Alexander Smorkalov on 2014-04-02 01:18

-   Target version changed from 2.4.9 to 3.0

gadcam commented 6 years ago

IMHO this should be specified in the documentation, here http://docs.opencv.org/master/d4/da8/group__imgcodecs.html#ga288b8b3da0892bd651fce07b3bbd3a56 and here http://docs.opencv.org/master/d4/da8/group__imgcodecs.html#gabbc7ef1aa2edfaa87772f1202d67e0ce.

vquilon commented 4 years ago

Encoding issues... difficult but not impossible.

If you have the string 'テスト/abc.jpg', you can encode the characters as a Windows encoding like this:

    print('テスト/abc.jpg'.encode('utf-8').decode('unicode-escape'))

and you get something like 'ãã¹ã/abc.jpg'.

Then, if you want to read the file and get readable, usable filenames, you can use a library to read the filenames in your path and then change the encoding back:

    # fname is like 'ãã¹ã/abc.jpg'
    fname.encode('iso-8859-1').decode('utf-8')  # gives back the initial string 'テスト/abc.jpg'

asmorkalov commented 4 years ago

The OpenCV core team discussed the problem at the weekly meeting and decided to stay conservative and not introduce new API calls with wchar_t, wstring and other string types, for the following reasons:

-   Most image decoding and encoding libraries use the standard fopen call to open files, and extra wchar_t support requires modification of those domain libraries.
-   Modern Linux, Mac OS and the latest Windows releases support UTF-8 encoding, which allows using std::string as a container to pass paths to OpenCV functions.
-   Popular file systems on Linux do not use wchar_t natively, and the overloads are not a cross-platform solution.

There are 2 alternatives for using wchar_t strings with OpenCV:

-   Convert wchar_t strings to UTF-8 and pass the UTF-8 string as the cv::imread and cv::imwrite parameter. The UTF-8 string is handled by the system fopen call, and its behavior depends on OS support and locale. See mbstowcs in the C++ standard for more details.
-   OpenCV provides the cv::imdecode and cv::imencode functions, which decode and encode an image using a memory buffer as input and output. This solution decouples file IO from image decoding and allows managing path strings, locales, etc. in user code. See the code snippet for cv::imdecode below. fopen can be replaced with _wfopen for wide string support. See the Microsoft reference manual for details: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=vs-2019
```
#include <cstdio>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>

int main(int argc, char** argv)
{
    // File IO is done in user code; fopen can be swapped for _wfopen here.
    FILE* f = fopen("lena.jpg", "rb");
    if (!f)
        return 1;                     // file could not be opened

    fseek(f, 0, SEEK_END);            // seek to end of file
    size_t buffer_size = ftell(f);    // get current file pointer
    fseek(f, 0, SEEK_SET);            // seek back to beginning of file

    // Read the whole encoded file into memory.
    std::vector<char> buffer(buffer_size);
    fread(&buffer[0], sizeof(char), buffer_size, f);
    fclose(f);

    // Decode the image from the memory buffer instead of from a path.
    cv::Mat frame = cv::imdecode(buffer, cv::IMREAD_COLOR);

    cv::imshow("Camera", frame);
    cv::waitKey();
}
```
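
For the first alternative, the wchar_t to UTF-8 conversion could look roughly like the sketch below. The helper name `wide_to_utf8` is hypothetical. Rather than the locale-dependent standard conversion functions, it encodes the code points by hand, assuming that wchar_t holds UTF-16 (as on Windows) or UTF-32 (as on most Unix systems).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode a wide string as UTF-8 without relying on the
// current locale. Unpaired surrogates are passed through without validation.
std::string wide_to_utf8(const std::wstring& w)
{
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        char32_t cp = static_cast<char32_t>(w[i]);

        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            const char32_t lo = static_cast<char32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        if (cp < 0x80) {                       // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string can be passed to cv::imread or cv::imwrite. Whether the file is then actually found still depends on how the system fopen treats UTF-8, as described above.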

petered commented 1 month ago

Could we perhaps at least raise an error instead of silently just failing to write?

marscher commented 2 weeks ago

Raising would indeed be very helpful. I currently have to work with a Windows machine, and an umlaut accidentally slipped into my path. Invoking the API from Python, I assumed that UTF-8 would be handled correctly.
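
Until something like that exists in the library, user code on the C++ side can turn the silent failure into an error itself, since cv::imwrite reports success through its bool return value. A small sketch, with the hypothetical wrapper name `write_or_throw`; it takes any callable that returns bool, so it does not depend on OpenCV headers.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical wrapper: call a writer that signals failure through a false
// return value and convert that into an exception.
template <typename Writer>
void write_or_throw(Writer&& writer, const std::string& path)
{
    if (!std::forward<Writer>(writer)(path))
        throw std::runtime_error("failed to write image: " + path);
}
```

With OpenCV this might be used as `write_or_throw([&](const std::string& p) { return cv::imwrite(p, image); }, path);`, where `image` and `path` are the caller's own variables.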

henryruhs commented 6 days ago

I posted a workaround on Stack Overflow that works flawlessly with any cv2 method: https://stackoverflow.com/a/78462365/924017

opencv / opencv

Unicode Path/Filename for imread and imwrite #4292
