Closed Igorbek closed 10 years ago
Not currently; I didn't have time to look into it when we began the project. Is it as simple as using wchar_t? Will I need to rely on a 3rd party library for checking character classes and such?
It does not require any 3rd party libraries to manipulate wchar_t. It is as simple as using char. All functions for char are also available for wchar_t.
I believe we intend it. We are noobs.
On Mon, Jul 9, 2012 at 3:50 PM, Igorbek <reply@reply.github.com> wrote:
Internally the project uses 'char' instead of 'wchar_t' for manipulating strings. So that means no Unicode support?
Reply to this email directly or view it on GitHub: https://github.com/hcatlin/libsass/issues/25
There's no need to change to wchar_t if you pick a nice encoding like UTF-8 and stick with it (which I recommend). GLib uses char everywhere, and only when you specifically need to deal with some Unicode peculiarities do you need to use any other functions.
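To illustrate the point about plain char being enough for UTF-8, here is a minimal sketch of one such "Unicode peculiarity": counting code points rather than bytes. The function name utf8_length is hypothetical (it is not part of libsass or GLib), and the sketch assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a libsass or GLib function.
// Counts Unicode code points in a UTF-8 encoded string by skipping
// continuation bytes (those of the form 10xxxxxx). Assumes valid UTF-8.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Everything else (copying, concatenating, searching for ASCII delimiters such as `{` or `;`) works on the bytes unchanged, because no byte of a multi-byte UTF-8 sequence falls in the ASCII range.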
Thanks, @QuLogic, I agree with you. But the library loads files incorrectly (it only handles ANSI and UTF-8 without a BOM) and writes its output with the same limitation. That is still a problem.
I just tried a file with actual UTF-8 characters and it works just fine. But it doesn't have a BOM because it's not necessary in UTF-8. I think all that needs to be done is to ignore the BOM (assuming the code's going to work with UTF-8 only, that is).
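Ignoring the BOM could look roughly like the sketch below. The function name strip_utf8_bom is hypothetical and this is not taken from the libsass sources; it only assumes the file contents have been read into a std::string.

```cpp
#include <string>

// Hypothetical helper, not part of libsass.
// Returns `text` without a leading UTF-8 byte order mark (EF BB BF), if any.
std::string strip_utf8_bom(const std::string& text)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0) {
        return text.substr(bom.size());
    }
    return text;
}
```

Inputs shorter than three bytes, or without a BOM, are returned unchanged.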
I think the problem is not just the encoding. If the library interface takes a UTF-8 encoded char*, I can convert to it from any encoding.
But if my code imports some file (via the @import directive), I can't specify the encoding of that file, and libsass always interprets it as UTF-8 without BOM.
In some cases, I can't control the encoding of the files. But I can know the encoding of every file (through detection algorithms, user settings or transport-specific information, e.g. the HTTP Content-Type header).
I think the best solution is to introduce the ability to provide an interface that resolves file paths and file contents.
Something like this:
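#include <memory>
#include <string>

// Supplies the content of one source file and resolves the paths it imports.
class SourceContext
{
public:
    virtual ~SourceContext() = default;
    virtual std::string get_content() = 0;
    virtual std::shared_ptr<SourceContext> resolve_path(std::string path) = 0;
};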
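As a rough illustration of how an application might implement the proposed interface, here is a sketch that serves sources from memory. MemorySourceContext and its Files alias are hypothetical names, and the sketch assumes the application has already converted every file to UTF-8 before storing it. Nothing here is an existing libsass API.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// The interface proposed above, repeated so this sketch compiles on its own.
class SourceContext
{
public:
    virtual ~SourceContext() = default;
    virtual std::string get_content() = 0;
    virtual std::shared_ptr<SourceContext> resolve_path(std::string path) = 0;
};

// Hypothetical implementation: sources live in a map from path to content
// that the application has already converted to UTF-8.
class MemorySourceContext : public SourceContext
{
public:
    using Files = std::map<std::string, std::string>;

    MemorySourceContext(std::shared_ptr<const Files> files, std::string path)
        : files_(std::move(files)), path_(std::move(path)) {}

    // Returns the stored content, or an empty string for an unknown path.
    std::string get_content() override
    {
        auto it = files_->find(path_);
        return it == files_->end() ? std::string() : it->second;
    }

    // Imports are looked up by exact path; returns nullptr when unknown.
    std::shared_ptr<SourceContext> resolve_path(std::string path) override
    {
        if (files_->find(path) == files_->end()) {
            return nullptr;
        }
        return std::make_shared<MemorySourceContext>(files_, std::move(path));
    }

private:
    std::shared_ptr<const Files> files_;
    std::string path_;
};
```

With something like this, the library would only ever see UTF-8, while detection and conversion (from user settings, an HTTP Content-Type header, and so on) would stay in the application.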
Ah, if only everyone just used UTF-8. But yes, I forgot about the @import issue.
I think if libsass says "I assume UTF-8 everywhere", and we then fix up issue #21 nicely (in some way similar to what you propose), libsass could just have the application/bindings deal with the encoding.
Guys, have you decided what to do about this? As already mentioned, it breaks with files that include a BOM.
I used to do C++ before I discovered C#, and wchar_t was a big pain to use back then because it meant different OS support (I think Unicode was supported after Windows 2000/XP) and you had to use compiler flags to produce ANSI and Unicode versions specifically.
In this day and age when all OSes support Unicode, changing to wchar_t should be trivial. The only "problem" is that the memory usage will double automatically - though this is not really a big problem given the size of computer memory and SCSS files.
Let me know what you think.
LibSass will currently read a BOM if present and reject any files that aren't UTF-8.
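For readers wondering what "reject any files that aren't UTF-8" can mean in practice, below is a sketch of a well-formedness check. It is not LibSass's actual code; is_valid_utf8 is a hypothetical name, and the rules it applies (no overlong forms, no surrogates, nothing above U+10FFFF) follow the general UTF-8 definition rather than anything stated in this thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from LibSass.
// Returns true if `s` is well-formed UTF-8: correct lead/continuation byte
// structure, no overlong forms, no surrogates, nothing above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;     // number of continuation bytes expected
        unsigned long cp;      // code point decoded so far
        unsigned long lowest;  // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        lowest = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; lowest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; lowest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; lowest = 0x10000; }
        else return false;     // stray continuation byte or invalid lead byte
        if (n - i - 1 < extra) return false;  // sequence truncated at end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < lowest) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}
```

A file saved in a single-byte ANSI code page usually fails this check as soon as it contains a non-ASCII character, which is one way a library could tell such input apart from UTF-8.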
Aside from that, since we'd like to avoid external dependencies if possible, I'll look into using wchar_t then.
Seriously?? Then it must be because this guy hasn't rebased in 3 months. Anyhow, this was my fix: https://github.com/TBAPI-0KA/NSass/pull/1
If you get stuck anywhere with wchar_t, feel free to comment here and I'll try to help
... should be trivial ...
Famous last words.
LOL. Fair comment, but at least I went for "should" vs "will" :)
LibSass is only going to support UTF-8 for the foreseeable future. This support is mostly implemented, and there are tickets for the little edge-cases on which it fails.