sass / libsass

A C/C++ implementation of a Sass compiler
https://sass-lang.com/libsass

Unicode support #25

Closed Igorbek closed 10 years ago

Igorbek commented 12 years ago

Internally the project uses 'char' instead of 'wchar_t' for manipulating strings. So that means no Unicode support?

akhleung commented 12 years ago

Not currently; I didn't have time to look into it when we began the project. Is it as simple as using wchar_t? Will I need to rely on a 3rd party library for checking character classes and such?

Igorbek commented 12 years ago

It doesn't require any third-party libraries to manipulate wchar_t. It is as simple as using char. All the functions for char are also available for wchar_t.

HamptonMakes commented 12 years ago

I believe we intend it. We are noobs.

On Mon, Jul 9, 2012 at 3:50 PM, Igorbek wrote:

Internally the project uses 'char' instead of 'wchar_t' for manipulating strings. So that means no Unicode support?


QuLogic commented 12 years ago

There's no need to change to wchar_t if you pick a nice encoding like UTF-8 and stick with it (which I recommend). GLib uses char everywhere and only when you specifically need to deal with some Unicode peculiarities do you need to use any other functions.
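To illustrate the point about staying with char and UTF-8, here is a minimal sketch (not LibSass code; `utf8_length` and the example function are hypothetical names). Byte-oriented operations such as searching for an ASCII delimiter keep working unchanged, because bytes below 0x80 never occur inside a multi-byte UTF-8 sequence. Only questions such as "how many characters is this?" need a UTF-8-aware helper. The sketch assumes the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a string assumed to hold valid UTF-8.
// Continuation bytes look like 10xxxxxx; every other byte starts a code point.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Usage sketch: "café" occupies 5 bytes but holds 4 code points.
void utf8_length_example() {
    const std::string word = "caf\xC3\xA9";
    assert(word.size() == 5);        // byte length, as std::string sees it
    assert(utf8_length(word) == 4);  // code point count
    assert(word.find('f') == 2);     // ASCII search works on the raw bytes
}
```

Note that this counts code points, not user-perceived characters; combining sequences would need more than this sketch provides.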

Igorbek commented 12 years ago

Thanks, @QuLogic, I would agree with you. But the library loads and saves files incorrectly with respect to encoding (it handles only ANSI and UTF-8 without a BOM). That is still a problem.

QuLogic commented 12 years ago

I just tried a file with actual UTF-8 characters and it works just fine. But it doesn't have a BOM because it's not necessary in UTF-8. I think all that needs to be done is to ignore the BOM (assuming the code's going to work with UTF-8 only, that is).
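Ignoring the BOM could look roughly like the following sketch. It is not the LibSass implementation; `strip_utf8_bom` is a hypothetical helper that drops the three-byte UTF-8 byte order mark (EF BB BF) when it appears at the start of the input and otherwise returns the input unchanged.

```cpp
#include <cassert>
#include <string>

// Returns `source` without a leading UTF-8 byte order mark (EF BB BF).
// Input without a BOM, including input shorter than three bytes, is
// returned unchanged.
std::string strip_utf8_bom(const std::string& source) {
    static const std::string bom = "\xEF\xBB\xBF";
    if (source.compare(0, bom.size(), bom) == 0) {
        return source.substr(bom.size());
    }
    return source;
}

// Usage sketch with a stylesheet that starts with a BOM.
void strip_utf8_bom_example() {
    const std::string with_bom = std::string("\xEF\xBB\xBF") + "a { b: c; }";
    assert(strip_utf8_bom(with_bom) == "a { b: c; }");
    assert(strip_utf8_bom("a { b: c; }") == "a { b: c; }");
}
```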

Igorbek commented 12 years ago

I think the problem is not just the encoding. If the library interface takes a UTF-8 encoded char*, I can convert to it from any encoding. But if my code imports some file (via the @import directive), I can't specify the encoding of that file, and libsass always interprets it as UTF-8 without a BOM. In some cases I can't control the encoding of the files, but I can know which encoding each file uses (from detection algorithms, user settings, or transport-specific information such as the HTTP Content-Type header). I think the best solution is to introduce a way to provide an interface that can resolve file paths and file contents. Something like this:

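#include <memory>
#include <string>

class SourceContext
{
public:
    virtual ~SourceContext() = default;
    // Returns the file's content (the host can decode it to UTF-8 here).
    virtual std::string get_content() = 0;
    // Resolves a path referenced by an @import to another source.
    virtual std::shared_ptr<SourceContext> resolve_path(std::string path) = 0;
};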
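
As a sketch of how a host application might implement the proposed interface, the block below serves sources from an in-memory table. Everything except `SourceContext` itself is hypothetical (`FileTable`, `InMemorySource`), and LibSass does not expose such an interface according to this thread. A real host would read from disk or the network and convert each file to UTF-8 inside `get_content()`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Interface as proposed above (virtual destructor added for safe deletion).
class SourceContext {
public:
    virtual ~SourceContext() = default;
    virtual std::string get_content() = 0;
    virtual std::shared_ptr<SourceContext> resolve_path(std::string path) = 0;
};

// Hypothetical storage: source name -> content already converted to UTF-8.
using FileTable = std::map<std::string, std::string>;

// Hypothetical host-side implementation backed by an in-memory table.
class InMemorySource : public SourceContext {
public:
    InMemorySource(std::shared_ptr<const FileTable> files, std::string name)
        : files_(std::move(files)), name_(std::move(name)) {
        assert(files_ != nullptr);
    }

    // Looks up this source's content; throws if the name is unknown.
    std::string get_content() override {
        const auto it = files_->find(name_);
        if (it == files_->end()) {
            throw std::runtime_error("no such source: " + name_);
        }
        return it->second;
    }

    // Treats the imported path as a key in the same table.
    std::shared_ptr<SourceContext> resolve_path(std::string path) override {
        return std::make_shared<InMemorySource>(files_, std::move(path));
    }

private:
    std::shared_ptr<const FileTable> files_;
    std::string name_;
};
```

With this shape, the compiler would only ever see UTF-8 strings, and the decision about each file's original encoding stays with the application or binding.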

QuLogic commented 12 years ago

Ah, if only everyone just used UTF-8. But yes, I forgot about the @import issue.

I think if libsass says "I assume UTF-8 everywhere" and we then fix up issue #21 nicely (in some way similar to what you propose), libsass could just let the application/bindings deal with the encoding.

georgiosd commented 11 years ago

Guys, have you decided what to do about this? As already mentioned, it breaks with files that include a BOM.

I used to do C++ before I discovered C#, and wchar_t was a big pain to use back then because OS support varied (I think Unicode was supported after Windows 2000/XP) and you had to use compiler flags to produce separate ANSI and Unicode builds.

In this day and age, when all OSes support Unicode, changing to wchar_t should be trivial. The only "problem" is that the memory usage will double automatically, though this is not really a big problem given the size of computer memory and SCSS files.

Let me know what you think.

akhleung commented 11 years ago

LibSass will currently read a BOM if present and reject any files that aren't UTF-8.
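
For readers wondering what rejecting non-UTF-8 input involves, here is a minimal validation sketch. It is not the LibSass implementation; `is_valid_utf8` is a hypothetical helper that checks lead and continuation bytes and rejects overlong encodings, surrogates, and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;     // continuation bytes expected
        unsigned long cp = 0;      // code point being decoded
        unsigned long min_cp = 0;  // smallest value allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;  // sequence truncated at end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}

// Usage sketch: valid text passes, a Latin-1 encoded "é" (lone 0xE9) fails.
void is_valid_utf8_example() {
    assert(is_valid_utf8("caf\xC3\xA9"));
    assert(!is_valid_utf8("caf\xE9"));
}
```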

Aside from that, since we'd like to avoid external dependencies if possible, I'll look into using wchar_t then.

georgiosd commented 11 years ago

Seriously?? Then it must be because this guy hasn't rebased in 3 months. Anyhow, this was my fix: https://github.com/TBAPI-0KA/NSass/pull/1

georgiosd commented 11 years ago

If you get stuck anywhere with wchar_t, feel free to comment here and I'll try to help

craigbarnes commented 11 years ago

... should be trivial ...

Famous last words.

georgiosd commented 11 years ago

LOL. Fair comment, but at least I went for "should" vs "will" :)

akhleung commented 10 years ago

LibSass is only going to support UTF-8 for the foreseeable future. This support is mostly implemented, and there are tickets for the little edge cases on which it fails.