sublimehq / sublime_text

Issue tracker for Sublime Text
https://www.sublimetext.com

Multiple dictionaries per file #1069

Closed forthrin closed 2 years ago

forthrin commented 8 years ago

I use Sublime Text to write fiction in which the same text file may contain several languages, especially in lines spoken by characters of different nationalities, for example Norwegian, English and Japanese.

When using spell checking, it seems I must settle on a single language, which means text in the other languages is flagged as "incorrectly spelled" and marked in red. Annoying and confusing.

What would be the best way to overcome this? Can support be added for multiple dictionaries per file, for example?

I do most of this writing in the Fountainhead package, in case that makes it easier to come up with a solution (such as handling spoken lines in a particular way).

titoBouzout commented 8 years ago

The issue with this is that there is no API to spell-check, but basically everything else (selecting lines, highlighting, changing words) is possible.

oliva commented 8 years ago

The same happens for me when programming or writing translation files where the variables are in English and the strings are in a different language.

evandrocoan commented 7 years ago

As I write in both English and Portuguese, I combined the English dictionary with the Portuguese dictionary, so now I get spell checking in both languages. You can find this dictionary here:

  1. https://github.com/evandrocoan/SublimeTextStudio/tree/develop/MultiLingual%20Dictionary

Dxhs commented 7 years ago

@evandrocoan I just did this with my Norwegian and English. 322044 lines + 470122. Took me 5 minutes to c+p it with kate on linux. Lol

titoBouzout commented 7 years ago

Could you please describe, briefly but precisely, how you do that? I'm interested. Thanks!

evandrocoan commented 7 years ago

Steps to merge two dictionary files

  1. Download new dictionaries from: https://github.com/titoBouzout/Dictionaries
  2. Duplicate the file EN_US.txt as EN_US_MY_LANG.txt
  3. Duplicate the file EN_US.aff as EN_US_MY_LANG.aff
  4. Duplicate the file EN_US.dic as EN_US_MY_LANG.dic
  5. Open your MY_LANG.txt and append its contents to EN_US_MY_LANG.txt.
  6. Open your MY_LANG.aff and merge its contents into EN_US_MY_LANG.aff by hand, using your judgment.
  7. Open your MY_LANG.dic, append its contents to EN_US_MY_LANG.dic, and update the first line of EN_US_MY_LANG.dic with the correct number of words in the new file.
  8. Set EN_US_MY_LANG as your default spell-checking language.

Now you have spell checking in two languages. But there are some downsides:

  1. When merging the .aff files, you need to be careful how you do it; otherwise it may crash Sublime Text.
  2. The misspelling suggestions will often be inaccurate, as two distinct languages are now bound by the same spelling and suggestion rules.
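
The .dic part of the merge (step 7) can be sketched in Python. This is a minimal illustration, not a tested tool: the function name `merge_dic_files` and the file names are hypothetical, and it assumes every input file uses the same encoding and that the merged .aff file defines every flag the entries refer to. It does nothing about the .aff merge itself.

```python
from pathlib import Path


def merge_dic_files(paths, output, encoding="utf-8"):
    """Concatenate .dic word lists and rewrite the count on the first line."""
    entries = []
    seen = set()
    for path in paths:
        lines = Path(path).read_text(encoding=encoding).splitlines()
        # The first line of a .dic file holds the entry count, not a word.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        for line in lines:
            entry = line.strip()
            # Skip blank lines and exact duplicate entries (word plus flags).
            if entry and entry not in seen:
                seen.add(entry)
                entries.append(entry)
    # Write the new count first, then one entry per line.
    Path(output).write_text(
        "\n".join([str(len(entries))] + entries) + "\n", encoding=encoding
    )
    return len(entries)
```

A call such as `merge_dic_files(["EN_US.dic", "MY_LANG.dic"], "EN_US_MY_LANG.dic")` would then return the number of entries written.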

Dxhs commented 7 years ago

@titoBouzout On Linux, I used the cat command with shell redirection (>) into my output file: cat english.dic norwegian.dic > output.dic. If I use my standard text editor, it crashes. Lshell did it in milliseconds. I suggest using Lshell or Cmd.

titoBouzout commented 7 years ago

Thanks for the explanation! I'll try :)

gustavobittencourt commented 7 years ago

Thank you, @evandrocoan!

I'll try to install your EN_PT dictionary here!

Kristinita commented 7 years ago

I cannot merge the Russian and English dictionaries; I get bad results. See this part of an answer from a Hunspell contributor:

Bilingual spellchecker is not supported, at least not in reliable way. Merging dictionaries should be out of question.

Thanks.

evandrocoan commented 7 years ago

@dimztimz is correct on this:

Instead, at API level you can instantiate multiple objects of the spellchecker with different languages. Then you can check the word in each object. This is the most reliable way for now.

That, however, has to be done by the Sublime Text spell-checking core, so we need to wait for them to implement this feature to get the best functionality.

For now I get good results, with some disadvantages, by merging the EN_PT dictionaries. However, these two languages are pretty similar. English and Russian would not be easy to merge, if it is possible at all.
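
The "one checker object per language" idea quoted above can be sketched as follows. This is a toy illustration only: `WordListChecker` is a hypothetical stand-in built on plain word sets, not a Hunspell binding or anything Sublime Text exposes, and the tiny word lists are made up for the example. A word counts as misspelled only if no checker accepts it.

```python
import re


class WordListChecker:
    """Toy stand-in for one spell-checker object (hypothetical, not a Hunspell API)."""

    def __init__(self, words):
        self.words = {w.lower() for w in words}

    def check(self, word):
        # Accept the word if it is in this language's word list.
        return word.lower() in self.words


def misspelled(text, checkers):
    """Return the words that none of the checkers accept."""
    # Runs of Unicode letters; digits and underscores are excluded.
    words = re.findall(r"[^\W\d_]+", text)
    return [w for w in words if not any(c.check(w) for c in checkers)]


english = WordListChecker(["is", "this", "in", "colors"])
portuguese = WordListChecker(["isto", "é", "em", "cores"])
print(misspelled("Is htis in cores", [english, portuguese]))  # ['htis']
```

Because each language keeps its own object, the affix rules never have to be merged; the cost is that a typo in one language that happens to be a valid word in another goes unflagged.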

Kristinita commented 7 years ago

Many programs use Hunspell, as Sublime Text does, so you can try to find a dictionary extension made for another program. I tried the Firefox Russian-English Bilingual add-on in Sublime Text, and it worked for me.

Kristina

But for single-language spell checking, the LanguageTool package is the best solution, with many nice features.

Thanks.

ghost commented 7 years ago

And I thought I had it with

{
    "dictionary":
    [
        "Packages/Language - English/en_US.dic",
        "Packages/Language - Other/Portuguese (European).dic"
    ]
}

Alas, no.

BenjaminSchaaf commented 2 years ago

Fixed in build 4123. "dictionary" can now be provided a list.

eugenesvk commented 2 years ago

Fixed in build 4123. "dictionary" can now be provided a list.

This doesn't work very reliably. Please see the quick check below of different dictionary combos taken from here, and note how en_US.dic is a spoiler, though only for the Russian one :) (changing the order of the dictionaries doesn't seem to matter).

This is my complete settings file in a new portable Sublime on Windows; scroll horizontally to see the results for the different dictionary combos.

{
"ignored_packages":["Vintage",],
"spell_check": true,
"dictionary": [
"Packages/Language - English/en_US.dic",       // 1
"Packages/Language - English/en_GB.dic",       // 2
"Packages/User/Dictionaries/German_de_DE.dic", // 3
"Packages/User/Dictionaries/Russian.dic",      // 4
]
}
// Ln Text                                  1US 2GB 3DE 4Ru 1+2 1+3 1+4 2+3 2+4 3+4 1+3+4   2+3+4   1+2+3+4
// US "Is htis in colors?  That's insane!"  +   +   *   *   +   +   +   +   +   *   +       +       +
// GB "Is htis in colours? That's insane!"  +   +   *   *   +   +   +   +   +   *   +       +       +
// De "Rechtschreibe/stylistsische Fehler"  *   *   +   *   *   +   *   +   *   +   +       +       +
// Ru "Превед, медвед, как дела"            -   *   *   +   -   -   -   *   +   +   -       +       -

// + highlights spellchecking errors in the Row's language (from the perspective of the Column's dictionary/ies)
// * same as +, but for a mismatching language (~the whole line is highlighted)
// - nothing is highlighted, even spellchecking errors

BenjaminSchaaf commented 2 years ago

It's working fine here with the same dictionaries:

Screenshot from 2021-12-10 16-06-14

eugenesvk commented 2 years ago

And by "working fine" you mean that it's not highlighting any spelling mistakes in the Russian line? All the example lines contain spelling errors, so no line should ever be free from red no matter the dictionary/ies

eugenesvk commented 2 years ago

I think it's due to the wrong first line in the dictionary's affix file, SET ISO8859-1; it should be UTF8. I'm not sure whether the dictionary has to be regenerated. It doesn't seem to be saved in UTF8, so likely yes, though a simple text replace seems to fix the surface issue of no highlights in the Ru line (I haven't otherwise tested how well any of these combos work).
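
A quick way to look at what an affix file declares, and whether the matching .dic bytes are consistent with it, might look like the sketch below. The helper names are hypothetical, the script only reads the files and never rewrites them, and it assumes the SET value is a name Python's codecs recognise (unrecognised names are reported as not decodable). A single-byte encoding such as ISO8859-1 decodes any byte sequence, so a True result there proves little; the check is mainly useful for spotting a file that claims UTF-8 but is not.

```python
from pathlib import Path


def declared_encoding(aff_path):
    """Return the value of the first SET line in an .aff file, or None."""
    # Latin-1 maps every byte to a character, so any file can be scanned
    # for the ASCII "SET" directive without decode errors.
    with open(aff_path, encoding="latin-1") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) >= 2 and parts[0] == "SET":
                return parts[1]
    return None


def dic_decodes_as(dic_path, encoding):
    """Report whether the .dic bytes decode cleanly under the given encoding."""
    try:
        Path(dic_path).read_bytes().decode(encoding)
        return True
    except (UnicodeDecodeError, LookupError):
        # Invalid bytes for this encoding, or a name Python does not know.
        return False
```

For example, `dic_decodes_as("Russian.dic", declared_encoding("Russian.aff"))` returning False would suggest the two files disagree, with the file names here being placeholders.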

Out of curiosity, are you combining the language files behind the scenes (having to deal with different affix schemes), or are you using a simpler API, sending each text to each dic and then combining the results somehow?

BenjaminSchaaf commented 2 years ago

Ah yes, the en_US dictionary seems to be wrongly configured. It's unrelated to this issue, but I'll put in a fix.

Out of curiosity, are you combining the language files behind the scenes (having to deal with different affix schemes), or are you using a simpler API, sending each text to each dic and then combining the results somehow?

There's no way to combine the languages (easily), so we just check if a sub-word is spelt correctly according to any of the listed dictionaries.

BenjaminSchaaf commented 2 years ago

Upon further inspection we just seem to be handling the encoding wrong.

MarllonMenezes commented 2 years ago

As I write in both English and Portuguese, I combined the English dictionary with the Portuguese dictionary. So now I have spell checking in both languages. You can find this dictionary here:

  1. https://github.com/evandrocoan/SublimeTextStudio/tree/develop/MultiLingual%20Dictionary

Do you still have this dictionary?

evandrocoan commented 2 years ago

I moved it to this link: https://github.com/evandrocoan/LanguageEnglishAndPortuguese

stdedos commented 2 years ago

Is this considered solved? Because I still have problems with it.

I have also tried with a UTF8 dic, no luck 😕

BenjaminSchaaf commented 2 years ago

@stdedos the encoding problem was fixed in build 4125, so yes this should all be working.

stdedos commented 2 years ago

I am using https://github.com/titoBouzout/Dictionaries/blob/master/Greek.dic with

    "dictionary": [
        "Packages/Language - English/en_US.dic", // 1
        "Packages/Language - English/en_GB.dic", // 2
        "Packages/Greek.dic",                    // 3
        "Packages/Greek_UTF8.dic",               // 3
        "Packages/User/Greek.dic",               // 3
    ],

and text

What is Lorem Ipsum?
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

Γιατί το χρησιμοποιούμε;
Είναι πλέον κοινά παραδεκτό ότι ένας αναγνώστης αποσπάται από το περιεχόμενο που διαβάζει, όταν εξετάζει τη διαμόρφωση μίας σελίδας. Η ουσία της χρήσης του Lorem Ipsum είναι ότι έχει λίγο-πολύ μία ομαλή κατανομή γραμμάτων, αντίθετα με το να βάλει κανείς κείμενο όπως 'Εδώ θα μπει κείμενο, εδώ θα μπει κείμενο', κάνοντάς το να φαίνεται σαν κανονικό κείμενο. Πολλά λογισμικά πακέτα ηλεκτρονικής σελιδοποίησης και επεξεργαστές ιστότοπων πλέον χρησιμοποιούν το Lorem Ipsum σαν προκαθορισμένο δείγμα κειμένου, και η αναζήτησ για τις λέξεις 'lorem ipsum' στο διαδίκτυο θα αποκαλύψει πολλά web site που βρίσκονται στο στάδιο της δημιουργίας. Διάφορες εκδοχές έχουν προκύψει με το πέρασμα των χρόνων, άλλες φορές κατά λάθος, άλλες φορές σκόπιμα (με σκοπό το χιούμορ και άλλα συναφή).

Where does it come from?
Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.

The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham.

Που μπορώ να βρω μερικές;
Υπάρχουν πολλές εκδοχές των αποσπασμάτων του Lorem Ipsum διαθέσιμες, αλλά η πλειοψηφία τους έχει δεχθεί κάποιας μορφής αλλοιώσεις, με ενσωματωμένους αστεεισμούς, ή τυχαίες λέξεις που δεν γίνονται καν πιστευτές. Εάν πρόκειται να χρησιμοποιήσετε ένα κομμάτι του Lorem Ipsum, πρέπει να είστε βέβαιοι πως δεν βρίσκεται κάτι προσβλητικό κρυμμένο μέσα στο κείμενο. Όλες οι γεννήτριες Lorem Ipsum στο διαδίκτυο τείνουν να επαναλαμβάνουν προκαθορισμένα κομμάτια του Lorem Ipsum κατά απαίτηση, καθιστώνας την παρούσα γεννήτρια την πρώτη πραγματική γεννήτρια στο διαδίκτυο. Χρησιμοποιεί ένα λεξικό με πάνω από 200 λατινικές λέξεις, συνδυασμένες με ένα εύχρηστο μοντέλο σύνταξης προτάσεων, ώστε να παράγει Lorem Ipsum που δείχνει λογικό. Από εκεί και πέρα, το Lorem Ipsum παραμένει πάντα ανοιχτό σε επαναλήψεις, ενσωμάτωση χιούμορ, μη κατανοητές λέξεις κλπ.

half lights up like a Christmas tree.

BenjaminSchaaf commented 2 years ago

Screenshot from 2022-05-26 16-05-46

It's working fine here with the linked dictionary. I suggest double checking you've correctly installed that dictionary - both the aff and dic files are required and must not have their encodings modified.
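
The "both the aff and dic files are required" point can be checked with a short script. This is a sketch under assumptions: `missing_aff_files` is a hypothetical helper, and it expects real filesystem paths, so "Packages/..." entries from the settings file would first have to be resolved against wherever Sublime Text keeps its packages on your machine.

```python
from pathlib import Path


def missing_aff_files(dic_paths):
    """Return the .dic paths that have no .aff file with the same base name."""
    return [p for p in dic_paths if not Path(p).with_suffix(".aff").is_file()]
```

An empty list means every listed dictionary has its companion affix file; it says nothing about whether the files' encodings are intact.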

ghost commented 1 year ago

Hi @BenjaminSchaaf

Can you tell me what is wrong with the Ukrainian dictionary?

When I add it to the dictionary list, the English one stops working.

{
    "dictionary": [
        "Packages/Language - English/en_US.dic",
        "Packages/User/Dictionaries/uk_UA.dic",
    ],
}

Thanks!

Sublime Text v4134 Windows 10

uk_UA.zip

BenjaminSchaaf commented 1 year ago

@ihor-oleks I suggest making a separate issue, thanks.

ghost commented 1 year ago

Done, thanks.