w3c / i18n-checker

W3C's i18n checker
https://validator.w3.org/i18n-checker/

Add language detection #2

Open r12a opened 8 years ago

r12a commented 8 years ago

Mike has recently added automatic language detection for validator.nu, which allows him to then check whether the lang attribute in the html tag is correct, and also to suggest use of the dir attribute in the html tag where appropriate.

This is something i've been wanting to add to the i18n checker for some years, ever since Microsoft gave me access to the Bing API so that the checker could do the same kind of detection. Unfortunately, i haven't had time to do any further work on the i18n checker in all that time due to work pressure. I'm not sure whether i'm still able to access the Microsoft API.

On the other hand, Mike found some libraries that can be linked to PHP code.

12:17 Mike5: I almost forgot
12:18 Mike5: I did some hunting for PHP libs
12:18 Mike5: gimme a minute to get you some links
12:18 Mike5: https://github.com/lstrojny/php-cld
12:18 Mike5: https://github.com/fntlnz/cld2-php-ext
12:19 Mike5: bindings to Google’s C++ Compact Language Detector library
12:19 Mike5: which is at https://github.com/CLD2Owners/cld2
12:19 Mike5: has support for 83 languages
12:20 Mike5: that is a very good library I think
12:20 Mike5: probably more accurate than the one I am using but not sure

r12a commented 8 years ago

An alternative is to get the language information from validator.nu, rather than reinvent the language detection wheel in PHP. Mike thinks it should be fairly trivial to implement a call to the HTML5 validator and parse the returned info for the language.

In the simplest case, if you have a URL for the doc, you can just do this:

https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.w3.org&out=json

If you look at the very end of that JSON, you will see "language":"en".
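
For illustration only, here is a rough sketch of that request and the parsing step. It is written in Python rather than the checker's PHP, and it is not code from either project. The function names, the User-Agent string, and the sample JSON are all made up for the example. The only things taken from this thread are the endpoint, the `doc` and `out=json` parameters, and the top-level "language" key.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example URL in this thread.
VALIDATOR_ENDPOINT = "https://validator.w3.org/nu/"


def build_validator_url(doc_url):
    """Build the request URL with the doc and out=json query parameters."""
    query = urllib.parse.urlencode({"doc": doc_url, "out": "json"})
    return VALIDATOR_ENDPOINT + "?" + query


def extract_language(json_text):
    """Return the top-level "language" value, or None if it is absent."""
    data = json.loads(json_text)
    return data.get("language")


def detect_language(doc_url, timeout=30):
    """Fetch the validator's JSON report for doc_url and return its language."""
    request = urllib.request.Request(
        build_validator_url(doc_url),
        # Assumption: sending a descriptive User-Agent; the value is made up.
        headers={"User-Agent": "i18n-checker-language-sketch"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_language(response.read().decode("utf-8"))


if __name__ == "__main__":
    # No network access here: only the URL building and parsing are exercised.
    print(build_validator_url("https://www.w3.org"))
    # Hand-written sample, shaped like the response described above.
    sample = '{"messages": [], "language": "en"}'
    print(extract_language(sample))
```

Calling `detect_language("https://www.w3.org")` would perform the actual request. Returning None when the key is missing lets the caller skip the lang check instead of failing, on the assumption that the validator may not always report a language.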