sophie-glk / bang

A Firefox addon which adds bangs (known from DuckDuckGo) to various search engines.
MIT License
46 stars 5 forks

[Bug] Google.com bang uses Deutsch language by default #10

Closed robinvvinod closed 4 years ago

robinvvinod commented 4 years ago

Using the !g bang adds hl=de to the search URL, which causes Google to display the results in German.

The output link for using the query !g test is https://www.google.com/search?hl=de&q=test
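To make it easier to see how hl=de ends up in the output link, here is a minimal sketch. It assumes each bang is stored as a [bang, URL] pair in which the literal word "bang" marks where the query goes (the entry format quoted later in this thread). The `expandBang` helper and the German template URL are hypothetical illustrations, not the addon's actual code.

```javascript
// Hypothetical entry, mirroring the [bang, url] pairs from bangs.json.
const bangs = [["!g", "https://www.google.com/search?hl=de&q=bang"]];

// Expand an input such as "!g test" into the final search URL.
function expandBang(input, bangList) {
    const [prefix, ...rest] = input.trim().split(/\s+/);
    const entry = bangList.find(([bang]) => bang === prefix);
    if (!entry) {
        return null; // unknown bang
    }
    const url = new URL(entry[1]);
    // Copy the entries first so the parameters are not modified while iterating.
    for (const [key, value] of [...url.searchParams]) {
        if (value === "bang") {
            url.searchParams.set(key, rest.join(" "));
        }
    }
    return url.toString();
}

console.log(expandBang("!g test", bangs));
```

Every parameter other than the placeholder is carried over unchanged, so any language parameter baked into the stored URL shows up in every search.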

sophie-glk commented 4 years ago

This is caused by the way I get the local bang list (explained here: https://github.com/sophie-glk/bang/issues/6). I didn't think about this, but lots of websites add location- or IP-based URL parameters. I wrote a quick script that should remove all URL parameters except the search parameter from the local bang file (bangs.json).

const fs = require('fs');

// Each entry in bangs.json is a [bang, url] pair; the url uses "bang" as the query placeholder.
const banglist = JSON.parse(fs.readFileSync('bangs.json').toString());
for (let i = 0; i < banglist.length; i++) {
    const url = new URL(banglist[i][1]);
    const search_params = new URLSearchParams(url.search);
    const return_param = new URLSearchParams(url.search);
    // Drop every parameter whose value is not exactly the placeholder "bang".
    for (const [key, value] of search_params.entries()) {
        if (value === "bang") {
            continue;
        }
        return_param.delete(key);
    }
    url.search = return_param.toString();
    const url_s = url.toString();
    console.log(url_s);
    banglist[i][1] = url_s;
}
fs.writeFile("bangs_cleaned.json", JSON.stringify(banglist), function (err) {
    if (err) {
        return console.log(err);
    }
});

I will update the bangs.json file tomorrow. (I don't have time right now to check whether this actually works. Maybe you could test it and let me know?)

robinvvinod commented 4 years ago

Excluding all parameters except the search parameter causes issues with some bangs, especially the ones that are just a site-specific search in DuckDuckGo. For example, the !fgentoo bang becomes empty because it contains no standard search parameter (original entry first, cleaned entry second):

[
    "!fgentoo",
    "https://duckduckgo.com/?q=site%3Aforums.gentoo.org+bang&ia=web"
],
[
    "!fgentoo",
    "https://duckduckgo.com/"
],

The same happens with bangs that require a user login:

[
    "!gma",
    "https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&continue=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=googlemail&emr=1&osid=1#"
],
[
    "!gma",
    "https://accounts.google.com/ServiceLogin#"
],
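
The failure can be reproduced in isolation. This is a minimal sketch, assuming the cleanup keeps only parameters whose value is exactly the placeholder "bang", as the script above does; `keepOnlySearchParam` is a hypothetical name used here for illustration.

```javascript
// Keep only the query parameters whose value equals the placeholder.
function keepOnlySearchParam(rawUrl, placeholder = "bang") {
    const url = new URL(rawUrl);
    const kept = new URLSearchParams();
    for (const [key, value] of url.searchParams) {
        if (value === placeholder) {
            kept.append(key, value);
        }
    }
    url.search = kept.toString();
    return url.toString();
}

// Works when the placeholder is the entire parameter value.
console.log(keepOnlySearchParam("https://www.google.com/search?hl=de&q=bang"));

// Loses the query when the placeholder is only part of the value.
console.log(keepOnlySearchParam(
    "https://duckduckgo.com/?q=site%3Aforums.gentoo.org+bang&ia=web"
));
```

For !fgentoo the decoded value of q is "site:forums.gentoo.org bang", which is not equal to "bang", so the whole query is dropped along with ia=web.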

sophie-glk commented 4 years ago

Maybe we could try an approach that only removes URL parameters whose value equals "de"? That should hopefully fix the issue on most sites.
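
A rough sketch of what that could look like, assuming the check is a plain comparison against the parameter value; `removeParamsWithValue` is a hypothetical helper, not code from the addon.

```javascript
// Remove only the query parameters whose value equals `unwanted`.
function removeParamsWithValue(rawUrl, unwanted) {
    const url = new URL(rawUrl);
    const kept = new URLSearchParams();
    for (const [key, value] of url.searchParams) {
        if (value !== unwanted) {
            kept.append(key, value);
        }
    }
    url.search = kept.toString();
    return url.toString();
}

console.log(removeParamsWithValue("https://www.google.com/search?hl=de&q=bang", "de"));
console.log(removeParamsWithValue(
    "https://duckduckgo.com/?q=site%3Aforums.gentoo.org+bang&ia=web", "de"
));
```

Unlike stripping everything except the search parameter, this leaves site-specific queries such as the !fgentoo one intact. It would still remove a "de" value from any bang that sets it on purpose.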

robinvvinod commented 4 years ago

I don't think that would work particularly well, considering there are bangs like !ytde that require the "de" parameter because they are specifically made for YouTube in German.

Since you are getting the bangs.json file using puppeteer as mentioned in #6, is the "de" flag coming from your local settings on your browser and based on your IP address?

I'm not exactly sure how puppeteer works, but perhaps you could consider clearing all cookies and cache from your browser, setting the default language to English, and using a VPN connected to America/UK to get "en" by default unless specifically directed to another language by the bang?

sophie-glk commented 4 years ago

Puppeteer seems to use the system language as a default. When I send a different Accept-Language header:

await page.setExtraHTTPHeaders({
    'Accept-Language': 'en-US'
});

I get the English URL (https://www.google.com/search?hl=en&q=bang). This doesn't really fix the issue, though: I, for example, would like to use Google in German. (But it's probably a better solution, as most people who use this addon probably speak English.) Maybe we could instead use the language code of some other language from here: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes and remove all URL parameters that have it as a value?

robinvvinod commented 4 years ago

I think the current behaviour is similar to what DDG does by default. Unless a bang specific to the language is used (see image), English is the default and the result does not follow system language choice.

List of Google bangs used in DDG: List of bangs

For example, !g would always be in English regardless of system language, and !gde would give German.

Perhaps this is the best solution, following what DDG does and leaving the user to choose a different bang if they want a different language.

sophie-glk commented 4 years ago

That doesn't make any sense. If that were the case, why would it matter which Accept-Language header is sent? The fact that it just redirects you to google.com, and Google decides which language to use, is the cause of this issue in the first place.

sophie-glk commented 4 years ago

I think I have found the perfect solution now: there is a much easier way to get the list of bangs! DuckDuckGo basically provides them for us in this file: https://duckduckgo.com/bang.v253.js

const fs = require('fs');

// bang.v253.js is parsed as JSON: an array of objects with a trigger (t) and a URL template (u).
const bangs = JSON.parse(fs.readFileSync('bang.v253.js', 'utf8'));
console.log(bangs);

const result = [];
for (let i = 0; i < bangs.length; i++) {
    // Swap DuckDuckGo's {{{s}}} query placeholder for the "bang" placeholder used in bangs.json.
    const bang_site = bangs[i].u.replace("{{{s}}}", "bang");
    const bang_prefix = "!" + bangs[i].t;
    result.push([bang_prefix, bang_site]);
}
fs.writeFile("output.json", JSON.stringify(result), function (err) {
    if (err) {
        return console.log(err);
    }
});
console.log(result);

robinvvinod commented 4 years ago

That's great, DuckDuckGo has made it very convenient.

Cheers!