inveniosoftware / invenio

Invenio digital library framework
https://invenio.readthedocs.io
MIT License

WebSearch: detect usage of Google Translate on restricted records and warn user #1022

Closed jeromecaffaro closed 9 years ago

jeromecaffaro commented 10 years ago

Originally on 2012-04-24

Google Translate is a popular browser extension (or native feature) that can leak confidential information to an external service: the content of the page is sent by the browser/extension to the translation service and is returned translated. It would be useful to detect when it is used on restricted records, and warn users.

Ideally the implementation would prevent the confidential information from being sent to the external service. This does not seem possible (using <meta name="google" value="notranslate"> would apparently still send the page), so in the end it has to be the user's responsibility not to disclose such information. The system can help by making the user aware of the "leak" (in the same way that the "Restricted" flag does not prevent screenshots, etc.).

One way to detect the use of Google Translate on a page is to check for the global window.google variable that it adds, as discussed in the Stack Overflow question "Detecting Google Chrome Translation":

document.addEventListener('DOMSubtreeModified', function (e) {
    if (e.target.tagName === 'HTML' && window.google) {
        if (e.target.className.match('translated')) {
            // page has been translated
        } else {
            // page has been translated and translation was canceled
        }
    }
}, true);

It should be investigated whether this solution remains valid in the long term, and whether there is a way to avoid slowing down the display too much on the user side (the above listens for any change of the DOM tree).
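One possible way to limit the performance cost is to watch only the class attribute of the <html> element with a MutationObserver, instead of listening to every DOM change (DOMSubtreeModified is also a deprecated mutation event). The following is a minimal sketch, not a tested solution: it assumes, as the snippet above does, that the translator puts a class containing "translated" on the <html> element. The names isTranslated and watchTranslation are hypothetical helpers introduced here for illustration.

```javascript
// Sketch only. Assumption (taken from the snippet above): the translator
// toggles a class containing "translated" on the <html> element.
function isTranslated(className) {
    return /translated/.test(String(className));
}

// Hypothetical helper: calls onChange(true/false) when the translated
// state of `root` flips. ObserverCtor is injected (normally
// MutationObserver) so the logic can be exercised outside a browser.
function watchTranslation(root, onChange, ObserverCtor) {
    var wasTranslated = isTranslated(root.className);
    var observer = new ObserverCtor(function () {
        var nowTranslated = isTranslated(root.className);
        if (nowTranslated !== wasTranslated) {
            wasTranslated = nowTranslated;
            onChange(nowTranslated);
        }
    });
    // Observe only the class attribute of the root element,
    // not its whole subtree.
    observer.observe(root, { attributes: true, attributeFilter: ['class'] });
    return observer;
}

// Browser-only wiring; skipped where there is no DOM.
if (typeof document !== 'undefined' && typeof MutationObserver !== 'undefined') {
    watchTranslation(document.documentElement, function (translated) {
        if (translated) {
            // page has been translated: show the warning here
        } else {
            // translation was canceled
        }
    }, MutationObserver);
}
```

Because the callback only fires when the class attribute of <html> changes, ordinary page updates should not trigger it. Whether the class name itself stays stable across browser versions is still the open question raised above.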

If the use of Google Translate is detected, a transient popover (balloon) pointing to the top of the window (similar to the popovers that suggest adding a website to the home screen on iOS) could be displayed with a message such as:

You seem to be using Google Translate. For this service to work the content 
of this restricted page had to be sent to an external company, which can be 
a violation of conditions of use of %(CFG_SITE_NAME_INTL)s.
It is your responsibility to not send restricted content to unauthorized 
persons and services.

(Google Translate should maybe be referred to as "Google Translate™ translation service", or be anonymized as a "translation service", or simply "service", so that the message can later cover other similar services.)
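As a rough illustration of the message and popover described above, the sketch below builds the warning text with a configurable service name and shows it in an element that removes itself after a delay. Everything in it is an assumption for illustration: buildLeakWarning, showTransientNotice and the translation-warning class are hypothetical names, not existing Invenio code, and the site name is passed in as a plain string rather than through %(CFG_SITE_NAME_INTL)s.

```javascript
// Hypothetical helpers for illustration; not part of Invenio.

// Build the warning text. serviceName is optional so the message can be
// anonymized ("a translation service") or reused for other services.
function buildLeakWarning(siteName, serviceName) {
    var service = serviceName || 'a translation service';
    return 'You seem to be using ' + service + '. For this service to work ' +
        'the content of this restricted page had to be sent to an external ' +
        'company, which can be a violation of conditions of use of ' +
        siteName + '. It is your responsibility to not send restricted ' +
        'content to unauthorized persons and services.';
}

// Insert a notice at the top of the page and remove it after durationMs.
// `doc` is the document object; positioning and styling are left to CSS.
function showTransientNotice(doc, text, durationMs) {
    var box = doc.createElement('div');
    box.className = 'translation-warning'; // assumed CSS class name
    box.textContent = text;
    doc.body.insertBefore(box, doc.body.firstChild);
    setTimeout(function () {
        if (box.parentNode) {
            box.parentNode.removeChild(box);
        }
    }, durationMs);
    return box;
}
```

In a browser this could be called as showTransientNotice(document, buildLeakWarning('My Site', 'Google Translate'), 10000) once translation has been detected on a restricted page.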

The check should only be run on restricted content: detailed pages of restricted records, and possibly browsing/searching restricted collections.

Other browser features/extensions can also leak information to external services: Google Translate just appears to be one of the most popular. Other extensions/services such as Pocket (formerly Read It Later) or Google Reader (when restricted RSS becomes accessible, with API keys) could also be investigated (whether disabling/detecting them is possible, etc.).

jirikuncar commented 9 years ago

Is it still desired for @CERNDocumentServer (cc @drjova @egabancho)?

tiborsimko commented 9 years ago

IIRC this was addressed over the summer? Is there a PR?

egabancho commented 9 years ago

Yes! It is merged in our overlay https://github.com/CERNDocumentServer/cds/commit/19482a8255ab8775095527044eb8ee28043adb88

Should we also make an Invenio version?

tiborsimko commented 9 years ago

Either please prepare a PR, or else let the feature live in the CDS overlay only and let's close this issue here.