luciopaiva / witchcraft

Inject Javascript and CSS right from your file system. Think GreaseMonkey for more advanced users.
https://luciopaiva.com/witchcraft
MIT License

Question: Disable / unload scripts from running -- str.replace HTML before page is loaded ? #42

Open steinhaug opened 3 years ago

steinhaug commented 3 years ago

I have 2 questions so here goes:

Question 1:

I am wondering what the best practice would be for preventing a certain script from running on a page. Take this markup as an example:

Example markup for example domain: theurl.com

<html><head>
  <script src="someurl.com" id="s1"></script>
</head><body>
  <script id="s2">console.log('inline javascript');</script>
</body></html>

How would I go about writing the theurl.com.js file for removing / disabling the scripts correctly? I did try something like:

theurl.com.js

// attempt 1, directly
document.getElementById('s1').remove();
document.getElementById('s2').remove();

// attempt 2 , after window load
window.addEventListener("load", () => {
    document.getElementById('s1').remove();
    document.getElementById('s2').remove();
});

However, this method doesn't seem to work very well: the page freezes up after the scripts are removed, so I am hoping there is another way of doing this. Any tips would be great!

Question 2:

The other alternative would be for Witchcraft to work on the markup before it's delivered to the browser: intercept it in the middle and do some str.replace on the markup. That would be awesome if it were possible. I have done this with http-proxy-middleware in Node in another project.

Is it possible for Witchcraft to let me manipulate the HTML before it's loaded by the browser, if that makes sense? This way I could use regex and string replaces on the markup before the page is loaded, instead of doing it with JavaScript.

luciopaiva commented 3 years ago

Hi @steinhaug,

Question 1 is a really good one, but I don't know the answer to it. I just tried a simple script that removed all script tags from my home page. It worked in the sense that the tags were removed from the HTML, but the scripts were executed anyway, and I don't know why. Witchcraft scripts are loaded at document_start, and the Chrome extension docs say that document_start happens before any other script is run, but it could very well be that Chrome already knows about the scripts and will execute them right after, no matter what you do. I'd really like to know the answer to this question as well, so please let me know if you find something!

About question 2, the answer is no, Witchcraft can't interfere with the original HTML before Chrome receives it. That's a limitation imposed by the Chrome extension framework. If you really must prevent a script from running, considering we can't find a solution to question 1, a Node.js proxy seems like an interesting option indeed.
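To make the proxy idea concrete, here is a minimal sketch of the kind of string rewrite a proxy could apply to the response body before it reaches the browser. It only shows the rewrite step, not the proxy wiring. The helper name `stripScriptsById` is hypothetical, and the regex is a deliberate simplification that assumes well-formed, explicitly closed script tags; it is not a real HTML parser.

```javascript
// Hypothetical helper: removes <script>...</script> elements whose id is
// listed in `ids`. Regex-based on purpose, to mirror the "str.replace on
// the markup" idea; unusual or unclosed markup will not be handled.
function stripScriptsById(html, ids) {
    const scriptTag = /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi;
    return html.replace(scriptTag, (match, attrs) => {
        // Look for an id attribute among the opening tag's attributes.
        const idMatch = /(?:^|\s)id\s*=\s*["']?([^"'\s>]+)/i.exec(attrs);
        return idMatch && ids.includes(idMatch[1]) ? "" : match;
    });
}

// Sample markup modelled on the example from the question.
const markup = `<html><head>
  <script src="someurl.com" id="s1"></script>
</head><body>
  <script id="s2">console.log('inline javascript');</script>
  <script id="keep">console.log('untouched');</script>
</body></html>`;

const cleaned = stripScriptsById(markup, ["s1", "s2"]);
console.log(cleaned);
```

Because the function works on plain strings, it can be tried out in Node on its own before being plugged into whatever proxy is used.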

An alternative approach would be to counteract anything that the script is doing after it has finished doing it. An easy example would be a script that loads an HTML tag in the page. We'd just let it load the tag and we'd then remove it. Of course, that would not be an option if we want to prevent the script from doing an ajax call, for instance.
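One possible way to do that clean-up is a MutationObserver that removes the unwanted element as soon as it shows up. This is only a sketch: the helper name `removeWhenAdded` and the `.injected-banner` selector are made up for illustration, and the third parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: removes every element matching `selector` as soon as
// it is added anywhere under `root`. Returns the observer so the caller can
// disconnect() it later.
function removeWhenAdded(root, selector, Observer = MutationObserver) {
    const observer = new Observer((mutations) => {
        for (const mutation of mutations) {
            for (const node of mutation.addedNodes) {
                if (node.nodeType !== 1) continue; // skip text and comment nodes
                if (node.matches(selector)) {
                    node.remove();
                } else {
                    // The match may be nested inside the node that was added.
                    node.querySelectorAll(selector).forEach((el) => el.remove());
                }
            }
        }
    });
    observer.observe(root, { childList: true, subtree: true });
    return observer;
}

// Only runs in a page; ".injected-banner" is an assumed selector.
if (typeof document !== "undefined") {
    removeWhenAdded(document.documentElement, ".injected-banner");
}
```

As noted above, this only undoes DOM changes. It does nothing about side effects such as network requests the script has already made.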

Really good questions. Sorry I took so long to answer, but let me know if you have any updates.

Thanks!

mjuksel commented 3 years ago

Hmm, I thought I'd chime in here too :) I found out that you can call stop() (that is, window.stop()) in any of these ways:

// should work
stop();
// or an iife
(function() {
    return stop();
})();
// short version
(()=>stop())();

On simple pages it seems that the page will load regardless, but on somewhat heavier, data-driven pages it will stop everything from executing.

//edit: interestingly, the page will not even load the <body> tag when I've used this one:

(doc=>doc&&stop())(document);

then you could do

(doc=>doc&&stop())(document);
(async () => {
    // fetch() resolves to a Response object; text() then yields the raw HTML
    const response = await fetch(location.href);
    const html = await response.text();
    /* this will create a new document obj to use instead of parsing the page with regex ;) */
    const newDoc = new DOMParser().parseFromString(html, 'text/html');
    const body = newDoc.body;
    body.querySelector(blablaYouGetItIGuess);
})();
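As a rough illustration of what could be done with the re-parsed document, the sketch below drops selected scripts from it and returns the remaining markup. The helper name `withoutScripts` is hypothetical, and the ids `s1` and `s2` are borrowed from the example markup in the question. What to do with the resulting markup afterwards is left open.

```javascript
// Hypothetical helper: removes the <script> elements with the given ids from
// a parsed document and returns the remaining markup as a string.
function withoutScripts(doc, ids) {
    for (const id of ids) {
        const script = doc.getElementById(id);
        if (script) script.remove(); // ids that are not present are ignored
    }
    return doc.documentElement.outerHTML;
}

// Only runs in a page: re-fetch the current URL and parse it off-screen.
if (typeof document !== "undefined") {
    (async () => {
        const response = await fetch(location.href);
        const html = await response.text();
        const doc = new DOMParser().parseFromString(html, "text/html");
        console.log(withoutScripts(doc, ["s1", "s2"]));
    })();
}
```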

Let me know what you guys find out !