tattle-made / Uli

Software and Resources for Mitigating Online Gender Based Violence in India
https://uli.tattle.co.in
GNU General Public License v3.0
40 stars 30 forks source link

Review DOM diffing/patching techniques #583

Closed aatmanvaidya closed 2 months ago

aatmanvaidya commented 3 months ago

Choose any webpage and save it's html in a file. Try to pick a webpage that has a good mix of different html elements.

Let's try to work on DOM parsing and creating data structures that are suited for our use case. Remember, we want to optimise for following functions

  1. get text content in dom and process it
  2. Be able to access a text element in O(1) time and add/remove html element next to it

Review the source code of a commonly used dom diffing library - morphdom https://github.com/patrick-steele-idem/morphdom Also the TreeWalker (https://developer.mozilla.org/en-US/docs/Web/API/TreeWalker), which is natively available in browser as well. These two will help you understand common approaches to dom parsing and refresh any data structure or algorithm knowledge you will need for this task. As your deliverable, try writing an implementation, however naive and inefficient. We will use that as a starting point for critiquing and refining

aatmanvaidya commented 3 months ago

assigning this issue to @hardik-pratap-singh

hardik-pratap-singh commented 3 months ago

I want to work on this issue. Please assign this to me !

aatmanvaidya commented 3 months ago
hardik-pratap-singh commented 3 months ago

Fix for 2nd checkpoint : PR #596 Related code can be found in directory - browser-extension/c4gt_testing_code/week3 Can be tested by running index.html present inside week3 directory

aatmanvaidya commented 3 months ago
aatmanvaidya commented 3 months ago
hardik-pratap-singh commented 2 months ago

Approach for finding empty/false text nodes :

Need of doing it ?

They are giving some unwanted results like injecting icons across same slur word multiple times and also if we remove these text nodes we will not have to check for slurWords inside them later. If we have minified version of HTML then there are no false textNodes in it, but when HTML is not minified, browser adds an empty textNode before and after each elements, this increases the count of textNodes drastically and in our data structure all such nodes get unnecessarily appended. Since we are checking for slurWords for each of these textNodes so it is impossible that a slurWord exist in such an empty text Node.

Approach :

Empty text nodes will look something like this : \n, \n\n, etc. with any number of whitespace characters attached to them So what one thing that we can do is we can count the number of newline and whitespace characters in the text and store them in some cnt variable


if (textNode.length === cnt){
    //we can avoid pushing such textNodes in our data structure 
}
else{
    //if it has some characters then we can add them to our data structure
}
hardik-pratap-singh commented 2 months ago

Implemented this function following the approach described above

image

PS : Results of Adding this function

Earlier, for the same HTML page, we were getting total 31 responses as textNodes (including false textNodes)

image

But after removing false textNodes, we are getting 8 responses as textNodes, (now we need to check for slurWords in these 8 textNodes instead of checking in those 31 textNodes including empty textNodes as well)

image

dennyabrain commented 2 months ago

Pre Requisite Knowledge

Deliverables

  1. Write a function indexTree that given a Node or Element, traverses it and returns a uliStore object with the following sample structure
    uliStore = {
    "random_id_1" : {
        element: Node || Element
    },
    "random_id_2" : {
        element: Node || Element
    }
    }
  2. Write a function locateSlur that goes through each item in uliStore and detects the presence of slurs in it and adds the following fields to uliStore
    • slur text
    • slur location within the text
      uliStore = {
      "random_id_1" : {
      element: Node || Element,
      slurText : "crazy",
      slurLocation: [4,10]
      },
      }
  3. Run the indexTree and locateSlur function on the following websites and share the resulting uliStore with us.
dennyabrain commented 2 months ago

From our call, we know that @hardik-pratap-singh has some version of indexTree, so the revised deliverables for this week are :

  1. Modify hardik's function getAllTextNodes to return a simplified datastructure uliStore. Right now it stores html node elements and thats very large.
  2. decouple finding slurs and injecting svgs. We'd like to be able to run a locateSlur function that can return the slur location indices as opposed to directly finding slurs and injecting svgs.
dennyabrain commented 2 months ago

UI Idea for later

One of the reasons to decouple finding slur location with actual svg injection is that I'd like to experiment with different type of actions on the slur. So svg icon is one way to indicate that a slur has been found and you can hover on it. But we can also try something more subtle, like this Frame 5 The slur word would just be mildly highlighted and not take away from your reading experience.

hardik-pratap-singh commented 2 months ago

Confirmation on the Design of Data Structure : uliStore

After passing through indexTree()/getAllTextNodes() function, uliStore will look like this :

uliStore = [
    {
      node : text ,                                 //this is the text node
      parent : parentNode                           //this is the parent of the text node
    },
    {
       node : text,
       parent : parentNode
    },
    {

    },
    .
    .
    .
]

Now after this, when uliStore is processed within locateSlur() function, uliStore will look like something like this :

uliStore = [
    {
        node: node,                         
        parentNode: parentNode,          
        slurs: [                                              
            {
                slurText : 'crazy',
                slurLocation : [[4 , 8] , [12 , 16]]
            },
            {
                slurText : 'stupid',
                slurLocation : [[19 , 24] , [34 , 36] , [67 , 76]]
            },
            {
                slurText : 'mad',
                slurLocation : [[1 , 3] , [21 , 25]]
            },
            {

            },
            {

            }
        ]
    }, 
    {
        node: node,                         
        parentNode: parentNode,          
        slurs: [                                              
            {
                slurText : 'crazy',
                slurLocation : [[3 , 7] , [11 , 16]]
            },
            {
                slurText : 'mad',
                slurLocation : [[12 , 23] , [71 , 75]]
            },
            {

            },
            {

            }
        ]
    },
    {

    }

]

We have to append slurs to each of the uliStore object which is going to store the information about all the slurs present in the text node stored in that particular uliStore.

@dennyabrain @aatmanvaidya Please review it and let me know if I am going in right direction ?

dennyabrain commented 2 months ago

@hardik-pratap-singh looks good to me. You can proceed.

hardik-pratap-singh commented 2 months ago

Update on uliStore

Results when DOM is passed through getAllTextNodes() and locateSlur()

@dennyabrain @aatmanvaidya Please review !!

dennyabrain commented 2 months ago

@aatmanvaidya this looks good to me. Should we start moving the getAllTextNodes and locateSlur functions into the source code? While the implementation can be finetuned later, these two functions are very foundational and can be reused and repurposed. So figure out where to put them and how to organize it, create a new PR with just those two functions added and merge it.

aatmanvaidya commented 2 months ago

@dennyabrain sounds good to me, only one thing, since we have a release coming up, shall we merge this after the release. I can finish the work on release by early next week (Mon/Tue). Will try to get it done by monday

dennyabrain commented 2 months ago

We don't have to think of this as a blocker since we will only be adding the functions, we won't be adding any code that uses these functions. So it should not be a problem. I'll let you take a final call though.

aatmanvaidya commented 2 months ago

got it, let's merge the function's then, I was thinking for now, we add the functions to the plugin/src/transform-general.js files only - https://github.com/tattle-made/Uli/blob/main/browser-extension/plugin/src/transform-general.js

going ahead, if things get too clogged up here, or we feel that this could be a new module, then we can move things to a new file, does that work?

dennyabrain commented 2 months ago

Yeah sounds good to me.

hardik-pratap-singh commented 2 months ago

Hello @dennyabrain @aatmanvaidya,

As per the discussions, I am adding the functions getAllTextNodes() and locateSlur() inside plugin/src/transform-general.js.

However, I have commented the function calls and log statements for now.

aatmanvaidya commented 2 months ago

Hi @hardik-pratap-singh have reviewed and added few comments on the PR

hardik-pratap-singh commented 2 months ago

Hello @aatmanvaidya, I can't find your comments on the PR

image

aatmanvaidya commented 2 months ago

hi @hardik-pratap-singh , just above this you will find reviews added by me, something like this image

by comment I meant, reviews added to the code

hardik-pratap-singh commented 2 months ago

@aatmanvaidya actually above it, I have my PR. Can you please link it here ?

image

aatmanvaidya commented 2 months ago

I have added the changes to that PR only - can you try doing a hard refresh of the page? strange why its not visible - I have added comments to this PR itself - https://github.com/tattle-made/Uli/pull/597

hardik-pratap-singh commented 2 months ago

I think it should not be the case that you have approved the changes that's why it is not visible

aatmanvaidya commented 2 months ago

could be, don't know why that's happening, added the changes a comment only - https://github.com/tattle-made/Uli/pull/597#issuecomment-2238322742