Closed aatmanvaidya closed 2 months ago
assigning this issue to @hardik-pratap-singh
I want to work on this issue. Please assign this to me !
Fix for 2nd checkpoint : PR #596
Related code can be found in directory - browser-extension/c4gt_testing_code/week3
Can be tested by running index.html
present inside week3
directory
icon
, do we remove its <span>
tag from the DOM or do we hide its visibility. Need of doing it ?
They are giving some unwanted results like injecting icons across same slur word multiple times and also if we remove these text nodes we will not have to check for slurWords inside them later. If we have minified version of HTML then there are no false textNodes in it, but when HTML is not minified, browser adds an empty textNode before and after each elements, this increases the count of textNodes drastically and in our data structure all such nodes get unnecessarily appended. Since we are checking for slurWords for each of these textNodes so it is impossible that a slurWord exist in such an empty text Node.
Approach :
Empty text nodes will look something like this : \n
, \n\n
, etc. with any number of whitespace characters attached to them
So what one thing that we can do is we can count the number of newline and whitespace characters in the text and store them in some cnt
variable
if (textNode.length === cnt){
//we can avoid pushing such textNodes in our data structure
}
else{
//if it has some characters then we can add them to our data structure
}
Implemented this function following the approach described above
PS : Results of Adding this function
Earlier, for the same HTML page, we were getting total 31 responses as textNodes (including false textNodes)
But after removing false textNodes, we are getting 8 responses as textNodes, (now we need to check for slurWords in these 8 textNodes instead of checking in those 31 textNodes including empty textNodes as well)
indexTree
that given a Node
or Element
, traverses it and returns a uliStore
object with the following sample structure
uliStore = {
"random_id_1" : {
element: Node || Element
},
"random_id_2" : {
element: Node || Element
}
}
locateSlur
that goes through each item in uliStore
and detects the presence of slurs in it and adds the following fields to uliStore
uliStore = {
"random_id_1" : {
element: Node || Element,
slurText : "crazy",
slurLocation: [4,10]
},
}
indexTree
and locateSlur
function on the following websites and share the resulting uliStore
with us.
From our call, we know that @hardik-pratap-singh has some version of indexTree
, so the revised deliverables for this week are :
getAllTextNodes
to return a simplified datastructure uliStore
. Right now it stores html node elements and thats very large.locateSlur
function that can return the slur location indices as opposed to directly finding slurs and injecting svgs.One of the reasons to decouple finding slur location with actual svg injection is that I'd like to experiment with different type of actions on the slur. So svg icon is one way to indicate that a slur has been found and you can hover on it. But we can also try something more subtle, like this The slur word would just be mildly highlighted and not take away from your reading experience.
uliStore
After passing through indexTree()
/getAllTextNodes()
function, uliStore will look like this :
uliStore = [
{
node : text , //this is the text node
parent : parentNode //this is the parent of the text node
},
{
node : text,
parent : parentNode
},
{
},
.
.
.
]
Now after this, when uliStore is processed within locateSlur()
function, uliStore will look like something like this :
uliStore = [
{
node: node,
parentNode: parentNode,
slurs: [
{
slurText : 'crazy',
slurLocation : [[4 , 8] , [12 , 16]]
},
{
slurText : 'stupid',
slurLocation : [[19 , 24] , [34 , 36] , [67 , 76]]
},
{
slurText : 'mad',
slurLocation : [[1 , 3] , [21 , 25]]
},
{
},
{
}
]
},
{
node: node,
parentNode: parentNode,
slurs: [
{
slurText : 'crazy',
slurLocation : [[3 , 7] , [11 , 16]]
},
{
slurText : 'mad',
slurLocation : [[12 , 23] , [71 , 75]]
},
{
},
{
}
]
},
{
}
]
We have to append slurs
to each of the uliStore
object which is going to store the information about all the slurs present in the text node stored in that particular uliStore
.
@dennyabrain @aatmanvaidya Please review it and let me know if I am going in right direction ?
@hardik-pratap-singh looks good to me. You can proceed.
uliStore
getAllTextNodes()
and locateSlur()
@dennyabrain @aatmanvaidya Please review !!
@aatmanvaidya this looks good to me. Should we start moving the getAllTextNodes
and locateSlur
functions into the source code? While the implementation can be finetuned later, these two functions are very foundational and can be reused and repurposed. So figure out where to put them and how to organize it, create a new PR with just those two functions added and merge it.
@dennyabrain sounds good to me, only one thing, since we have a release coming up, shall we merge this after the release. I can finish the work on release by early next week (Mon/Tue). Will try to get it done by monday
We don't have to think of this as a blocker since we will only be adding the functions, we won't be adding any code that uses these functions. So it should not be a problem. I'll let you take a final call though.
got it, let's merge the function's then,
I was thinking for now, we add the functions to the plugin/src/transform-general.js
files only - https://github.com/tattle-made/Uli/blob/main/browser-extension/plugin/src/transform-general.js
going ahead, if things get too clogged up here, or we feel that this could be a new module, then we can move things to a new file, does that work?
Yeah sounds good to me.
Hello @dennyabrain @aatmanvaidya,
As per the discussions, I am adding the functions getAllTextNodes()
and locateSlur()
inside plugin/src/transform-general.js
.
However, I have commented the function calls and log statements for now.
Hi @hardik-pratap-singh have reviewed and added few comments on the PR
Hello @aatmanvaidya, I can't find your comments on the PR
hi @hardik-pratap-singh , just above this you will find reviews added by me, something like this
by comment I meant, reviews added to the code
@aatmanvaidya actually above it, I have my PR. Can you please link it here ?
I have added the changes to that PR only - can you try doing a hard refresh of the page? strange why its not visible - I have added comments to this PR itself - https://github.com/tattle-made/Uli/pull/597
I think it should not be the case that you have approved the changes that's why it is not visible
could be, don't know why that's happening, added the changes a comment only - https://github.com/tattle-made/Uli/pull/597#issuecomment-2238322742
Choose any webpage and save it's html in a file. Try to pick a webpage that has a good mix of different html elements.
Let's try to work on DOM parsing and creating data structures that are suited for our use case. Remember, we want to optimise for following functions
Review the source code of a commonly used dom diffing library -
morphdom
https://github.com/patrick-steele-idem/morphdom Also the TreeWalker (https://developer.mozilla.org/en-US/docs/Web/API/TreeWalker), which is natively available in browser as well. These two will help you understand common approaches to dom parsing and refresh any data structure or algorithm knowledge you will need for this task. As your deliverable, try writing an implementation, however naive and inefficient. We will use that as a starting point for critiquing and refining