Closed AlbertoFDR closed 6 months ago
Yep, sorry, I shouldn't have pushed code so fast. I just saw that I had already completed what you were asking about in your last issue. I will try to get this sorted and pushed today.
@AlbertoFDR this should be sorted now with 144b465904faff1e6de7c44b7417936abde47cf8, though if you are still seeing any issues please reopen. Thanks!
Also @AlbertoFDR, I would be very interested to learn more about what kinds of queries you're doing against PageGraph data. I'm trying to stress-test and check the correctness of some of this stuff, so it would be great to try and expand it to more use cases.
I'm still finding some issues (I don't think it's because of my Brave version):
example.org (Exception: Could not find a creator for this node): I've found that this one comes from a script for which the call on line 165 of pagegraph/graph/node.py to self.incoming_edges() returns just one execute edge, not the one we are searching for (structure).
KeyError: <Types.FRAME_OWNER: 'frame owner'>: This second one, I guess, happens because the commands related to frames expect at least one frame owner on the page.
About my queries, or what I'm trying to do: as @L3thal14 suggested some months ago (https://github.com/brave/brave-browser/issues/35130), it would be nice to catch all (or almost all) of the WebAPI and builtin calls in PageGraph. Recording response headers (e.g., XFO, CSP, PP...) from the documents would also be really great. My final suggestion is to add the attributes of all HTML elements, not just the dynamically created ones, and to be able to recover the changes. I know that some of these suggestions could be hard or impossible to maintain, such as the experimental attributes.
More technical questions about PageGraph: what is the logic behind the frame owner node? I've seen (if I remember correctly) that in cases where there is no src, or where the src is about:blank or javascript:, there isn't a frame owner node. In the other case, for iframes that include a specific path on the same domain, the frame owner is created. My guess is that the first ones run in the same context, while in the second case, even if they have access, they run in a different context. Also, I noticed that for each frame owner there is a first empty DOM root (about:blank). I guess this is for isolation, but for PageGraph's case we don't need it. Thanks again :)
In addition to my previous comment, for the JS calling part, it would also be nice to have some traceability of the calls. I'm thinking of something similar to what DevTools does: even if the script is minified, you can see the script prettified and where the call comes from (Initiator tab). I guess this idea could also be very challenging to implement.
I don't think it's because my brave version
Please make sure you are using the most recent nightly version, or building your own version. There have been several significant changes in the last few weeks.
Using a dummy local page with one iframe…
I believe this is fixed with the most recent pagegraph-query push
it would be nice to catch all (or almost all) the WebAPI's and builtins calls on Pagegraph
A large number are already caught. You can add as many more as you like by specifying the ones you want to catch in this file as described on the wiki
that in cases that there is no src or that the src is about:blank or javascript: there isn't a frame owner node
This is not correct. The frame owner is the iframe (or similar) that can contain any number of different documents (as represented by DOMRootNode instances) by changing the src or of the iframe. But there will always be a FrameOwnerNode for every iframe (and similar).
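To make that relationship concrete, here is a toy model in plain Python. It does not use the pagegraph-query classes, and every name in it (ToyFrameOwner, ToyDOMRoot, navigate) is invented purely for illustration. It only shows the shape described above: one owner that persists while the documents inside it change.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDOMRoot:
    """One document loaded into a frame (illustrative, not PageGraph's DOMRootNode)."""
    url: str


@dataclass
class ToyFrameOwner:
    """An iframe-like element; it exists even before any src is set."""
    tag: str = "iframe"
    documents: list[ToyDOMRoot] = field(default_factory=list)

    def navigate(self, url: str) -> ToyDOMRoot:
        # Each navigation adds a new document; the owner itself stays the same.
        root = ToyDOMRoot(url)
        self.documents.append(root)
        return root


owner = ToyFrameOwner()
owner.navigate("about:blank")           # initial empty document
owner.navigate("https://example.org/")  # document loaded after src is set
print([doc.url for doc in owner.documents])
```

In this model an iframe with no src still has an owner; it simply holds only the initial about:blank document.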
nice to have some traceability of the calling
I am not sure I understand here. PageGraph makes it clear which script is responsible for the JS call, and even provides the exact line and character offset in the responsible JS file. What additional traceability are you looking for?
if the script is minified you can see the script prettified
I see how this could be handy, but it's pretty far outside what PageGraph is looking to do. Though, you shouldn't have any problem extracting the JS source from PageGraph and running it through any formatter / beautifier you like, and / or mapping the call site PageGraph identifies in the original source to the formatted source.
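A rough sketch of that mapping step, in case it helps. This is not part of PageGraph or pagegraph-query, and the function names are made up. It assumes the formatter only adds or removes whitespace (formatters that insert semicolons or parentheses would break it), and carries a character offset over by counting non-whitespace characters.

```python
def offset_to_line_col(source: str, offset: int) -> tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column)."""
    if not 0 <= offset <= len(source):
        raise ValueError("offset is outside the source text")
    line = source.count("\n", 0, offset) + 1
    last_newline = source.rfind("\n", 0, offset)
    col = offset - last_newline  # rfind returns -1 when on the first line
    return line, col


def map_offset(original: str, formatted: str, offset: int) -> int:
    """Map an offset in `original` to the matching offset in `formatted`.

    Assumption: the two texts differ only in whitespace, and `offset`
    points at a non-whitespace character.
    """
    if not 0 <= offset < len(original) or original[offset].isspace():
        raise ValueError("offset must point at a non-whitespace character")
    # Number of non-whitespace characters that precede the call site.
    target = sum(1 for ch in original[:offset] if not ch.isspace())
    seen = 0
    for index, ch in enumerate(formatted):
        if ch.isspace():
            continue
        if seen == target:
            return index
        seen += 1
    raise ValueError("formatted source is shorter than expected")


original = "function f(){fetch('/x');}"
formatted = "function f() {\n  fetch('/x');\n}\n"
mapped = map_offset(original, formatted, original.index("fetch"))
print(offset_to_line_col(formatted, mapped))
```

Counting non-whitespace characters keeps the sketch independent of any particular formatter, at the cost of failing on tools that rewrite tokens.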
Yes, you are right! My bad. Thanks!
Sorry for bothering you again, I don't know if you are still working on this. It's just that I have a similar project for parsing the graphs in Python (not that clean, obviously) :)
The code in line 63 of pagegraph/graph/__init__.py crashes with the error TypeError: 'module' object is not iterable in the line for insert_edge in self.insert_edges():. I suppose that the correct call is edges(), not insert_edges(), and I don't know which kind of functionality is missing there. Thanks again!
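For reference, that message is what Python raises whenever a for loop is handed a module object instead of a collection. Below is a minimal reproduction that is unrelated to the actual pagegraph-query code: ToyGraph and its methods are made up, and the assumption that a method ends up returning a module is only one possible way to trigger the error.

```python
import json  # stands in for any imported module


class ToyGraph:
    def insert_edges(self):
        # Bug for illustration: returns the module itself, not a list of edges.
        return json

    def edges(self):
        return ["edge-1", "edge-2"]


graph = ToyGraph()
try:
    for insert_edge in graph.insert_edges():
        pass
except TypeError as error:
    message = str(error)
    print(message)
```

Iterating over graph.edges() instead works, because that method returns a list.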