longvh211 / Chromium-Automation-with-CDP-for-VBA

A method to directly automate Chromium-based web browsers, such as Chrome and Edge, using VBA for Office applications by following the Chrome DevTools Protocol framework.
MIT License

Issue when accessing iframe content without SOP #16

Closed: Slump92 closed this issue 1 year ago

Slump92 commented 1 year ago

Hi,

I'm developing a tool for my department that scrapes/parses an intranet webpage for some information and creates an email based on it. (I'm leaving out most of the context, but in short, we have a rather messy intranet workflow that forces us to duplicate our work in an email, so I'm trying to automate this. However, I have to work within an over-secured environment, and IT doesn't have time for us.)

I can access the webpage where everything is displayed fine with CDP and Chrome. The problem is that the information I need is displayed in an iframe, and I can't access its content because of a same-origin policy (SOP) violation. Here is a rough recreation of the DOM tree, since I can't copy what I actually have:

...
<body>
    <iframe id="iframeApp" name="iframeApp" frameborder="0" width="100%" height="100%">
    #document
        <html>
            <head>...</head>
            <body>
                <div id="mainContainer" class="container">
                    <input name="someID" type="text" id="someID" placeholder="" class="someClass" maxlength="10" value="0123456789" readonly="readonly">
                </div>
            </body>
        </html>
    </iframe>
</body>
...

Of course, when I try:

document.getElementById("iframeApp").contentDocument.getElementById("someID").value

contentDocument returns null, and when I try:

document.getElementById("iframeApp").contentWindow.document

It throws an Uncaught DOMException: Blocked a frame with origin "XXXXX" from accessing a cross-origin frame.
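Both console results come from the same-origin policy: for a cross-origin iframe, contentDocument is null, and touching contentWindow.document throws a SecurityError DOMException. A minimal sketch of a console helper (the name readIframeInput is hypothetical, not part of the framework) that reports which case you hit instead of failing with a bare null:

```javascript
// Hypothetical helper: read an <input> value inside an iframe and report
// why it fails. Same-origin iframes expose contentDocument; for cross-origin
// ones the browser returns null instead of the document.
function readIframeInput(iframe, inputId) {
  if (!iframe) {
    return { ok: false, reason: "iframe element not found" };
  }
  const doc = iframe.contentDocument;
  if (doc === null || doc === undefined) {
    return { ok: false, reason: "contentDocument is null (likely cross-origin)" };
  }
  const input = doc.getElementById(inputId);
  if (!input) {
    return { ok: false, reason: `no element with id "${inputId}" in the iframe` };
  }
  return { ok: true, value: input.value };
}

// In the browser console, for the DOM above:
// readIframeInput(document.getElementById("iframeApp"), "someID");
```

On this page the call would be expected to return ok: false with the cross-origin reason, which is the same wall the VBA getIFrame call runs into.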

So, VBA-wise, if I try:

Set iFrameVar = objBrowser.getElementByID("iframeApp").getIFrame
result = iFrameVar.getElementByID("someID").value

It returns Null, since it can't access the iframe either.

After hours of research, I have a feeling I'm trying to do something overly complex for what I want to achieve. I see talk of security issues everywhere, but I'm really just trying to read the DOM after it has loaded (I'm even fine with just waiting X seconds before doing it, rather than listening for any event). I fail to understand why that would be a security issue, considering everything I need is already visible and accessible manually from Chrome DevTools (either in Elements or in the Console by changing the context). It's really just a read-only job.

I noticed in the getIFrame function comment that "only indirect automation to the iFrame document may work [when it's not on the same origin]". What is this referring to exactly?

I also found here that in Selenium you can switch context with driver.switch_to.frame(). Is there a way to do that with CDP and to implement it in the framework? (I found in the CDP docs that you seem to need to pass a contextId to Runtime.evaluate, but I don't really know how to find it. I do have the name of the iframe context I need, though.)
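For the contextId question, the usual CDP pattern is: call Runtime.enable (Chrome then sends a Runtime.executionContextCreated event for each existing context), pick the default context whose auxData.frameId matches the target frame, and pass its id as contextId to Runtime.evaluate. The frameId could come from Page.getFrameTree, whose frames carry the iframe's name. A sketch of the lookup and message-building side only, assuming the events have already been received and parsed; the function names are hypothetical, auxData is embedder-specific (in Chrome it typically includes frameId and isDefault), and it assumes the frame's contexts are reported on the session you are attached to, which a cross-origin iframe running in a separate process may not do:

```javascript
// Each element of contextEvents is the params object of a
// Runtime.executionContextCreated event: { context: { id, origin, name, auxData } }.
function findDefaultContextId(contextEvents, frameId) {
  for (const ev of contextEvents) {
    const ctx = ev.context;
    const aux = ctx.auxData || {};
    // The "default" context is the page's main world for that frame,
    // as opposed to isolated worlds created by extensions or tools.
    if (aux.frameId === frameId && aux.isDefault === true) {
      return ctx.id;
    }
  }
  return null;
}

// Builds the JSON text to send over the DevTools WebSocket.
// returnByValue asks CDP to serialize the result instead of returning a handle.
function buildEvaluateMessage(messageId, expression, contextId) {
  return JSON.stringify({
    id: messageId,
    method: "Runtime.evaluate",
    params: { expression, contextId, returnByValue: true },
  });
}

// Illustrative usage (collectedEvents, iframeFrameId and socket are hypothetical):
// const contextId = findDefaultContextId(collectedEvents, iframeFrameId);
// if (contextId !== null) {
//   socket.send(buildEvaluateMessage(42, 'document.getElementById("someID").value', contextId));
// }
```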

Any help would be greatly appreciated on this. I'm open to any solution, even if it's an entirely different approach. Thanks a lot for your time.

longvh211 commented 1 year ago

Hi Dr. Slump!

I think Selenium's switchTo method is also limited to same-domain interaction. If you can confirm this in the future, do let me know.

Regarding your inquiry, you can interact with the cross-domain iframe "indirectly" in CDP by (1) opening a new tab with the iframe URL using .newTab and then (2) performing the data input on that tab with CDP and submitting it there.

Slump92 commented 1 year ago

Hi Long,

Thank you for your input on this! Unfortunately, the iframe doesn't have a src I can retrieve (.src returns an empty string), and I don't really know where else I could pull it from. There is a lot of AJAX going on dynamically all over the page to load the data, so I don't think I can just open the iframe outside of this page.

About Selenium: the docs here do say, at the end, that:

WebDriver is not bound by the same origin policy, so it is always possible to switch into child browsing contexts, even if they are different origin to the current browsing context.

But people here said that CDP doesn't seem to have an equivalent for that. I tried the method suggested by kensoh, adapted to my case:

document.evaluate('//*[@id="iframeApplication"]', document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotItem(0).contentDocument

But it still returns null. The answer by PaulIrish may be what I'm looking for, but I don't know how to implement that event listening through the VBA framework.

longvh211 commented 1 year ago

Hi Dr. Slump,

My apologies, I failed to notice from your code excerpt that the src attribute is missing. This is a very interesting scenario, because normally, if the src attribute is blank, the iframe should be in the same domain.

Following your lead on the CDP answer by PaulIrish, I think we could possibly use the Page domain to invoke interaction on a target frame. Let me investigate this route and see if that is the case.

Meanwhile, could you double-check whether the URL/src of the iframe appears anywhere within the parent HTML document? If you can find it, that should be the easiest way to interact with it for now.

Slump92 commented 1 year ago

Hi Long,

No problem, any answer at all is really appreciated.

There are a hell of a lot of URLs and srcs in the parent HTML, most of which seem to be dynamically constructed, and I'm afraid I'm not fluent enough in JavaScript to understand exactly what the code is doing and how. Besides, since it's a secure environment, I'm pretty sure that opening the iframe directly in a new tab, if possible at all, would fail to pass the credentials through and end in a 403 or something like that.

I tried a bunch of them, but most either show nothing or take me back to the parent page, sometimes with weird graphical glitches such as mis-encoded characters. All of those pages still nest the iframe, so they don't help with accessing its content.

EDIT: Also, the link to Puppeteer's frame-handling source quoted by PaulIrish is broken; I think it's now this one, in TypeScript.

longvh211 commented 1 year ago

Hi Dr. Slump,

By any chance, is the above target iframe the only iframe in the document or is it nested in another iframe in that document?

Slump92 commented 1 year ago

Hi Long,

It is the only one nested in the main document. There are two other iframes nested inside that iframe (for handling some content, like printing-related features), but I don't need access to those.

longvh211 commented 1 year ago

How about right-clicking the iframe element in the Elements panel and choosing Show iframe details? Does it show you any URL that you can use to test isolated interaction with it in a new tab with CDP?

Slump92 commented 1 year ago

The URL that shows up is the main document's, so when I open it in a new tab, I just end up on the same page, with the iframe I can't access still inside it.

longvh211 commented 1 year ago

Could you execute this line on the page with the iframe and send me the debug info from the execution?

objBrowser.invokeMethod("Page.getFrameTree", dbgMsg:=True)

Slump92 commented 1 year ago

Execution error 450: Wrong number of arguments or invalid property assignment

But I do get this in the debug output:

18:25:38 | BRID706 | Invoked "Page.getFrameTree" with result: {"frameTree":{"frame":{"id":"F46019E4DC9BE3C3ADB74F2624F413AA","loaderId":"2DEC08077B005BCBA6C97E0A5A981DD9","url":"https://xxxxxxx.dom101.xxxxxx/yyyyyyyyyyy/zzzzzzz/pages/default.aspx?someargs","domainAndRegistry":"dom101.xxxxxx","securityOrigin":"https://xxxxxxx.dom101.xxxxxx","mimeType":"text/html","adFrameStatus":{"adFrameType":"none"},"secureContextType":"Secure","crossOriginIsolatedContextType":"NotIsolated","gatedAPIFeatures":[]}}}

The URL and domains are partly redacted for security reasons. The URL is the main page's, which I can open (there is just a bunch of arguments at the end that define which data to load into the form).

EDIT: Going home now, so I won't be able to do further testing until Monday, but don't hesitate to stockpile any tests you have in mind; I will run them Monday morning (GMT+1).

longvh211 commented 1 year ago

Hi Dr. Slump,

The debug info essentially says there is only one frame attached to the window, and that frame is of course the main document frame. The iframe is not returned as a child frame at all because it is cross-domain. In that case, I am afraid we will not be able to interact with it using the event-handling method mentioned by PaulIrish.
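Page.getFrameTree returns a nested FrameTree ({ frame, childFrames }), where childFrames is omitted when a frame has no children, so a child iframe reported in this session would appear there. A minimal sketch (hypothetical helper names, not part of the VBA framework) that flattens the tree; run on an abbreviated copy of the payload above, it finds a single frame and no frame named iframeApp:

```javascript
// Flatten a Page.FrameTree into [{ depth, id, name, url }], depth-first.
function listFrames(frameTree, depth = 0, out = []) {
  const f = frameTree.frame;
  out.push({ depth, id: f.id, name: f.name || "", url: f.url });
  for (const child of frameTree.childFrames || []) {
    listFrames(child, depth + 1, out);
  }
  return out;
}

// Frame.name holds the iframe's name attribute, when it has one.
function findFrameByName(frameTree, name) {
  return listFrames(frameTree).find((f) => f.name === name) || null;
}

// Abbreviated from the debug output above (URL redacted as in the thread).
const result = {
  frameTree: {
    frame: {
      id: "F46019E4DC9BE3C3ADB74F2624F413AA",
      url: "https://xxxxxxx.dom101.xxxxxx/yyyyyyyyyyy/zzzzzzz/pages/default.aspx?someargs",
    },
  },
};

console.log(listFrames(result.frameTree).length); // 1
console.log(findFrameByName(result.frameTree, "iframeApp")); // null
```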

Normally in this scenario, I would try to locate the source of the cross-domain iframe to open it directly in a new tab and work from there.

The only alternative I can think of is to use the UIA (UI Automation) framework to interact with the form fields as UI elements. If you are unfamiliar with this technique, there is a good introduction here. I often use Accessibility Insights to check how UIA normally queries an HTML form field. The advantage of UIA is that it treats the web page just like the user interface of a standard window and can thus bypass the internal restrictions that apply inside the page itself.

Slump92 commented 1 year ago

Hi Long,

Sorry I didn't have time to come back sooner. Thank you for your insight; I will try this UIA approach then. From what I've found, it seems to work from VBA, so it may be just what I need!

Thanks again for your work and time!