Closed ehubb20 closed 1 month ago
@ehubb20 Let me check and update you.
Hey @unclecode any update on this? I too am trying to figure out how to parse iframe content
Any update on this?
Hello everyone @ehubb20 @b-sai @shhivam, sorry for the late reply. We've been very busy bringing in a lot of new features, and one of them is extracting this kind of information from iframes. It's still early days, so it will ship with the new version, 0.3.6, which we're going to release by tomorrow. I definitely expect some bugs, so please use it and report any issues you come across, and we can fix them right away.
It currently extracts the content of the iframe's "body" and replaces the iframe with a div element in the main page, making the content part of the main page. You can think of it as a way of flattening, but what we extract is the body content of the iframe. We plan to add more options and parameters for extracting these elements.
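To make the "flattening" idea concrete, here is a minimal standalone sketch. It is not Crawl4AI's implementation (which works on the live page in the browser); the helper names `extract_body` and `flatten_iframes`, the `iframe-content` class, and the regex-based matching are all assumptions for illustration only. It assumes you already have each iframe's HTML keyed by its `src`.

```python
import re

# Hypothetical helpers for illustration; NOT Crawl4AI's actual code.
# Regexes are good enough for a sketch but are not a robust HTML parser.
IFRAME_RE = re.compile(
    r'<iframe\b[^>]*\bsrc="([^"]*)"[^>]*>.*?</iframe>',
    re.IGNORECASE | re.DOTALL,
)
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(html: str) -> str:
    """Return the inner content of <body>, or the whole document if there is none."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html.strip()


def flatten_iframes(page_html: str, frames: dict[str, str]) -> str:
    """Replace each <iframe src="..."> with a <div> holding that frame's body content.

    `frames` maps an iframe's src URL to the HTML document loaded in it.
    Iframes whose src is not in `frames` are left untouched.
    """

    def replace(match: re.Match) -> str:
        frame_html = frames.get(match.group(1))
        if frame_html is None:
            return match.group(0)
        return f'<div class="iframe-content">{extract_body(frame_html)}</div>'

    return IFRAME_RE.sub(replace, page_html)


page = '<html><body><h1>Main</h1><iframe src="https://example.com/frame"></iframe></body></html>'
frames = {"https://example.com/frame": "<html><head></head><body><p>Inside</p></body></html>"}
print(flatten_iframes(page, frames))
```

After flattening, the iframe's content sits inline in the main page's HTML, so whatever extraction you run on the page sees it like any other element.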
Btw, even without this feature, when you crawl a page you get all of its internal/external links, and you can then crawl those links to get at the iframe content. This already provides a lot of options.
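As a rough sketch of that "crawl the iframe URLs yourself" route, the snippet below collects iframe `src` URLs from a page's HTML using only the Python standard library. The names `IframeSrcCollector` and `find_iframe_urls` and the example URLs are made up for illustration and are not part of Crawl4AI; each URL it returns could then be crawled on its own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collects the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_urls(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for all iframes in `html`, resolved against `base_url`."""
    collector = IframeSrcCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


html = '<div><iframe src="/widget/listings?x=1"></iframe><iframe></iframe></div>'
print(find_iframe_urls(html, "https://example.com/page"))
```

Iframes without a `src` (for example ones filled in by scripts) are skipped here, so this approach only covers frames that point at a real URL.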
Anyway, I've shared a sample of the code with you here. Hopefully, once we update the library, you'll be able to use it. I'd appreciate it if you could let us know about any bugs or issues you encounter.
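import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(verbose=True, headless=False) as crawler:
        url = "https://zcgwq2-5000.csb.app"
        result = await crawler.arun(
            url=url,
            bypass_cache=True,     # skip any cached copy and fetch the page again
            process_iframes=True,  # pull iframe body content into the main page
        )
        print(result.markdown)

asyncio.run(main())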
I'll keep the issue open in case you run into any errors.
Thanks @unclecode for the clarification!
@shhivam The iframe extraction is already available, please check:
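import asyncio
from crawl4ai import AsyncWebCrawler

async def test_iframe():
    async with AsyncWebCrawler(verbose=True, headless=False) as crawler:
        url = "URL-HERE"  # replace with the page that contains the iframe
        result = await crawler.arun(
            url=url,
            bypass_cache=True,
            process_iframes=True,  # pull iframe body content into the main page
        )
        print(result.markdown)

asyncio.run(test_iframe())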
What is the best method for extracting data from an iframe using Crawl4ai?
Here is an example of the iframe I am trying to capture:
<div class="list-items new_properties_scroll"><ul><li><div class="list-item-des"><a class="list_image_click" href="https://homes.rently.com/homes-for-rent/properties/4203494?fromsearch=true&companyID=13160&source=iframe" target="_blank"></a><div class="container-fluid" style="max-width:1399px;"><a class="list_image_click" href="https://homes.rently.com/homes-for-rent/properties/4203494?fromsearch=true&companyID=13160&source=iframe" target="_blank"></a><div class="row item"><a class="list_image_click" href="https://homes.rently.com/homes-for-rent/properties/4203494?fromsearch=true&companyID=13160&source=iframe" target="_blank"><div class="col-md-2 col-sm-2"><div style="background-image: url(https://s3.amazonaws.com/Rently_dev/images/51453851/medium);"></div></div><div class="col-md-4 col-sm-4 col-xs-4 basic-info"><div class="price priceWithTooltip"><h2><span class="amount">$1757</span><span class="unit"> / month</span></h2></div><div class="available-date"><h2>Available: Now</h2></div><span class="mini-address">231 Crestview Way, Dallas, GA, 30132, Un...</span><div class="info"><div class="col-md-6 col-sm-3 col-xs-3"><img class="center" src="/assets/bed.svg"><span><strong style="font-size: 1.5em;">3</strong> Bed(s)</span></div><div class="col-md-6 col-sm-3 col-xs-3"><img class="center" src="/assets/shower-head.svg"><span><strong style="font-size: 1.5em;">2.5</strong> Bath(s)</span></div><div class="col-md-6 col-sm-3 col-xs-3"><img class="center" src="/assets/cat_dog.svg"><span style="line-height: 2.2;"> Cat + Dog</span></div><div class="col-md-6 col-sm-3 col-xs-3" style="line-height: 30px;"><img class="center" src="/assets/sq_ft.svg"><span>1530 Sq ft</span></div>