michardy closed 6 years ago
Normally, when parsing the BBC home page, we assume a few things. Namely, that headlines will appear in an <a href=""/> tag and descriptions will appear in a tag with no children below it.
Normal:
<!-- outer boundary of headline (never parsed) -->
<div class="gs-c-promo nw-c-promo gs-o-faux-block-link gs-u-pb gs-u-pb+@m nw-p-default gs-c-promo--inline gs-c-promo--stacked@m nw-u-w-auto gs-c-promo--flex" data-entityid="container-top-stories#4">
<!-- Image data (never parsed) -->
<div class="gs-c-promo-image gs-u-display-none gs-u-display-inline-block@xs gel-1/2@xs gel-1/1@m">
<div class="gs-o-media-island">
<div class="gs-o-responsive-image gs-o-responsive-image--16by9">
<img src="https://ichef.bbci.co.uk/news/240/cpsprodpb/C06F/production/_101636294_whatsappimage2018-05-19at11.55.14am-1.jpg" class="lazyloaded" alt="Sabika Sheikh" data-src="https://ichef.bbci.co.uk/news/{width}/cpsprodpb/C06F/production/_101636294_whatsappimage2018-05-19at11.55.14am-1.jpg">
</div>
</div>
</div>
<!-- link, summary, and category links (we don't parse this level) -->
<div class="gs-c-promo-body gel-1/2@xs gel-1/1@m gs-u-mt@m">
<!-- Link and summary (we don't look for this but get it with .parent) -->
<div>
<!-- target we look for -->
<a class="gs-c-promo-heading gs-o-faux-block-link__overlay-link gel-pica-bold nw-o-link-split__anchor" href="/news/world-us-canada-44179973">
<h3 class="gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text">Pakistani student among Texas victims</h3>
</a>
<!-- next element that we assume is text if it exists -->
<p class="gs-c-promo-summary gel-long-primer gs-u-mt nw-c-promo-summary">Sabika Sheikh and teacher Cynthia Tisdale are among the first named school shooting victims.</p>
</div>
<!-- ignore -->
<ul class="gs-o-list-inline gs-o-list-inline--divided gel-brevier gs-u-mt-">
<li class="nw-c-promo-meta"><span class="gs-c-timestamp gs-o-bullet gs-o-bullet- nw-c-timestamp">
<span class="gs-o-bullet__icon gel-icon">
<svg viewBox="0 0 32 32" focusable="false"><polygon points="17,15.4 17,6 15,6 15,16.6 23.8,21.7 24.8,19.9"></polygon><path d="M16,4c6.6,0,12,5.4,12,12c0,6.6-5.4,12-12,12S4,22.6,4,16C4,9.4,9.4,4,16,4 M16,0C7.2,0,0,7.2,0,16c0,8.8,7.2,16,16,16 s16-7.2,16-16C32,7.2,24.8,0,16,0L16,0z"></path></svg>
</span>
<time class="gs-o-bullet__text date qa-status-date" datetime="2018-05-19T10:59:21.000Z" data-seconds="1526727561" data-datetime="12h">
<span aria-hidden="true" class="qa-status-date-output">12h</span>
<span class="gs-u-vh">12 hours ago</span>
</time>
<!-- Stray (Don't ask too many questions)-->
</span>
</li>
<li class="nw-c-promo-meta">
<a href="/news/world/us_and_canada" class="gs-c-section-link gs-c-section-link--truncate nw-c-section-link nw-o-link nw-o-link--no-visited-state" aria-label="From US & Canada">
<span aria-hidden="true">US & Canada</span>
</a>
</li>
</ul>
</div>
</div>
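The lookup those comments describe (find the heading anchor, go up to its parent, take the element that follows the anchor) might be sketched roughly as below. This is only an illustration: it uses Python's standard-library html.parser instead of whatever parser the project really uses, the markup is trimmed from the snippet above, and Node, TreeBuilder, parse, walk, find_heading_anchor and next_element are all hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta", "source"}


class Node:
    """Minimal element node: tag, attributes, parent, element children, text."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # element children only
        self.text = ""      # text found directly inside this element

    def classes(self):
        return (self.attrs.get("class") or "").split()


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes the input is reasonably well-formed."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def walk(node):
    """Yield every element below node, depth first."""
    for child in node.children:
        yield child
        yield from walk(child)


def find_heading_anchor(root):
    """Return the first <a> carrying the gs-c-promo-heading class, if any."""
    for node in walk(root):
        if node.tag == "a" and "gs-c-promo-heading" in node.classes():
            return node
    return None


def next_element(anchor):
    """The positional assumption: the summary is whatever follows the anchor."""
    siblings = anchor.parent.children
    index = siblings.index(anchor)
    return siblings[index + 1] if index + 1 < len(siblings) else None


NORMAL = """
<div class="gs-c-promo-body">
  <div>
    <a class="gs-c-promo-heading" href="/news/world-us-canada-44179973">
      <h3>Pakistani student among Texas victims</h3>
    </a>
    <p class="gs-c-promo-summary">Sabika Sheikh and teacher Cynthia Tisdale are among the first named school shooting victims.</p>
  </div>
</div>
"""

if __name__ == "__main__":
    anchor = find_heading_anchor(parse(NORMAL))
    summary = next_element(anchor)
    print(anchor.children[0].text)
    print(summary.tag, summary.text)
```

With a layout where something else follows the anchor, the same call returns that element instead, whatever it is, which is why the approach depends entirely on position.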
Breaks:
<!-- outer boundary of headline (never parsed but gotten with .parent) -->
<div class="gs-c-promo nw-c-promo nw-c-promo--maxim gel-layout gel-layout--no-flex gs-o-faux-block-link gs-u-pb gs-u-pb+@m" data-entityid="container-top-stories#1">
<!-- target we look for -->
<a class="gs-c-promo-heading gs-o-faux-block-link__overlay-link gel-layout__item gel-2/5@xxl gs-u-float-left-@xxl gs-u-mb gs-u-mb+@m gs-u-mb@xxl gel-canon-bold nw-o-link-split__anchor" href="/news/uk-44175216">
<h3 class="gs-c-promo-heading__title gel-canon-bold nw-o-link-split__text">Prince Harry and Meghan married at Windsor</h3>
</a>
<!-- We assume this is a summary thing since it is the second child of the parent -->
<!-- Everything blows up -->
<div class="gs-c-promo-image gel-1/1 gel-3/4@l gel-3/5@xxl gs-u-float-right@l">
<div class="gs-o-media-island">
<div class="gs-o-responsive-image gs-o-responsive-image--16by9 gs-o-responsive-image--lead">
<img src="https://ichef.bbci.co.uk/news/320/cpsprodpb/1184A/production/_101645717_hi046925116.jpg" sizes="(min-width: 900px) 743px, (min-width: 900px) calc(66vw - 64px), calc(vw100 - 32px)" srcset="https://ichef.bbci.co.uk/news/240/cpsprodpb/1184A/production/_101645717_hi046925116.jpg 240w, https://ichef.bbci.co.uk/news/380/cpsprodpb/1184A/production/_101645717_hi046925116.jpg 380w, https://ichef.bbci.co.uk/news/420/cpsprodpb/1184A/production/_101645717_hi046925116.jpg 420w, https://ichef.bbci.co.uk/news/490/cpsprodpb/1184A/production/_101645717_hi046925116.jpg 490w, https://ichef.bbci.co.uk/news/573/cpsprodpb/1184A/production/_101645717_hi046925116.jpg 573w, https://ichef.bbci.co.uk/news/743/cpsprodpb/1184A/production/_101645717_hi046925116.jpg 743w, https://ichef.bbci.co.uk/news/820/cpsprodpb/1184A/production/_101645717_hi046925116.jpg 820w" alt="Prince Harry and Meghan leave for their reception" class="qa-srcset-image">
</div>
</div>
</div>
<!-- Actual parent of summary -->
<div class="gs-c-promo-body gel-1/3@m gel-1/4@l gel-2/5@xxl">
<!-- Actual summary -->
<p class="gs-c-promo-summary gel-long-primer gs-u-mt nw-c-promo-summary gs-u-mt+@m gs-u-mt0@l">Hundreds of guests watched the couple exchange vows in a ceremony featuring a gospel choir and an American preacher.</p>
<ul class="gs-o-list-inline gs-o-list-inline--divided gel-brevier gs-u-mt-">
<li class="nw-c-promo-meta"><span class="gs-c-timestamp gs-o-bullet gs-o-bullet- nw-c-timestamp">
<span class="gs-o-bullet__icon gel-icon">
<svg viewBox="0 0 32 32" focusable="false"><polygon points="17,15.4 17,6 15,6 15,16.6 23.8,21.7 24.8,19.9"></polygon><path d="M16,4c6.6,0,12,5.4,12,12c0,6.6-5.4,12-12,12S4,22.6,4,16C4,9.4,9.4,4,16,4 M16,0C7.2,0,0,7.2,0,16c0,8.8,7.2,16,16,16 s16-7.2,16-16C32,7.2,24.8,0,16,0L16,0z"></path></svg>
</span>
<time class="gs-o-bullet__text date qa-status-date" datetime="2018-05-19T19:39:31.000Z" data-seconds="1526758771" data-datetime="3h">
<span aria-hidden="true" class="qa-status-date-output">3h</span>
<span class="gs-u-vh">3 hours ago</span>
</time>
<!-- Stray (Don't ask too many questions)-->
</span>
</li>
<li class="nw-c-promo-meta">
<a href="/news/uk" class="gs-c-section-link gs-c-section-link--truncate nw-c-section-link nw-o-link nw-o-link--no-visited-state" aria-label="From UK">
<span aria-hidden="true">UK</span>
</a>
</li>
</ul>
</div>
<!-- ~150 loc of drooling over wedding gowns -->
<div class="nw-c-live-event-wrapper gel-layout__item gel-2/3@m gel-1/4@l gs-u-float-right@xxl gel-1/5@xxl">
<div class="nw-c-live-event gs-o-faux-block-link gs-u-mt+ gs-t-news">
<div class="gs-c-promo lx-c-dynamic-promo lx-c-dynamic-promo--secondary gs-o-faux-block-link gs-u-align-left gs-u-ml0 gs-t-news lx-c-dynamic-promo--has-commentary nw-p-default gs-u-mb gs-u-mb+@m gs-u-pt-alt gs-u-pb- gs-u-ph-alt gs-c-promo--flex" data-mode="secondary">
<div class="gs-c-promo-body lx-c-dynamic-promo__body gs-u-p0">
<div class="lx-c-timeline gel-pica-bold">
<div class="gs-u-pb+ lx-c-timeline__item lx-c-timeline__item--first">
<div>
<a class="gel-pica-bold nw-o-link-split__anchor lx-c-dynamic-promo__link gs-u-display-block qa-promo-title" href="/news/live/uk-44167290">
<span class="gs-c-live-pulse gs-o-bullet gs-o-bullet- gs-c-live-pulse--news lx-c-dynamic-promo__pulse gs-u-mr gel-1/1">
<span class="gs-o-bullet__icon gs-c-live-pulse__icon gel-icon">
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><path d="M16 4c6.6 0 12 5.4 12 12s-5.4 12-12 12S4 22.6 4 16 9.4 4 16 4zm0-4C7.2 0 0 7.2 0 16s7.2 16 16 16 16-7.2 16-16S24.8 0 16 0z"></path></svg>
<span class="gs-c-live-pulse__icon-center">
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><circle cx="16" cy="16" r="8.5"></circle></svg>
</span>
</span>
<span class="gs-o-bullet__text qa-live-pulse-text">Live</span>
</span>
<h3 class="gel-pica-bold nw-o-link-split__text lx-c-dynamic-promo__title">Couple cap happy day with private party</h3>
<span class="gs-u-vh">Last updated 19 minutes ago</span>
</a>
</div>
</div>
<h4 class="gs-u-vh qa-timeline-hidden-heading">Most recent posts</h4>
<ol class="lx-c-timeline__list">
<li id="lx-c-timeline__item--0" class="lx-c-timeline__item qa-timeline-item gs-o-media gs-u-pb gs-u-pb-alt@l">
<div class="gs-u-mr- gs-o-media__img lx-c-timeline__keypoint">
<div class="lx-c-timeline__keypoint-icon">
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><circle stroke="none" cx="16" cy="16" r="11"></circle></svg>
</div>
</div>
<div class="gs-o-media__body gel-long-primer lx-c-timeline__body">
<span class="gs-u-vh qa-promo-item-heading">19 minutes ago 'Love recognises no barriers'</span>
<time aria-hidden="true" class="lx-c-timeline__heading-timestamp gs-u-mr">19m</time>
<span aria-hidden="true" class="lx-c-timeline__heading-text qa-item-heading-text">'Love recognises no barriers'</span>
</div>
</li>
<li id="lx-c-timeline__item--1" class="lx-c-timeline__item qa-timeline-item gs-o-media gs-u-pb gs-u-pb-alt@l">
<div class="gs-u-mr- gs-o-media__img lx-c-timeline__keypoint">
<div class="lx-c-timeline__keypoint-icon">
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><circle stroke="none" cx="16" cy="16" r="11"></circle></svg>
</div>
</div>
<div class="gs-o-media__body gel-long-primer lx-c-timeline__body">
<span class="gs-u-vh qa-promo-item-heading">27 minutes ago Royal fireworks</span>
<time aria-hidden="true" class="lx-c-timeline__heading-timestamp gs-u-mr">27m</time>
<span aria-hidden="true" class="lx-c-timeline__heading-text qa-item-heading-text">Royal fireworks</span>
</div>
</li>
</ol>
</div>
</div>
<a href="/news/live/uk-44167290" tabindex="-1" aria-hidden="true" class="qa-overlay gs-o-faux-block-link__overlay lx-c-dynamic-promo__link">Live Couple cap happy day with private party Last updated 19 minutes ago</a>
</div>
</div>
</div>
<div class="nw-c-index-alsos--maximum gel-layout__item gel-1/4@l gs-u-pt gs-u-pt-alt@xs gel-1/1@m gel-1/5@xxl">
<div>
<h4 class="gs-u-vh">Related content</h4>
<ul class="gel-layout gel-layout--no-flex">
<li class="nw-c-related-story nw-c-related-story--1 gel-1/2@s gel-1/1@l gel-1/3@m gs-u-float-left@s gs-u-float-none@l">
<span class="nw-o-bullet+ gel-brevier-bold">
<a href="/news/uk-44181399" class="gel-layout__item nw-o-link-split__anchor gs-u-pt- gs-u-pb- gs-u-display-block">
<span class="nw-o-bullet__icon">
<span class="gs-c-media-indicator gel-brevier-bold gs-c-media-indicator--inline">
<span class="gs-c-media-indicator__icon gel-icon" data-icon="gel-icon-video">
<span class="qa-offscreen gs-u-vh">Video</span>
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><polygon points="3,32 29,16 3,0"></polygon></svg>
</span>
</span>
</span>
<span class="nw-o-bullet__text">
<span class="nw-o-link-split__text gs-u-align-bottom">Harry and Meghan: The kiss</span>
</span>
</a>
</span>
</li>
<li class="nw-c-related-story nw-c-related-story--2 gel-1/2@s gel-1/1@l gel-1/3@m gs-u-float-left@s gs-u-float-none@l">
<span class="nw-o-bullet+ gel-brevier-bold">
<a href="/news/entertainment-arts-44180613" class="gel-layout__item nw-o-link-split__anchor gs-u-pt- gs-u-pb- gs-u-display-block">
<span class="nw-o-bullet__icon">
<span class="nw-c-circle">
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><circle cx="12" cy="21" r="7"></circle></svg>
</span>
</span>
<span class="nw-o-bullet__text">
<span class="nw-o-link-split__text gs-u-align-bottom">In pictures: The celebrity guests</span>
</span>
</a>
</span>
</li>
<li class="nw-c-related-story nw-c-related-story--3 gel-1/2@s gel-1/1@l gel-1/3@m gs-u-float-left@s gs-u-float-none@l">
<span class="nw-o-bullet+ gel-brevier-bold">
<a href="/news/uk-44184151" class="gel-layout__item nw-o-link-split__anchor gs-u-pt- gs-u-pb- gs-u-display-block">
<span class="nw-o-bullet__icon">
<span class="gs-c-media-indicator gel-brevier-bold gs-c-media-indicator--inline">
<span class="gs-c-media-indicator__icon gel-icon" data-icon="gel-icon-video">
<span class="qa-offscreen gs-u-vh">Video</span>
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><polygon points="3,32 29,16 3,0"></polygon></svg>
</span>
</span>
</span>
<span class="nw-o-bullet__text">
<span class="nw-o-link-split__text gs-u-align-bottom">Carriage procession a 'fairytale'</span>
</span>
</a>
</span>
</li>
<li class="nw-c-related-story nw-c-related-story--4 gel-1/2@s gel-1/1@l gel-1/3@m gs-u-float-left@s gs-u-float-none@l">
<span class="nw-o-bullet+ gel-brevier-bold">
<a href="/news/uk-44184331" class="gel-layout__item nw-o-link-split__anchor gs-u-pt- gs-u-pb- gs-u-display-block">
<span class="nw-o-bullet__icon">
<span class="nw-c-circle">
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><circle cx="12" cy="21" r="7"></circle></svg>
</span>
</span>
<span class="nw-o-bullet__text">
<span class="nw-o-link-split__text gs-u-align-bottom">#Blackroyalwedding hailed </span>
</span>
</a>
</span>
</li>
<li class="nw-c-related-story nw-c-related-story--5 gel-1/2@s gel-1/1@l gel-1/3@m gs-u-float-left@s gs-u-float-none@l">
<span class="nw-o-bullet+ gel-brevier-bold">
<a href="/news/uk-44182166" class="gel-layout__item nw-o-link-split__anchor gs-u-pt- gs-u-pb- gs-u-display-block">
<span class="nw-o-bullet__icon">
<span class="nw-c-circle">
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><circle cx="12" cy="21" r="7"></circle></svg>
</span>
</span>
<span class="nw-o-bullet__text">
<span class="nw-o-link-split__text gs-u-align-bottom">Five moments from the wedding </span>
</span>
</a>
</span>
</li>
<li class="nw-c-related-story nw-c-related-story--6 gel-1/2@s gel-1/1@l gel-1/3@m gs-u-float-left@s gs-u-float-none@l">
<span class="nw-o-bullet+ gel-brevier-bold">
<a href="/news/uk-44184034" class="gel-layout__item nw-o-link-split__anchor gs-u-pt- gs-u-pb- gs-u-display-block">
<span class="nw-o-bullet__icon">
<span class="nw-c-circle">
<svg aria-hidden="true" viewBox="0 0 32 32" focusable="false"><circle cx="12" cy="21" r="7"></circle></svg>
</span>
</span>
<span class="nw-o-bullet__text">
<span class="nw-o-link-split__text gs-u-align-bottom">The bridesmaids and pageboys</span>
</span>
</a>
</span>
</li>
</ul>
</div>
</div>
</div>
Do I fix it or assume it will go away?
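If fixing it is the choice, one possible direction is to stop relying on position and key on class names instead: both snippets above put the headline in an <a> with class gs-c-promo-heading and the summary in a <p> with class gs-c-promo-summary. The sketch below is an assumption-laden illustration, not the project's code. It uses Python's standard-library html.parser, the markup is trimmed from the two snippets, and PromoExtractor and extract are hypothetical names. It pairs each summary with the most recent heading anchor in document order, which may mis-pair if the page ever emits a summary that belongs to something else.

```python
from html.parser import HTMLParser


class PromoExtractor(HTMLParser):
    """Collects href, headline and summary by class name, not by position."""

    HEADING = "gs-c-promo-heading"
    SUMMARY = "gs-c-promo-summary"

    def __init__(self):
        super().__init__()
        self.stories = []      # one dict per heading anchor seen so far
        self._field = None     # field currently receiving text, if any
        self._until = None     # tag whose end stops the capture

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.HEADING in classes:
            # Every heading anchor starts a new story.
            self.stories.append(
                {"href": attrs.get("href"), "headline": "", "summary": None}
            )
            self._field, self._until = "headline", "a"
        elif tag == "p" and self.SUMMARY in classes and self.stories:
            # Attach to the latest story, but only if it has no summary yet.
            if self.stories[-1]["summary"] is None:
                self.stories[-1]["summary"] = ""
                self._field, self._until = "summary", "p"

    def handle_endtag(self, tag):
        if tag == self._until:
            self._field = self._until = None

    def handle_data(self, data):
        if self._field is not None:
            story = self.stories[-1]
            story[self._field] = (story[self._field] + " " + data.strip()).strip()


def extract(html):
    """Return the list of story dicts found in an HTML fragment."""
    parser = PromoExtractor()
    parser.feed(html)
    parser.close()
    return parser.stories


NORMAL = """
<div class="gs-c-promo-body">
  <div>
    <a class="gs-c-promo-heading" href="/news/world-us-canada-44179973">
      <h3>Pakistani student among Texas victims</h3>
    </a>
    <p class="gs-c-promo-summary">Sabika Sheikh and teacher Cynthia Tisdale are among the first named school shooting victims.</p>
  </div>
</div>
"""

BREAKS = """
<div class="gs-c-promo nw-c-promo--maxim">
  <a class="gs-c-promo-heading" href="/news/uk-44175216">
    <h3>Prince Harry and Meghan married at Windsor</h3>
  </a>
  <div class="gs-c-promo-image">
    <img alt="Prince Harry and Meghan leave for their reception">
  </div>
  <div class="gs-c-promo-body">
    <p class="gs-c-promo-summary">Hundreds of guests watched the couple exchange vows in a ceremony featuring a gospel choir and an American preacher.</p>
  </div>
</div>
"""

if __name__ == "__main__":
    for story in extract(NORMAL) + extract(BREAKS):
        print(story["href"], "|", story["headline"], "|", story["summary"])
```

Because the image block between the anchor and the summary carries neither class, it is simply skipped, so the same code path handles both layouts.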