simonw / strip-tags

CLI tool for stripping tags from HTML
Apache License 2.0

Bundles for tables and lists #18

Closed: simonw closed this issue 1 year ago

simonw commented 1 year ago

Refs:

Still needs:

simonw commented 1 year ago

This is weird:

curl -s 'https://simonwillison.net/' | strip-tags -t lists 'ul,li,ol'

Ends with:

    </ul>
<li>Source code</li>
<li>©</li>
<li>2002</li>
<li>2003</li>
<li>2004</li>
<li>2005</li>
<li>2006</li>

There's a missing <ul> there.

benjamin-kirkbride commented 1 year ago

Confirmed that it is not an error with the HTML of the page:

<div id="ft">
    <ul>
      <li><a href="https://github.com/simonw/simonwillisonblog">Source code</a></li>
      <li>&copy;</li>
      <li><a href="/2002/">2002</a></li>
      <li><a href="/2003/">2003</a></li>
      <li><a href="/2004/">2004</a></li>
      <li><a href="/2005/">2005</a></li>
      <li><a href="/2006/">2006</a></li>
      <li><a href="/2007/">2007</a></li>
      <li><a href="/2008/">2008</a></li>
      <li><a href="/2009/">2009</a></li>
      <li><a href="/2010/">2010</a></li>
      <li><a href="/2011/">2011</a></li>
      <li><a href="/2012/">2012</a></li>
      <li><a href="/2013/">2013</a></li>
      <li><a href="/2014/">2014</a></li>
      <li><a href="/2015/">2015</a></li>
      <li><a href="/2016/">2016</a></li>
      <li><a href="/2017/">2017</a></li>
      <li><a href="/2018/">2018</a></li>
      <li><a href="/2019/">2019</a></li>
      <li><a href="/2020/">2020</a></li>
      <li><a href="/2021/">2021</a></li>
      <li><a href="/2022/">2022</a></li>
      <li><a href="/2023/">2023</a></li>
    </ul>
</div>

benjamin-kirkbride commented 1 year ago

Seems like this is the same issue as https://github.com/simonw/strip-tags/issues/21
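
One possible reading of the output above, not confirmed anywhere in this thread: a selector such as 'ul,li,ol' matches the `<ul>` and also each `<li>` nested inside it, so if every match is serialized on its own, the list items appear a second time after the closing `</ul>`. The sketch below illustrates that idea using only Python's standard library. It is not strip-tags's own code; `FOOTER` is a shortened, well-formed stand-in for the footer markup, and the function names are hypothetical. Unlike the real output, it leaves the `<a>` tags in place.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the footer markup quoted in this thread.
FOOTER = """<div id="ft">
  <ul>
    <li><a href="/2002/">2002</a></li>
    <li><a href="/2003/">2003</a></li>
  </ul>
</div>"""

SELECTED = {"ul", "li", "ol"}  # mirrors the 'ul,li,ol' selector


def matches_in_document_order(markup, tags):
    """Return every element whose tag is in `tags`, nested matches included."""
    root = ET.fromstring(markup)
    return [el for el in root.iter() if el.tag in tags]


def serialize_each(matches):
    """Serialize each match independently (the hypothesised naive behaviour)."""
    return [
        ET.tostring(el, encoding="unicode", method="html").strip()
        for el in matches
    ]


def outermost_only(markup, tags):
    """Return matches, skipping any that sit inside an already-matched ancestor."""
    root = ET.fromstring(markup)
    result = []

    def walk(el):
        if el.tag in tags:
            result.append(el)  # keep this match and do not descend into it
            return
        for child in el:
            walk(child)

    walk(root)
    return result


if __name__ == "__main__":
    # The <ul> is printed with its items, then each <li> is printed again.
    for chunk in serialize_each(matches_in_document_order(FOOTER, SELECTED)):
        print(chunk)
```

If this reading is right, the trailing `<li>` lines are repeats of items already emitted inside the `<ul>`, not a list that lost its opening tag. `outermost_only` shows one way such repeats could be avoided, by ignoring matches nested inside another match.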

simonw commented 1 year ago

I'm going to use cog for the README.

simonw commented 1 year ago

Bundles docs here: https://github.com/simonw/strip-tags/blob/42e03c2764fbb74ef26d0f834f817c437cc1f524/README.md#keeping-the-markup-for-specified-tags