igumnoff / shiva

Shiva library: Implementation in Rust of a parser and generator for documents of any type
https://docs.rs/shiva
GNU General Public License v3.0
159 stars 10 forks

HTML to markdown #104

Open cgisky1980 opened 1 month ago

cgisky1980 commented 1 month ago

I have an HTML file:

<html>
  <head>
    <title>Chew dad's slippers</title>
  </head>
  <body>
    <h1>
      Instead of drinking water from the cat bowl, make sure to steal water from
      the toilet
    </h1>
    <h2>Chase the red dot</h2>
    <p>
      Munch, munch, chomp, chomp hate dogs. Spill litter box, scratch at owner,
      destroy all furniture, especially couch get scared by sudden appearance of
      cucumber cat is love, cat is life fat baby cat best buddy little guy for
      catch eat throw up catch eat throw up bad birds jump on fridge. Purr like
      a car engine oh yes, there is my human woman she does best pats ever that
      all i like about her hiss meow .
    </p>
    <p>
      Dead stare with ears cocked when owners are asleep, cry for no apparent
      reason meow all night. Plop down in the middle where everybody walks favor
      packaging over toy. Sit on the laptop kitty pounce, trip, faceplant.
    </p>
  </body>
</html>

Parsing the document works (though I don't fully understand the output):

{
  "elements": [
    {
      "Text": {
        "text": "Chew dad's slippers",
        "size": 8
      }
    },
    {
      "Header": {
        "level": 1,
        "text": "Instead of drinking water from the cat bowl, make sure to steal water from\n the toilet"
      }
    },
    {
      "Header": {
        "level": 2,
        "text": "Chase the red dot"
      }
    },
    {
      "Paragraph": {
        "elements": [
          {
            "Text": {
              "text": "\n Munch, munch, chomp, chomp hate dogs. Spill litter box, scratch at owner,\n destroy all furniture, especially couch get scared by sudden appearance of\n cucumber cat is love, cat is life fat baby cat best buddy little guy for\n catch eat throw up catch eat throw up bad birds jump on fridge. Purr like\n a car engine oh yes, there is my human woman she does best pats ever that\n all i like about her hiss meow .\n ",
              "size": 8
            }
          }
        ]
      }
    },
    {
      "Paragraph": {
        "elements": [
          {
            "Text": {
              "text": "\n Dead stare with ears cocked when owners are asleep, cry for no apparent\n reason meow all night. Plop down in the middle where everybody walks favor\n packaging over toy. Sit on the laptop kitty pounce, trip, faceplant.\n ",
              "size": 8
            }
          }
        ]
      }
    }
  ],
  "page_width": 210.0,
  "page_height": 297.0,
  "left_page_indent": 10.0,
  "right_page_indent": 10.0,
  "top_page_indent": 10.0,
  "bottom_page_indent": 10.0,
  "page_header": [],
  "page_footer": []
}

Generating as text is OK:

Chew dad's slippers
Instead of drinking water from the cat bowl, make sure to steal water from the toilet
Chase the red dot
Munch, munch, chomp, chomp hate dogs. Spill litter box, scratch at owner, destroy all furniture, especially couch get scared by sudden appearance of cucumber cat is love, cat is life fat baby cat best buddy little guy for catch eat throw up catch eat throw up bad birds jump on fridge. Purr like a car engine oh yes, there is my human woman she does best pats ever that all i like about her hiss meow .
Dead stare with ears cocked when owners are asleep, cry for no apparent reason meow all night. Plop down in the middle where everybody walks favor packaging over toy. Sit on the laptop kitty pounce, trip, faceplant.

But as md, a lot is missing (both paragraphs are dropped):

Chew dad's slippers
# Instead of drinking water from the cat bowl, make sure to steal water from the toilet
## Chase the red dot

evgenyigumnov commented 1 month ago

Hello,

This depends on https://github.com/igumnoff/shiva/issues/105
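For readers following along, here is a minimal sketch of what paragraph-aware Markdown rendering over a tree shaped like the JSON in this issue could look like. It is not shiva's code and does not use shiva's API: the `Element` enum, `normalize`, `render`, `to_markdown`, and `sample_document` are all hypothetical names invented for illustration, and the sample texts are shortened versions of the ones in the report.

```rust
/// Hypothetical, simplified document model mirroring the JSON shape shown
/// in the issue. These are NOT shiva's actual types.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Text { text: String },
    Header { level: u8, text: String },
    Paragraph { elements: Vec<Element> },
}

/// Collapse every run of whitespace (including the "\n   " indentation the
/// HTML parser preserved) into a single space and trim both ends.
fn normalize(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Render one element as a Markdown block without a trailing newline.
/// Paragraphs are handled by recursing into their children.
fn render(element: &Element) -> String {
    match element {
        Element::Text { text } => normalize(text),
        Element::Header { level, text } => {
            // Markdown ATX headings support levels 1 through 6.
            let hashes = "#".repeat(usize::from((*level).clamp(1, 6)));
            format!("{} {}", hashes, normalize(text))
        }
        Element::Paragraph { elements } => elements
            .iter()
            .map(render)
            .filter(|part| !part.is_empty())
            .collect::<Vec<_>>()
            .join(" "),
    }
}

/// Render top-level elements as Markdown blocks separated by blank lines.
fn to_markdown(elements: &[Element]) -> String {
    let blocks: Vec<String> = elements
        .iter()
        .map(render)
        .filter(|block| !block.is_empty())
        .collect();
    let mut out = blocks.join("\n\n");
    out.push('\n');
    out
}

/// Shortened stand-in for the parsed document from the issue.
fn sample_document() -> Vec<Element> {
    vec![
        Element::Text { text: "Chew dad's slippers".to_string() },
        Element::Header {
            level: 1,
            text: "Instead of drinking water from the cat bowl, make sure to steal water from\n      the toilet".to_string(),
        },
        Element::Header { level: 2, text: "Chase the red dot".to_string() },
        Element::Paragraph {
            elements: vec![Element::Text {
                text: "\n      Munch, munch, chomp, chomp hate dogs.\n    ".to_string(),
            }],
        },
        Element::Paragraph {
            elements: vec![Element::Text {
                text: "\n      Dead stare with ears cocked when owners are asleep.\n    ".to_string(),
            }],
        },
    ]
}
```

Calling `to_markdown(&sample_document())` produces the title, both headings, and both paragraphs as separate blocks. Joining blocks with a blank line is a deliberate choice in this sketch, because Markdown treats consecutive lines without a blank line between them as one paragraph.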