jina-ai / reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
https://jina.ai/reader
Apache License 2.0
6.44k stars, 499 forks

Headers (wrongly) removed #15

Open 6DiegoDiego9 opened 5 months ago

6DiegoDiego9 commented 5 months ago

On this page, https://creator.poe.com/docs/quick-start, all the headings (the large, bold text) are wrongly removed by Jina.

Example (HTML on the left, rendered Jina Markdown on the right): [image]

At the HTML level they look like this: [image]
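
To reproduce the report without screenshots, something like the following could help. It is only a sketch: `reader_url` prepends the prefix quoted in the repo description, and `markdown_headings` is a hypothetical helper that lists `#`-style headings in whatever Markdown the Reader returns. The `sample` string is made up for illustration and is not the real response for this page.

```python
import re

READER_PREFIX = "https://r.jina.ai/"  # prefix quoted in the repo description


def reader_url(page_url: str) -> str:
    """Return the Reader URL for a page by prepending the prefix."""
    return READER_PREFIX + page_url


def markdown_headings(markdown: str) -> list[str]:
    """Return the text of ATX-style headings ('# ...' through '###### ...').

    Simplification: lines inside fenced code blocks that start with '#'
    would also be counted.
    """
    pattern = re.compile(r"^(#{1,6})[ \t]+(.+)$", flags=re.MULTILINE)
    return [match.group(2).strip() for match in pattern.finditer(markdown)]


# Hypothetical sample of Reader output, used only to exercise the helper.
sample = (
    "Intro paragraph.\n\n"
    "## Deploying your bot\n\n"
    "Make sure you have Python installed.\n"
)

print(reader_url("https://creator.poe.com/docs/quick-start"))
print(markdown_headings(sample))
```

An empty list for a page that visibly has section titles would match the behavior described above.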

nomagick commented 5 months ago

We use @mozilla/readability for content extraction rather than anything based on visual rendering.

It seems that the text in the div block was ignored because it lacks explicit semantic meaning, despite its visual significance. I believe it would be captured as headings if it were placed in h1-h6 tags.

hanxiao commented 5 months ago

From the screenshot, it looks like it is in an h2.

nomagick commented 5 months ago

Oh, indeed. Sorry. It is still true that Readability dropped the headings, though. Here is the direct output of Readability:

<div id="readability-page-1" class="page"><div id="content-container"><section><div data-testid="RDMD"><p>In this quick start guide, we will build a bot server in Python and then integrate it with Poe. Once you have created a Poe bot powered by your server, any Poe user can interact with it. The following diagram might be useful in visualizing how your bot server fits into Poe.</p>
<p><span aria-label="" role="button" tabindex="0"><span><img alt="" loading="lazy" src="https://files.readme.io/9bff0b8-image.png" caption="" height="auto" title="" width="auto"></span></span></p>
<p>For more information on Poe server bots, check out the <a target="_self" href="https://creator.poe.com/docs/poe-protocol-specification">Poe Protocol Specification</a>.</p>

<p>We recommend using <a target="_self" href="https://modal.com/?utm_source=poe">Modal</a> to deploy your bot, but you can also use any cloud provider of your choice; all you need to do is to make the bot server available at a publicly available URL and once you have that, you can skip to <a target="_self" href="https://creator.poe.com/docs/quick-start#integrating-with-poe">integrating it with Poe</a>. In order to use Modal to deploy your bot, do the following.</p>

<p>Make sure you have Python installed. Open a terminal and run <code data-lang="" name="" tabindex="0"><span>pip install modal-client</span></code>. You might have to use pip3 instead of pip depending on your version of Python.</p>

<p>This step involves setting up access to modal from your terminal. You only need to do this once for your computer. In the terminal, run <code data-lang="" name="" tabindex="0"><span>modal token new --source poe</span></code>. If you run into a "command not found" error, try <a target="_self" href="https://modal.com/docs/guide/troubleshooting#command-not-found-errors">this</a>.</p>
<p>If that command runs successfully, you will taken to your web browser where you will be asked to log into modal using your Github account.</p>
<p><span aria-label="" role="button" tabindex="0"><span><img alt="" loading="lazy" src="https://files.readme.io/0fcb528-image.png" caption="" height="auto" title="" width="auto"></span></span></p>
<p>After you login, click on "create token". You will be prompted to close the browser window after that.</p>
<p><span aria-label="" role="button" tabindex="0"><span><img alt="" loading="lazy" src="https://files.readme.io/83f119a-image.png" caption="" height="auto" title="" width="auto"></span></span></p>

<p>In your terminal, run:</p>
<ul>
<li><code data-lang="" name="" tabindex="0"><span>git clone https://github.com/poe-platform/server-bot-quick-start</span></code></li>
<li><code data-lang="" name="" tabindex="0"><span>cd server-bot-quick-start</span></code></li>
<li><code data-lang="" name="" tabindex="0"><span>pip install -r requirements.txt</span></code></li>
<li><code data-lang="" name="" tabindex="0"><span>modal deploy echobot.py</span></code></li>
</ul>
<p>Modal will now deploy your app and output two urls: a) the endpoint at which your app is hosted b) an internal page where you can monitor your app. You will be using the former to integrate your bot into Poe.</p>

<p>Once you have a bot running under a publicly accessible URL, it is time to connect it to Poe. You can do that on your desktop by going to the bot creation <a target="_self" href="https://poe.com/create_bot?server=1">form</a>. You can customize how your bot looks by providing a picture, name and description. After you fill out the server URL and click "create bot", your bot should be ready for use on all Poe clients.</p>

<ul>
<li>For faster iteration on your bot, we recommend using Modal's serve command (as in <code data-lang="" name="" tabindex="0"><span>modal serve echobot.py</span></code>). On running that command, Modal will deploy an ephemeral version of your app which live updates in response to any code change. In addition, any print/debug statements will output to your terminal.</li>
<li>The README provides a brief description of the other example bots included in the repo. Feel free to iterate upon and/or deploy them.</li>
</ul>

<ul>
<li>One of the advantages of building a bot on Poe is the ability to invoke other Poe bots. In order to learn how to do that check out: <a target="_self" href="https://creator.poe.com/docs/accessing-other-bots-on-poe">accessing-other-bots-on-poe.md</a>.</li>
<li>Check out other detailed guides that show you how to enable specific features:
<ul>
<li><a target="_self" href="https://creator.poe.com/docs/rendering-an-image-in-the-response">rendering-an-image-in-the-response.md</a></li>
<li><a target="_self" href="https://creator.poe.com/docs/enabling-file-upload-for-your-bot">enabling-file-upload-for-your-bot.md</a></li>
<li><a target="_self" href="https://creator.poe.com/docs/setting-an-introduction-message">setting-an-introduction-message.md</a></li>
</ul>
</li>
<li>Refer to the <a target="_self" href="https://creator.poe.com/docs/poe-protocol-specification">specification</a> to understand the full capabilities offered by Poe server bots.</li>
<li>Check out the <a target="_self" href="https://pypi.org/project/fastapi-poe/">fastapi-poe</a> library, which you can use as a base for creating Poe bots.</li>
</ul></div><p><i></i>Updated<!-- --> <!-- -->22 days ago<!-- --> </p><hr></section><section><nav><ul><li><a href="#" target="_self"><i></i>Table of Contents</a></li><li><ul>
<li><a href="#deploying-your-bot" target="_self">Deploying your bot</a></li>
<li><a href="#integrating-with-poe" target="_self">Integrating with Poe</a></li>
<li><a href="#iterating-on-your-bot" target="_self">Iterating on your bot</a></li>
<li><a href="#where-to-go-from-here" target="_self">Where to go from here</a></li>
</ul></li></ul></nav></section></div></div>
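
The output above still contains the table of contents, but none of the section titles it points to. A rough way to check that mechanically, using only Python's standard `html.parser`, is sketched below. `TocAudit` and `missing_headings` are hypothetical names, and `fragment` is a trimmed excerpt of the output above rather than the full dump.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TocAudit(HTMLParser):
    """Collect heading texts and in-page ToC link texts from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.headings = []      # text of every h1-h6 element
        self.toc_entries = []   # text of every <a href="#fragment"> link
        self._open_tag = None   # tag whose text is currently being collected
        self._target = None     # list that will receive the collected text
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._open_tag is not None:
            return  # already collecting text; ignore nested tags
        href = dict(attrs).get("href") or ""
        if tag in HEADING_TAGS:
            self._open_tag, self._target = tag, self.headings
        elif tag == "a" and href.startswith("#") and len(href) > 1:
            self._open_tag, self._target = tag, self.toc_entries

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self._target.append("".join(self._buffer).strip())
            self._open_tag, self._buffer = None, []


def missing_headings(html: str) -> list[str]:
    """Return ToC entries that have no heading with the same text."""
    audit = TocAudit()
    audit.feed(html)
    return [entry for entry in audit.toc_entries if entry not in audit.headings]


# Trimmed excerpt of the Readability output quoted above.
fragment = (
    '<div><p>Make sure you have Python installed.</p>'
    '<nav><ul><li><a href="#" target="_self">Table of Contents</a></li>'
    '<li><a href="#deploying-your-bot" target="_self">Deploying your bot</a></li>'
    '<li><a href="#integrating-with-poe" target="_self">Integrating with Poe</a></li>'
    '</ul></nav></div>'
)

print(missing_headings(fragment))
```

Every ToC entry reported here is a section whose heading did not survive extraction. Matching is by exact text, so a heading whose wording differs from its ToC label would be reported as missing too.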

6DiegoDiego9 commented 5 months ago

Edge's Reading mode renders it correctly:

[image]

ShravanSunder commented 2 months ago

I'm encountering this as well: headings are removed, perhaps when they are not inside an article tag.
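
One way to gather evidence for or against the article-tag guess is to list each heading in a page's HTML along with whether it sits inside an `<article>` element, then compare that with which headings survive. The sketch below does only the listing part with Python's standard `html.parser`; it does not run Readability, and `HeadingContext` and `headings_by_context` are hypothetical names. The sample markup is invented for illustration.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingContext(HTMLParser):
    """Record each heading tag and whether it has an <article> ancestor."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # number of currently open <article> elements
        self.found = []         # (tag, inside_article) per heading start tag

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in HEADING_TAGS:
            self.found.append((tag, self.article_depth > 0))

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1


def headings_by_context(html: str) -> list[tuple[str, bool]]:
    """Return (heading tag, inside_article) pairs in document order."""
    parser = HeadingContext()
    parser.feed(html)
    return parser.found


# Invented markup: one heading inside <article>, one outside.
sample_html = (
    "<article><h2>Inside article</h2><p>text</p></article>"
    "<section><h2>Outside article</h2><p>text</p></section>"
)

print(headings_by_context(sample_html))
```

If the headings that go missing are consistently the ones flagged as outside an `<article>`, that would support the guess. A mixed result would point elsewhere.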