vsavo / htmltabletomd

Convert html tables to markdown
MIT License

How can I show Sub heading in the table? #5

Open sree1658 opened 2 months ago

sree1658 commented 2 months ago

When I use htmltabletomd.convert_table(html_table), only the headers and data are converted to the table. Sub-headers are not included in the output; they are printed as null.

Mian-Ahmed-Raza commented 2 months ago

To include subheadings in your Markdown table when using the htmltabletomd library, make sure the subheadings are marked up correctly in the HTML table. The library generally converts standard table headers and data, but it may not recognize subheadings unless they are represented correctly in the table structure.

Here's an example of how to structure your HTML table to include subheadings:

<table>
  <thead>
    <tr>
      <th rowspan="2">Main Header 1</th>
      <th colspan="2">Sub Header Group</th>
    </tr>
    <tr>
      <th>Sub Header 1</th>
      <th>Sub Header 2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Data 1</td>
      <td>Data 2</td>
      <td>Data 3</td>
    </tr>
    <tr>
      <td>Data 4</td>
      <td>Data 5</td>
      <td>Data 6</td>
    </tr>
  </tbody>
</table>
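
To see how a two-row header like the one above maps onto columns, here is a minimal sketch that expands `rowspan` and `colspan` into a grid and joins each column's labels into one header. It uses only Python's standard `html.parser` and does not call htmltabletomd; the names `HeaderGrid`, `flatten_header`, and `EXAMPLE_HTML` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class HeaderGrid(HTMLParser):
    """Collect the <th> cells inside <thead>, with their rowspan/colspan."""

    def __init__(self):
        super().__init__()
        self.in_thead = False
        self.rows = []    # one list per <tr>; each cell is [text, rowspan, colspan]
        self.cell = None  # the <th> currently being read, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "thead":
            self.in_thead = True
        elif tag == "tr" and self.in_thead:
            self.rows.append([])
        elif tag == "th" and self.in_thead and self.rows:
            self.cell = ["", int(attrs.get("rowspan") or 1), int(attrs.get("colspan") or 1)]

    def handle_data(self, data):
        if self.cell is not None:
            self.cell[0] += data

    def handle_endtag(self, tag):
        if tag == "thead":
            self.in_thead = False
        elif tag == "th" and self.cell is not None:
            self.cell[0] = self.cell[0].strip()
            self.rows[-1].append(self.cell)
            self.cell = None


def flatten_header(html):
    """Return one label per column, joining group and sub-header text."""
    parser = HeaderGrid()
    parser.feed(html)
    grid = {}  # (row, col) -> header text occupying that slot
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots already filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_cols = 1 + max(col for _, col in grid) if grid else 0
    labels = []
    for c in range(n_cols):
        parts = []
        for r in range(len(parser.rows)):
            text = grid.get((r, c), "")
            if text and text not in parts:  # a rowspan repeats the same text; keep it once
                parts.append(text)
        labels.append(" / ".join(parts))
    return labels


EXAMPLE_HTML = """
<table><thead>
  <tr><th rowspan="2">Main Header 1</th><th colspan="2">Sub Header Group</th></tr>
  <tr><th>Sub Header 1</th><th>Sub Header 2</th></tr>
</thead></table>
"""

print(flatten_header(EXAMPLE_HTML))
```

For the example table this yields three labels: `Main Header 1`, `Sub Header Group / Sub Header 1`, and `Sub Header Group / Sub Header 2`. The ` / ` separator is an arbitrary choice.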

sree1658 commented 2 months ago

@Mian-Ahmed-Raza Thanks for your help. I have a similar structure on the website, but the sub-headers are captured as null. Attaching the structure below. [image]

Mian-Ahmed-Raza commented 2 months ago

@sree1658 The issue might be due to how the tags are organized in the table, or how colspan and rowspan are being used. Let me help you with a structured version that should resolve the sub-header issue.

Fixed Table Structure:

<table border="0" cellpadding="2" cellspacing="0" id="lsd_table" align="center" class="al_center">
    <caption>Progress of Sowing of Kharif Crops (As of 23 Sep 2024)</caption>
    <thead>
        <tr>
            <th class="hbg hal hft">&nbsp;</th>
            <th class="hbg hac hft" colspan="3">Actual area ('000 hectare)</th>
            <th class="hbg hac hft" colspan="2">% change in actual area sown</th>
        </tr>
        <tr>
            <th class="hbg hal hft">&nbsp;</th>
            <th class="hbg har hft">2022</th>
            <th class="hbg har hft">2023</th>
            <th class="hbg har hft">2024</th>
            <th class="hbg har hft">2023</th>
            <th class="hbg har hft">2024</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td class="bg al ft">Agricultural products</td>
            <td class="bg ar ft">109,923</td>
            <td class="bg ar ft">108,826</td>
            <td class="bg ar ft">110,465</td>
            <td class="bg ar ft">-1.0</td>
            <td class="bg ar ft">1.5</td>
        </tr>
    </tbody>
</table>

Now, the sub-headers should appear correctly without being captured as null. If you still face issues or need further assistance with this table or any other aspect, let me know!
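
If the HTML itself cannot be edited, one possible alternative is to merge the two header rows after extracting them and build the Markdown table directly. The sketch below assumes the header rows and body rows have already been pulled out of the page as plain Python lists (the values are copied from the table above); `merge_header_rows` and `to_markdown` are hypothetical helper names, not part of htmltabletomd.

```python
def merge_header_rows(groups, subs):
    """Combine a group row and a sub-header row into one label per column.

    groups: list of (label, colspan) pairs from the first header row.
    subs:   list of labels from the second header row, one per column.
    """
    # Repeat each group label once for every column it spans.
    expanded = [label for label, span in groups for _ in range(span)]
    if len(expanded) != len(subs):
        raise ValueError("header rows cover different numbers of columns")
    return [
        " ".join(part for part in (g.strip(), s.strip()) if part)
        for g, s in zip(expanded, subs)
    ]


def to_markdown(header, rows):
    """Render a header and body rows as a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


# Values taken from the Kharif crops table; the first column has no header text.
groups = [("", 1), ("Actual area ('000 hectare)", 3), ("% change in actual area sown", 2)]
subs = ["", "2022", "2023", "2024", "2023", "2024"]
rows = [["Agricultural products", "109,923", "108,826", "110,465", "-1.0", "1.5"]]

header = merge_header_rows(groups, subs)
print(to_markdown(header, rows))
```

Merging the labels keeps the year columns distinguishable, since `2023` and `2024` each appear under two different groups.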

sree1658 commented 2 months ago

@Mian-Ahmed-Raza - I am trying to scrape a website, so I won't be able to change the HTML format. Thanks for the help anyway. Lesson learnt. 😊🙂

Mian-Ahmed-Raza commented 2 months ago

@sree1658 no problem dear 😊😊