Open s-leroux opened 6 months ago
Here is a sample HTML fragment from Yahoo! Finance:
<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<title></title>
</head>
<body>
<div class="">
<h3 class="Mt(20px)"><span>Cash Flow Statement</span></h3>
<table class="W(100%) Bdcl(c)">
<tbody>
<tr class="Bxz(bb) H(36px) BdY Bdc($seperatorColor)">
<td class=
"Pos(st) Start(0) Bgc($lv2BgColor) fi-row:h_Bgc($hoverBgColor) Pend(10px) Miw(140px)">
<span>Operating Cash Flow</span> <!-- -->(ttm)</td>
<td class="Fw(500) Ta(end) Pstart(10px) Miw(60px)">
17.13M</td>
</tr>
<tr class="Bxz(bb) H(36px) BdB Bdbc($seperatorColor)">
<td class=
"Pos(st) Start(0) Bgc($lv2BgColor) fi-row:h_Bgc($hoverBgColor) Pend(10px)">
<span>Levered Free Cash Flow</span> <!-- -->(ttm)</td>
<td class="Fw(500) Ta(end) Pstart(10px) Miw(60px)">
-210.33M</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
Stopped for now.
There is a lot of free fundamental data available on web pages. We already have experience with a web scraper: 162c94410c603a488efedf407451939db3be676c
The code above was written specifically for Investing.com. Can we have something more generic to parse table-like data?
The requirement is to parse `table` elements first, and eventually also pseudo-tables built from `div`/`span` constructs.
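
As a starting point for discussion, here is a minimal sketch of what a generic extractor for real `table` elements could look like, using only `html.parser` from the Python standard library. The names `TableExtractor` and `parse_table` are hypothetical and are not part of the existing code base. It only handles `tr`/`td`/`th` markup and does not cover nested tables or the `div`/`span` pseudo-tables mentioned above.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; nested tables are not supported.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the chunks and collapse runs of whitespace and newlines.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell is ignored; comments never reach this method.
        if self._cell is not None:
            self._cell.append(data)


def parse_table(html):
    """Return the table rows found in `html` as lists of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Fed the Yahoo! Finance fragment above, `parse_table` should return one list per `tr`, for example `["Operating Cash Flow (ttm)", "17.13M"]` for the first row. Converting values such as `17.13M` to numbers, and recognising pseudo-table rows by class name, would have to be layered on top of this.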