os-climate / aicoe-osc-demo

This repository is the central location for the demos the ET data science team is developing within the OS-Climate project. This demo shows how to use the tools provided by Open Data Hub (ODH) running on the Operate First cluster to perform ETL and to create training and inference pipelines.
Apache License 2.0

Add table extraction and curation to the deprecated directory #175

Open Shreyanand opened 2 years ago

Shreyanand commented 2 years ago

In our initial conversation with the IDS folks, we found out that the table extraction and their models do not give good results, so we decided to focus on the text extraction notebooks. Table extraction and curation are still part of the demo2 directory, but we should add them to the deprecated directory to avoid confusion.

MichaelTiemannOSC commented 2 years ago

Before you deprecate it, please consider that when it comes to random ESG reports, the tables are all different.

When it comes to CDP reports, the tables are all very much the same shape, and there are thousands of reports to scan. Ditto EEI reports. Let's see whether table extraction works well or poorly on highly consistent tables (CDP in its lane, EEI in its lane).