microsoft / TUTA_table_understanding

TUTA and ForTaP for Structure-Aware and Numerical-Reasoning-Aware Table Pre-Training
MIT License

How to convert spreadsheets to your JSON format? #3

Open eloukas opened 3 years ago

eloukas commented 3 years ago

Hi and thanks for uploading your code repo.

How can someone preprocess their own spreadsheet and generate a JSON file for it in your format (see https://github.com/microsoft/TUTA_table_understanding/blob/main/data/pretrain/spreadsheet/spreadsheet-sample.json)?

HaoAreYuDong commented 2 years ago

For spreadsheet tables, table detection is needed. You can train a table detection model using https://github.com/microsoft/TableSense. You can also watch this repo and we will publish the code in the near future.
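
To illustrate the step that follows detection, here is a minimal sketch (not the repo's preprocessing code, which is unpublished at the time of this comment). It assumes a detector has already returned the table's location as an A1-style range string, and that the sheet has been loaded into a rectangular list of rows. The function names and the JSON keys (`range`, `num_rows`, `num_cols`, `cells`) are hypothetical and do not reproduce the schema of spreadsheet-sample.json.

```python
import json
import re


def col_to_index(col):
    """Convert a column label such as 'A' or 'AB' to a 0-based index."""
    index = 0
    for ch in col.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_range(a1_range):
    """Parse 'B2:D5' into 0-based, inclusive (top, left, bottom, right)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", a1_range)
    if match is None:
        raise ValueError(f"not an A1-style range: {a1_range!r}")
    c1, r1, c2, r2 = match.groups()
    return int(r1) - 1, col_to_index(c1), int(r2) - 1, col_to_index(c2)


def table_to_json(grid, a1_range):
    """Crop the detected table out of the sheet and serialize it.

    The key names below are placeholders, not the TUTA format.
    """
    top, left, bottom, right = parse_range(a1_range)
    cells = [
        [grid[r][c] for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]
    record = {
        "range": a1_range,
        "num_rows": bottom - top + 1,
        "num_cols": right - left + 1,
        "cells": cells,
    }
    return json.dumps(record)


# Hypothetical sheet: a title in A1 and a small table in B2:C4.
sheet = [
    ["Report", None, None],
    [None, "Year", "Sales"],
    [None, "2020", 10],
    [None, "2021", 12],
]
print(table_to_json(sheet, "B2:C4"))
```

A real converter would also need whatever the target format records beyond cell values (for example merged cells or formatting), which this sketch leaves out because the thread does not describe those fields.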

izavits commented 2 years ago

Hi, and thanks for sharing this work! Is this spreadsheet data available somewhere? If not, could you upload a few original files together with their processed JSON versions? This would help in understanding how to produce the JSON. (I have also checked spreadsheet-sample.json, but some additional samples would help.) Thanks