lamalab-org / matextract-book

http://matextract.pub/
MIT License

biomass extraction case study #36

Closed: kjappelbaum closed this issue 1 month ago

kjappelbaum commented 2 months ago

I think @vgvinter and @MrtinoRG had some ideas. Could you perhaps remind me what you planned to do?

MrtinoRG commented 2 months ago

The idea is to take an open-source article (PDF), extract its text with Marker (since Nougat is already used in other Notebooks), split the text into chunks, classify those chunks using cosine similarity, and then compare the extractions obtained with different advanced prompting techniques, such as chain-of-thought (CoT) prompting and similar approaches.
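A minimal sketch of the chunking and cosine-similarity classification steps, assuming the text has already been extracted (the Marker step is not shown). To keep it self-contained, it uses a toy bag-of-words vector as a stand-in for a real sentence-embedding model; the label descriptions, `chunk_size`, `overlap`, and `threshold` values are hypothetical placeholders, not the settings planned for the notebook.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `chunk_size` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window reached the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_chunks(chunks: list[str], label_descriptions: dict[str, str],
                    threshold: float = 0.1) -> list[tuple[str, str]]:
    """Give each chunk the label whose description it is most similar to,
    or 'irrelevant' if even the best score falls below `threshold`."""
    label_vecs = {label: embed(desc) for label, desc in label_descriptions.items()}
    results = []
    for chunk in chunks:
        vec = embed(chunk)
        scores = {label: cosine_similarity(vec, lv) for label, lv in label_vecs.items()}
        best = max(scores, key=scores.get)
        results.append((chunk, best if scores[best] >= threshold else "irrelevant"))
    return results
```

Only chunks that land in a relevant class would then be passed on to the LLM, which keeps prompts short. Swapping `embed` for a proper embedding model changes nothing else in the pipeline.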

The main thing we would like to showcase is the prompting techniques, since the only example of such prompting in the Notebooks so far is the one-shot prompt in the fine-tuning Notebook.
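To compare prompting techniques on the same input, one option is to build every prompt variant from a single template, so that only the technique changes between runs. The sketch below assumes a hypothetical task instruction (`TASK`) and a hypothetical one-shot example; it only constructs the prompt strings and does not call any particular LLM API.

```python
from typing import Optional

# Hypothetical task instruction; the real one would name the fields to extract.
TASK = ("Extract the requested experimental data from the text below "
        "and return it as JSON.")


def build_prompt(chunk: str, technique: str = "zero-shot",
                 example: Optional[tuple[str, str]] = None) -> str:
    """Build a zero-shot, one-shot, or chain-of-thought (CoT) prompt for one chunk."""
    parts = [TASK]
    if technique == "one-shot":
        if example is None:
            raise ValueError("one-shot prompting needs an (input, output) example")
        ex_in, ex_out = example
        # Show the model one solved input/output pair before the real input.
        parts.append(f"Example input:\n{ex_in}\nExample output:\n{ex_out}")
    elif technique == "cot":
        # Ask for intermediate reasoning before the final answer.
        parts.append("Think step by step: first list every quantity mentioned in the "
                     "text, then decide which ones are relevant, and only then write "
                     "the final JSON.")
    elif technique != "zero-shot":
        raise ValueError(f"unknown technique: {technique}")
    parts.append(f"Text:\n{chunk}")
    return "\n\n".join(parts)
```

Because the task text and the input chunk are identical across variants, any difference in the extracted JSON can be attributed to the prompting technique rather than to wording changes elsewhere in the prompt.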

kjappelbaum commented 2 months ago

Cool! Marker, Nougat, and co. are still a tricky case (#35). So, if I understand correctly, you are already working on this, right?