paperswithcode / galai

Model API for GALACTICA
Apache License 2.0
Data creation scripts for markdown #82

Open shubhamagarwal92 opened 1 year ago

Hi!

Thank you for open-sourcing the code. Could you please also provide the scripts that use the GROBID library and store the documents in markdown format, as mentioned in the Appendix of the paper:

We use a modified version of the GROBID library for converting PDFs to text, as well as obtaining titles,
authors and citations. 

The final paper documents are stored in a markdown format, as opposed to full LaTeX. We use markdown as
the standard format for all documents in the corpus to support knowledge blending between sources. Papers
are citation processed, following the title-based approach
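
To make the request concrete, here is a minimal sketch of the kind of conversion being asked about. It assumes the TEI XML that stock GROBID emits (namespace `http://www.tei-c.org/ns/1.0`), not the modified version the paper describes, and it does not attempt the citation processing step. The helper name `tei_to_markdown` and the sample document are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of TEI XML, the output format of stock GROBID.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text(elem):
    """Join all text inside an element and collapse runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())


def tei_to_markdown(tei_xml):
    """Hypothetical helper: turn a GROBID-style TEI string into simple markdown.

    Assumption: the title sits under titleStmt and the body is a flat list
    of <div> sections, each with an optional <head> and some <p> elements.
    """
    root = ET.fromstring(tei_xml)
    blocks = []

    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    if title is not None and _text(title):
        blocks.append("# " + _text(title))

    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        if head is not None and _text(head):
            blocks.append("## " + _text(head))
        for para in div.findall("tei:p", TEI_NS):
            if _text(para):
                blocks.append(_text(para))

    # Separate blocks with blank lines, as markdown expects.
    return "\n\n".join(blocks) + "\n"


# Made-up sample input, shaped like GROBID output but not taken from any real paper.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>A Sample Paper</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <div><head>Introduction</head><p>First paragraph with a <ref type="bibr">[1]</ref> citation.</p></div>
  </body></text>
</TEI>"""

print(tei_to_markdown(SAMPLE_TEI))
```

The actual scripts presumably also handle authors, references, and the title-based citation step, which is why the originals would be helpful.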