second-opinion-ai / second-opinion


TEI/XML to JSONL conversion for enhanced Q&A generation #27

Open branhoff opened 4 months ago

branhoff commented 4 months ago

As a follow-up to https://github.com/second-opinion-ai/second-opinion/issues/25, we need to convert our TEI/XML files into JSONL files that can be passed to an LLM to generate Q&A pairs.

Description:

This feature converts the TEI/XML files produced by GROBID into structured JSONL. The conversion must preserve both the detailed content and the structural organization of each document: metadata (titles, authors, abstracts), section headings, bibliographic references, and the overall narrative flow.
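
As a starting point for discussion, here is a minimal sketch of what the conversion could look like. It assumes the usual GROBID TEI layout (header metadata under `teiHeader`, body sections as `div` elements with `head` and `p` children, references under `listBibl`). The function names `tei_to_records` and `tei_to_jsonl` and the record fields (`type`, `order`, `heading`, `text`) are hypothetical and not taken from the project.

```python
import json
import xml.etree.ElementTree as ET

# TEI namespace used by GROBID output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def _text(elem):
    """Return all text inside an element as one whitespace-normalized string."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def tei_to_records(tei_xml):
    """Yield one dict for the metadata, then one per body section and reference."""
    root = ET.fromstring(tei_xml)

    # Authors: join the name parts (forename, surname, ...) with spaces.
    authors = []
    for pers in root.findall(".//tei:sourceDesc//tei:author/tei:persName", TEI_NS):
        parts = [_text(part) for part in pers]
        authors.append(" ".join(p for p in parts if p))

    yield {
        "type": "metadata",
        "title": _text(root.find(".//tei:titleStmt/tei:title", TEI_NS)),
        "authors": authors,
        "abstract": _text(root.find(".//tei:profileDesc/tei:abstract", TEI_NS)),
    }

    # Body sections, numbered in document order to keep the narrative flow.
    sections = root.findall(".//tei:text/tei:body/tei:div", TEI_NS)
    for order, div in enumerate(sections):
        paragraphs = [_text(p) for p in div.findall("tei:p", TEI_NS)]
        yield {
            "type": "section",
            "order": order,
            "heading": _text(div.find("tei:head", TEI_NS)),
            "text": "\n".join(paragraphs),
        }

    # Bibliographic references; only the id and first title are kept here.
    for bibl in root.findall(".//tei:text/tei:back//tei:listBibl/tei:biblStruct", TEI_NS):
        yield {
            "type": "reference",
            "id": bibl.get(XML_ID),
            "title": _text(bibl.find(".//tei:title", TEI_NS)),
        }


def tei_to_jsonl(tei_xml):
    """Serialize the records as JSONL: one JSON object per line."""
    return "".join(
        json.dumps(record, ensure_ascii=False) + "\n"
        for record in tei_to_records(tei_xml)
    )
```

Emitting one record per section (rather than one per document) is a design choice made for this sketch: it keeps each line small enough to hand to an LLM on its own, while the `order` field lets the original sequence be rebuilt. Nested subsections, figures, and in-text citation links are not handled here.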

Acceptance Criteria: