Closed: philschmid closed this issue 10 months ago
Add support for providing a directory containing multiple HTML files, instead of a single file, for "clipping". The idea is to read the directory, convert the files to PDFs, and then save the results as a single dataset.jsonl.
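A rough sketch of what the requested directory mode could look like, written in Python purely for illustration. Everything here is an assumption and not the project's actual API: the function names `clip_directory` and `read_html`, the `*.html` glob, and the `source`/`content` record fields are hypothetical. The conversion step is left as a pluggable callable, and the default placeholder simply returns the raw HTML text; it does not perform the PDF conversion described above.

```python
import json
from pathlib import Path
from typing import Callable


def read_html(path: Path) -> str:
    """Placeholder converter (assumption): returns the raw HTML text unchanged."""
    return path.read_text(encoding="utf-8")


def clip_directory(
    input_dir: str,
    output_file: str = "dataset.jsonl",
    convert: Callable[[Path], str] = read_html,
) -> int:
    """Convert every *.html file in input_dir and write one JSON record per line.

    Returns the number of files written to output_file.
    """
    # Sort so the output order is stable across runs.
    html_files = sorted(Path(input_dir).glob("*.html"))
    with open(output_file, "w", encoding="utf-8") as out:
        for path in html_files:
            # Hypothetical record layout: file name plus converted content.
            record = {"source": path.name, "content": convert(path)}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(html_files)
```

Calling `clip_directory("pages/", "dataset.jsonl")` would write one line per HTML file found directly in `pages/`. Passing a different `convert` callable is where a real clipping or conversion step would plug in.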