Closed scottkleinman closed 1 year ago
Progress report:
docx and pdf loading have been added to the new smart loader. Work on json(l) and csv formats is in progress.

[...] remove component, and I'll push it as soon as I've written the documentation.

Progress report:
json(l) and csv formats have been added through the dataset module.

The remaining item in this issue is the general enhancement to the loader architecture. However, the issue was initially based on the older basic module rather than the more recent smart and dataset modules, so I'm going to close this issue and revisit enhancements to those modules separately.

Potential new loader features:
pdf, docx, jsonl, and csv formats

It's also worth considering whether the loader should be implemented through Python dataclasses or pydantic BaseModel.

Since the current loader is implemented in the io.basic module, it would be easy to develop these in a separate module and perhaps merge them later on.
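
To make the dataclass option concrete, here is a minimal sketch of what a separate, format-aware loader could look like. It is only an illustration: `SketchLoader`, its fields, the `text_key` parameter, and the helper functions are hypothetical names, not the project's actual API. It covers only jsonl, csv, and plain text using the standard library; pdf and docx would need third-party parsers and are left out. A pydantic BaseModel version would declare the same fields and add validation on assignment.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


def _read_txt(raw: str, text_key: str) -> List[str]:
    """Treat the whole file as a single text."""
    return [raw]


def _read_jsonl(raw: str, text_key: str) -> List[str]:
    """Return the text field of each non-empty JSON line."""
    return [json.loads(line)[text_key] for line in raw.splitlines() if line.strip()]


def _read_csv(raw: str, text_key: str) -> List[str]:
    """Return the text column of each row (the first row is the header)."""
    return [row[text_key] for row in csv.DictReader(io.StringIO(raw))]


@dataclass
class SketchLoader:
    """Hypothetical loader that dispatches on file suffix (an assumption)."""

    text_key: str = "text"  # assumed name of the field/column holding the text
    texts: List[str] = field(default_factory=list)
    names: List[str] = field(default_factory=list)
    # Maps a lowercase file suffix to the function that parses that format.
    parsers: Dict[str, Callable[[str, str], List[str]]] = field(
        default_factory=lambda: {
            ".txt": _read_txt,
            ".jsonl": _read_jsonl,
            ".csv": _read_csv,
        }
    )

    def load(self, path) -> None:
        """Parse one file and append its texts and generated names."""
        path = Path(path)
        parser = self.parsers.get(path.suffix.lower())
        if parser is None:
            raise ValueError(f"Unsupported format: {path.suffix}")
        raw = path.read_text(encoding="utf-8")
        for i, text in enumerate(parser(raw, self.text_key)):
            self.texts.append(text)
            self.names.append(f"{path.name}:{i}")
```

Because the parsers live in a plain dictionary, a new format can be added in a separate module by registering one more function, which fits the idea of developing these features apart from the existing loader and merging them later.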