This pull request extends the existing functionality to support reading and processing multiple file types, including DOC, DOCX, PPT, PPTX, and CSV files. The changes ensure that these file types are dynamically handled based on their extensions without altering the existing metadata.
Changes Made:

Enhanced File Loaders:
Updated UploadFileLoader, BytesFileLoader, LocalFileLoader, and URLLoader classes to handle additional file types (DOC, DOCX, PPT, PPTX, CSV). Implemented functions to extract text content from these file types.

Improved Error Handling:
Added detailed error handling in file loaders to provide informative error messages. Ensured that unsupported file types are correctly identified and handled.

Dependencies:
Included necessary libraries (python-docx, pandas, python-pptx) for handling the new file types. Updated requirements.txt to include these libraries.
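
For reviewers, here is a minimal sketch of the extension-based dispatch and unsupported-type handling described above. It is not the code in this PR: the names `EXTRACTORS`, `extract_text`, `extract_csv_text`, and `UnsupportedFileTypeError` are hypothetical, and it uses only the standard library `csv` module so that it runs without the new dependencies, whereas the PR itself relies on pandas, python-docx, and python-pptx.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Dict


class UnsupportedFileTypeError(ValueError):
    """Hypothetical error raised when no extractor matches the extension."""


def extract_csv_text(data: bytes) -> str:
    # Decode the raw bytes and join each row's cells with ", " to get plain text.
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    return "\n".join(", ".join(row) for row in reader)


# Hypothetical registry mapping a lowercase extension to its extractor.
# DOCX/PPTX extractors would be registered here in the same way.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    ".csv": extract_csv_text,
}


def extract_text(filename: str, data: bytes) -> str:
    # Pick the extractor from the file extension, ignoring case.
    ext = Path(filename).suffix.lower()
    extractor = EXTRACTORS.get(ext)
    if extractor is None:
        supported = ", ".join(sorted(EXTRACTORS))
        raise UnsupportedFileTypeError(
            f"Unsupported file type '{ext or filename}'. Supported: {supported}"
        )
    try:
        return extractor(data)
    except UnicodeDecodeError as exc:
        # Surface which file failed instead of a bare decoding error.
        raise ValueError(f"Could not decode '{filename}': {exc}") from exc
```

Because the lookup depends only on the extension, each loader (upload, bytes, local, URL) could call the same function once it has the file name and raw bytes, leaving its metadata handling untouched.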
Unit tests are still pending; I will add them soon and post the results.