Pull Request - Epic 7.2 - Extended File Support for Flashcard Generator for Dynamo
By: Synthwave Sentinels
Summary
Extends file support in the flashcard generator to the following file types:
PDF
CSV
TXT
MD
URL (e.g., Wikipedia or other informational web pages)
PPTX
DOCX
XLS / XLSX
XML
GOOGLE DOCS
GOOGLE SHEETS
GOOGLE SLIDES
GOOGLE DRIVE PDF
IMAGES (PNG / JPG / JPEG)
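One way to represent the list above in code is a lookup table that normalizes user input and groups each type by the kind of content it yields. This is a minimal sketch, not the contents of allowed_file_extensions_dynamo.py: the ContentCategory enum, identifiers such as "gdoc" and "gsheet", and the category assignments are all assumptions.

```python
from enum import Enum


class ContentCategory(Enum):
    """Hypothetical grouping used to pick a processing strategy per input."""
    TEXT = "text"
    TABULAR = "tabular"
    IMAGE = "image"


# Hypothetical mapping from the supported file types to a category.
# Keys like "gdoc" or "url" are illustrative, not the API's real values.
FILE_TYPE_CATEGORIES: dict[str, ContentCategory] = {
    "pdf": ContentCategory.TEXT,
    "txt": ContentCategory.TEXT,
    "md": ContentCategory.TEXT,
    "url": ContentCategory.TEXT,
    "pptx": ContentCategory.TEXT,
    "docx": ContentCategory.TEXT,
    "xml": ContentCategory.TEXT,
    "gdoc": ContentCategory.TEXT,
    "gslide": ContentCategory.TEXT,
    "gpdf": ContentCategory.TEXT,
    "csv": ContentCategory.TABULAR,
    "xls": ContentCategory.TABULAR,
    "xlsx": ContentCategory.TABULAR,
    "gsheet": ContentCategory.TABULAR,
    "png": ContentCategory.IMAGE,
    "jpg": ContentCategory.IMAGE,
    "jpeg": ContentCategory.IMAGE,
}


def categorize(file_type: str) -> ContentCategory:
    """Normalize a file type string ("PDF", ".pdf", " pdf ") and return its category.

    Raises ValueError for unsupported types.
    """
    key = file_type.strip().lower().lstrip(".")
    try:
        return FILE_TYPE_CATEGORIES[key]
    except KeyError:
        raise ValueError(f"Unsupported file type: {file_type!r}") from None
```

Grouping types this way lets several loaders share one prompt strategy, for example a single tabular template for CSV and XLSX input.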
Changes
Added new document loaders to extend file support, built on a new File Handler class that downloads each input file to a temporary file with a unique (UUID-based) name.
Refactored the existing metadata.py in Dynamo, which previously hard-coded the youtube_url attribute.
Updated core.py to support the new file types through the file_url and file_type attributes.
Created a new utility module, allowed_file_extensions_dynamo.py, to manage the supported file types in one place.
Created new error classes to improve error mapping.
Created new prompt templates tailored to plain text, YouTube videos, and structured/tabular data.
Added a computer vision approach to extract key concepts from images, proposed as a new capability for the project.
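The File Handler described above could be sketched as a context manager that saves the download under a UUID-based temporary name and deletes it afterwards. The class name FileHandler, the FileHandlerError exception, the 30-second timeout, and the use of urllib.request are assumptions for illustration; the actual implementation may differ.

```python
import shutil
import tempfile
import urllib.request
import uuid
from pathlib import Path
from typing import Optional


class FileHandlerError(Exception):
    """Hypothetical error raised when an input file cannot be fetched."""


class FileHandler:
    """Download file_url into a uniquely named temp file for the duration of a with-block."""

    def __init__(self, file_url: str, file_type: str) -> None:
        self.file_url = file_url
        # Normalize "PDF" / ".pdf" to "pdf" so the suffix is consistent.
        self.file_type = file_type.strip().lower().lstrip(".")
        self.path: Optional[Path] = None

    def __enter__(self) -> Path:
        # uuid4() gives each download a unique name, so concurrent
        # requests never overwrite each other's temporary files.
        self.path = Path(tempfile.gettempdir()) / f"{uuid.uuid4()}.{self.file_type}"
        try:
            with urllib.request.urlopen(self.file_url, timeout=30) as response:
                with open(self.path, "wb") as out:
                    shutil.copyfileobj(response, out)
        except (OSError, ValueError) as exc:
            # URLError/HTTPError subclass OSError; malformed URLs raise ValueError.
            self.path.unlink(missing_ok=True)
            raise FileHandlerError(f"Could not download {self.file_url}") from exc
        return self.path

    def __exit__(self, exc_type, exc, tb) -> None:
        # Always remove the temporary file, even if the loader raised.
        if self.path is not None:
            self.path.unlink(missing_ok=True)
```

Usage: with FileHandler(file_url, file_type) as path, then pass path to the matching document loader. Because the exit step always deletes the file, temporary files do not pile up when a loader fails.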
Testing
Each document loader was tested with pytest, covering both expected scenarios and edge cases.
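As an illustration of the kind of edge cases such tests can cover, the following pytest-style tests exercise a hypothetical extension_from_url helper (not part of the PR) against uppercase extensions, query strings, and extensionless URLs. pytest collects plain test_* functions with bare asserts, so no pytest import is required.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def extension_from_url(url: str) -> str:
    """Hypothetical helper: return the lowercase extension of a URL's path,
    ignoring query strings and fragments ("" if there is none)."""
    path = urlparse(url).path
    return PurePosixPath(path).suffix.lower().lstrip(".")


def test_simple_extension():
    assert extension_from_url("https://example.com/notes.pdf") == "pdf"


def test_uppercase_and_query_string():
    # The query string must not leak into the extension.
    assert extension_from_url("https://example.com/Data.XLSX?raw=1") == "xlsx"


def test_no_extension():
    # Edge case: web pages such as a Wikipedia article have no extension.
    assert extension_from_url("https://en.wikipedia.org/wiki/Flashcard") == ""


def test_dotted_directory_is_ignored():
    # Only the last path segment counts, not a dotted directory name.
    assert extension_from_url("https://example.com/v1.2/readme") == ""
```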
Results
All document loaders pass their tests and work as expected.
Notes
For Google Docs, Slides, Sheets and PDF files:
The files must be shared publicly in Google Drive.
The files must be uploaded to Google Drive as regular files. A file created directly in Google Drive must first be downloaded and then re-uploaded to be detected. We chose this approach because LangChain's GoogleDriveLoader relies on OAuth2, which is not suitable for production deployment.
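Because the loaders avoid OAuth2, a publicly shared Drive file has to be fetched through a plain HTTP link. A minimal sketch, assuming the common drive.google.com/uc?export=download&id=<ID> download pattern, converts a share link into such a URL. The function name is hypothetical, and Drive may return a confirmation page instead of the file for very large uploads, which this sketch does not handle.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches share links of the form ".../file/d/<ID>/view".
_FILE_D_PATTERN = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def drive_download_url(share_url: str) -> str:
    """Convert a public Google Drive share link into a direct-download URL.

    Assumption: the file is shared as "Anyone with the link", so no
    credentials are needed to fetch it.
    """
    parsed = urlparse(share_url)
    if parsed.netloc != "drive.google.com":
        raise ValueError(f"Not a Google Drive URL: {share_url!r}")
    match = _FILE_D_PATTERN.search(parsed.path)
    if match:
        file_id = match.group(1)
    else:
        # Older-style links carry the ID as a query parameter: ?id=<ID>
        ids = parse_qs(parsed.query).get("id")
        if not ids:
            raise ValueError(f"No file ID found in {share_url!r}")
        file_id = ids[0]
    return f"https://drive.google.com/uc?export=download&id={file_id}"
```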
How to Test
Clone the repository locally.
Create and activate a virtual environment.
Run pip install -r requirements.txt to install the required libraries.
Create a .env file with the ENV_TYPE, GCP_PROJECT_ID, and GOOGLE_API_KEY fields: set ENV_TYPE to dev, GCP_PROJECT_ID to your project ID from the Google Cloud Console, and GOOGLE_API_KEY to your API key from Google AI Studio.
Run ./local-start.sh to start the application.
Send sample requests for each file type listed above and compare the responses with the sample requests and responses shown in the screenshots.
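Before running ./local-start.sh, a small script like the following can catch a missing or misspelled .env key early. It is a hypothetical helper, not part of the repository, and its parser only handles simple KEY=VALUE lines.

```python
from pathlib import Path

# Keys the setup steps ask for in .env.
REQUIRED_KEYS = ("ENV_TYPE", "GCP_PROJECT_ID", "GOOGLE_API_KEY")


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Minimal sketch: no variable expansion or multi-line values.
    """
    env: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip surrounding quotes, e.g. GCP_PROJECT_ID="my-project".
        env[key.strip()] = value.strip().strip("\"'")
    return env


def check_env(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    if not path.exists():
        return [f"{path} not found"]
    env = read_env_file(path)
    problems = [f"missing or empty: {k}" for k in REQUIRED_KEYS if not env.get(k)]
    if env.get("ENV_TYPE") and env["ENV_TYPE"] != "dev":
        problems.append("ENV_TYPE should be 'dev' for local testing")
    return problems


if __name__ == "__main__":
    for problem in check_env(Path(".env")):
        print(problem)
```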
(Thank you, AI Avengers, for providing your PR template.)
References
Request Templates Docs
UPDATE: Multi-language support has been implemented in this PR: