As a user, I would like to select or upload a document and have its text content extracted, chunked, and then summarized appropriately.
I would like to be able to do this with multiple document types, including Word document formats, PDF, and EPUB.
As a user of the GUI, I should be able to open the application's UI and select a file for upload. The file is then uploaded and parsed, confirmed to be an appropriate/matching file type, and its text is extracted, chunked (if the arg is passed), and finally summarized.
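A rough sketch of the "confirm the file type, then chunk" steps, to make the request concrete. The function names (`matches_declared_type`, `chunk_text`), the `MAGIC_BYTES` table, and the 500-word default are all hypothetical and not taken from the codebase; the check only compares leading bytes against the extension, so it cannot tell a DOCX from an EPUB (both are ZIP containers).

```python
from pathlib import Path

# Hypothetical table: supported extension -> expected leading bytes.
# PDF files begin with "%PDF"; DOCX and EPUB are ZIP archives ("PK\x03\x04").
MAGIC_BYTES = {
    ".pdf": b"%PDF",
    ".docx": b"PK\x03\x04",
    ".epub": b"PK\x03\x04",
}


def matches_declared_type(path: Path) -> bool:
    """Return True if the file's leading bytes agree with its extension."""
    expected = MAGIC_BYTES.get(path.suffix.lower())
    if expected is None:
        return False  # extension is not in the supported table
    with path.open("rb") as fh:
        return fh.read(len(expected)) == expected


def chunk_text(text: str, max_words: int = 500) -> list[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Word-count chunking is only one option; splitting on paragraphs or on token counts for the target model would slot into the same place.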
As a user of the CLI, I should be able to pass a command line argument that specifies either a single file or a collection of files listed in a text file as input for summarization (with the option for chunking if the arg is passed).
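One possible shape for the CLI side, sketched with `argparse`. The flag names (`--file`, `--file-list`, `--chunk`) and the helpers `build_parser` / `resolve_inputs` are placeholders for illustration, not existing options of the tool; the list file is assumed to hold one path per line.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts one file or a text file listing many."""
    parser = argparse.ArgumentParser(description="Summarize one or more documents.")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file", type=Path, help="a single document to summarize")
    source.add_argument(
        "--file-list",
        type=Path,
        help="text file with one document path per line",
    )
    parser.add_argument(
        "--chunk",
        action="store_true",
        help="chunk the extracted text before summarizing",
    )
    return parser


def resolve_inputs(args: argparse.Namespace) -> list[Path]:
    """Return the documents to process, ignoring blank lines in a list file."""
    if args.file is not None:
        return [args.file]
    lines = args.file_list.read_text(encoding="utf-8").splitlines()
    return [Path(line.strip()) for line in lines if line.strip()]
```

Usage would then look something like `args = build_parser().parse_args(["--file-list", "docs.txt", "--chunk"])` followed by `resolve_inputs(args)`, with `args.chunk` deciding whether the chunking step runs.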
I figure since we're 'just' shuffling text back and forth, why not throw in some other text formats as well, to make this an even handier tool for research and study.
~Tracking integration of Website text: Issue #43~ Solved.
Tracking integration of PDF documents: Issue #46
Tracking integration of EPUB files: Issue #47
Tracking integration of Office doc files: Issue #44