jaluoma / pruju-ai

An AI teaching assistant that allows students to interact with the teacher's course materials
MIT License

Unzip compressed files and skip unsupported files first, then extract text #5

Open · fishfree opened this issue 7 months ago

fishfree commented 7 months ago
```
raise exceptions.ExtensionNotSupported(ext)
textract.exceptions.ExtensionNotSupported: The filename extension .zip is not yet supported by
textract. Please suggest this filename extension here:

    https://github.com/deanmalmgren/textract/issues

Available extensions include: .csv, .doc, .docx, .eml, .epub, .gif, .htm, .html, .jpeg, .jpg, .json, .log, .mp3, .msg, .odt, .ogg, .pdf, .png, .pptx, .ps, .psv, .rtf, .tab, .tff, .tif, .tiff, .tsv, .txt, .wav, .xls, .xlsx
```
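A minimal sketch of what the issue title proposes, assuming the ingestion step can be handed a plain list of file paths. `expand_and_filter` and `extract_texts` are hypothetical helper names, not existing pruju-ai functions, and the extension set is simply copied from the "Available extensions" list in the error above. Archives (including nested ones) are unpacked into a temporary directory, unsupported files are set aside instead of raising, and only the remaining files go to `textract.process`. There is no guard against zip bombs here.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: copied from the "Available extensions" list in the textract error.
SUPPORTED_EXTENSIONS = {
    ".csv", ".doc", ".docx", ".eml", ".epub", ".gif", ".htm", ".html",
    ".jpeg", ".jpg", ".json", ".log", ".mp3", ".msg", ".odt", ".ogg",
    ".pdf", ".png", ".pptx", ".ps", ".psv", ".rtf", ".tab", ".tff",
    ".tif", ".tiff", ".tsv", ".txt", ".wav", ".xls", ".xlsx",
}


def expand_and_filter(paths, workdir, supported=SUPPORTED_EXTENSIONS):
    """Hypothetical helper: unzip .zip archives (nested ones too) into workdir,
    then split every file into (supported, skipped) by lowercase extension."""
    supported_files, skipped = [], []
    queue = [Path(p) for p in paths]
    n_archives = 0
    while queue:
        path = queue.pop(0)
        ext = path.suffix.lower()
        if ext == ".zip":
            # One subfolder per archive so identical member names don't collide.
            target = Path(workdir) / f"unzipped_{n_archives}"
            n_archives += 1
            try:
                with zipfile.ZipFile(path) as archive:
                    # extractall strips absolute paths and ".." components.
                    archive.extractall(target)
            except zipfile.BadZipFile:
                skipped.append(path)  # corrupt archive: skip it, don't crash
                continue
            # Queue the extracted files so nested archives are expanded as well.
            queue.extend(sorted(p for p in target.rglob("*") if p.is_file()))
        elif ext in supported:
            supported_files.append(path)
        else:
            skipped.append(path)  # bypass instead of raising ExtensionNotSupported
    return supported_files, skipped


def extract_texts(paths, extract_fn=None):
    """Hypothetical helper: return ([(file name, text)], [skipped file names])."""
    if extract_fn is None:
        import textract  # imported lazily so the helper above works without it

        def extract_fn(p):
            # textract.process returns bytes; decode them to str.
            return textract.process(str(p)).decode("utf-8", errors="replace")

    with tempfile.TemporaryDirectory() as workdir:
        files, skipped = expand_and_filter(paths, workdir)
        # Extract everything before the temporary directory is deleted.
        texts = [(p.name, extract_fn(p)) for p in files]
    return texts, [p.name for p in skipped]
```

Filtering by extension up front means one unsupported file is reported and skipped rather than aborting the whole run. An alternative would be to call textract on every file and catch `textract.exceptions.ExtensionNotSupported`, but that still leaves `.zip` archives unread, so unpacking them has to happen first either way.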
jaluoma commented 7 months ago

Good idea! Definitely worth doing at some point (I'll leave the issue open), but I'd also be happy to accept a PR.