Closed: gmantri closed this issue 4 months ago
Hi @gmantri, the old `.doc` format is not supported, sorry. Aside from converting files manually, you could:
- use the `steps` parameter in the Import API
- see the `examples` folder about adding custom decoders

@dluc - Thanks. Is there a list of file types supported by Kernel Memory? All I could find was this: https://github.com/microsoft/kernel-memory?tab=readme-ov-file#kernel-memory-km-and-sk-semantic-memory-sm and it only talks about the file types at a high level (e.g. "Word" rather than ".docx but not .doc"). Having this list would be really helpful.
The default list can be extrapolated from here https://github.com/microsoft/kernel-memory/blob/3d34260ae513af48030da9a56aa50b8e0162c6f8/service/Core/DataFormats/DependencyInjection.cs#L81
services.AddSingleton<IContentDecoder, TextDecoder>();
services.AddSingleton<IContentDecoder, MarkDownDecoder>();
services.AddSingleton<IContentDecoder, HtmlDecoder>();
services.AddSingleton<IContentDecoder, PdfDecoder>();
services.AddSingleton<IContentDecoder, ImageDecoder>();
services.AddSingleton<IContentDecoder, MsExcelDecoder>();
services.AddSingleton<IContentDecoder, MsPowerPointDecoder>();
services.AddSingleton<IContentDecoder, MsWordDecoder>();
Using DI, one can inject more decoders, which are automatically picked up by TextExtractionHandler (https://github.com/microsoft/kernel-memory/blob/3d34260ae513af48030da9a56aa50b8e0162c6f8/service/Core/Handlers/TextExtractionHandler.cs#L43).
For each file, the handler loops through the list of decoders, asking each one whether it supports the current file format:
var decoder = this._decoders.LastOrDefault(d => d.SupportsMimeType(uploadedFile.MimeType));
if (decoder is not null) ...
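
To make the selection rule concrete, here is a minimal sketch of the same `LastOrDefault` lookup using only the .NET base library. `ISketchDecoder`, `SketchDecoder`, and `DecoderSelection` are hypothetical stand-ins, not the real Kernel Memory types, and the MIME types assigned to each stand-in are assumptions for illustration; only the two Word MIME type strings are the standard registered values. The sketch suggests why a `.doc` upload finds no decoder, and that a decoder registered later takes precedence over an earlier one for the same MIME type.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Standard MIME types for the two Word formats.
const string DocMime = "application/msword";
const string DocxMime = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

// Decoders in registration order: a stand-in for the collection DI would supply.
var decoders = new List<ISketchDecoder>
{
    new SketchDecoder("text", "text/plain"),
    new SketchDecoder("word", DocxMime), // assumption: the Word decoder handles .docx only
};

Console.WriteLine(DecoderSelection.Pick(decoders, DocxMime)?.Name ?? "none"); // word
Console.WriteLine(DecoderSelection.Pick(decoders, DocMime)?.Name ?? "none");  // none

// Registering a (hypothetical) legacy decoder makes .doc resolvable.
decoders.Add(new SketchDecoder("legacy-doc", DocMime));
Console.WriteLine(DecoderSelection.Pick(decoders, DocMime)?.Name ?? "none");  // legacy-doc

// A later registration for an already supported type wins, because of LastOrDefault.
decoders.Add(new SketchDecoder("custom-text", "text/plain"));
Console.WriteLine(DecoderSelection.Pick(decoders, "text/plain")?.Name ?? "none"); // custom-text

// Hypothetical, simplified decoder contract: only the MIME type check is modeled.
public interface ISketchDecoder
{
    string Name { get; }
    bool SupportsMimeType(string mimeType);
}

public sealed class SketchDecoder : ISketchDecoder
{
    private readonly HashSet<string> _mimeTypes;

    public SketchDecoder(string name, params string[] mimeTypes)
    {
        Name = name;
        _mimeTypes = new HashSet<string>(mimeTypes, StringComparer.OrdinalIgnoreCase);
    }

    public string Name { get; }

    public bool SupportsMimeType(string mimeType) => _mimeTypes.Contains(mimeType);
}

public static class DecoderSelection
{
    // Same rule as the handler snippet above: the last matching decoder is used.
    public static ISketchDecoder? Pick(IEnumerable<ISketchDecoder> decoders, string mimeType)
        => decoders.LastOrDefault(d => d.SupportsMimeType(mimeType));
}
```

Under these assumptions, supporting `.doc` would mean registering one extra decoder whose `SupportsMimeType` accepts `application/msword`; the actual text extraction for the legacy format is left out of the sketch.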
Thank you!
Context / Scenario
We have some Microsoft Word documents that are in the old format (`.doc`). What we are seeing is that when we try to use those documents, Kernel Memory fails to answer questions from them. When we convert those documents to `.docx` format, everything works great.
microsoft.docx
What happened?
Our expectation was that both `.doc` and `.docx` files should work, but that is not happening. `.doc` files do not work, but `.docx` files work.
Importance
a fix would make my life easier
Platform, Language, Versions
Microsoft.KernelMemory.Core - 0.61.240524.1
Microsoft.SemanticKernel - 1.15.0
Relevant log output
No response