theodo-group / LLPhant

LLPhant - A comprehensive PHP Generative AI Framework using OpenAI GPT 4. Inspired by Langchain
MIT License

PDF File Malformed UTF-8 characters, possibly incorrectly encoded #201

Open MuhammadAslam945 opened 1 month ago

MuhammadAslam945 commented 1 month ago

"message": "Malformed UTF-8 characters, possibly incorrectly encoded", "exception": "InvalidArgumentException",

$dataReader = new FileDataReader($file);
$documents = $dataReader->getDocuments();
$splittedDocuments = DocumentSplitter::splitDocuments($documents, 500);

// Laravel serializes this return value to JSON, which appears to be where the reported exception is raised.
return $splittedDocuments;

// Nothing below runs because of the return above; $text is also never defined in this snippet.
$embeddingGenerator = new OpenAI3SmallEmbeddingGenerator($config);
$embeddedDocuments = $embeddingGenerator->embedText($text);

return $embeddedDocuments;
dd($embeddedDocuments);
MuhammadAslam945 commented 1 month ago

{
    "message": "Malformed UTF-8 characters, possibly incorrectly encoded",
    "exception": "InvalidArgumentException",
    "file": "D:\RagLara\vendor\laravel\framework\src\Illuminate\Http\JsonResponse.php",
    "line": 90,
    "trace": [
        {
            "file": "D:\RagLara\vendor\symfony\http-foundation\JsonResponse.php",
            "line": 49,
            "function": "setData",
            "class": "Illuminate\Http\JsonResponse",
            "type": "->"
        },
        {
            "file": "D:\RagLara\vendor\laravel\framework\src\Illuminate\Http\JsonResponse.php",
            "line": 32,
            "function": "__construct",
            "class": "Symfony\Component\HttpFoundation\JsonResponse",
            "type": "->"
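
The trace above passes through `JsonResponse::setData`, which suggests the exception is raised while the response is being JSON-encoded, not inside the reader or splitter. As a minimal sketch of that failure mode in plain PHP (no LLPhant or Laravel involved), the snippet below encodes a string containing a stray Latin-1 byte and then shows two possible workarounds. `toValidUtf8` is a hypothetical helper name, the sample string is made up for illustration, and the sketch assumes the `mbstring` extension and PHP 7.2 or later.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of LLPhant or Laravel).
 * Returns a copy of $text in which every invalid UTF-8 byte sequence
 * has been replaced by mbstring's substitute character.
 */
function toValidUtf8(string $text): string
{
    return mb_scrub($text, 'UTF-8');
}

// Made-up sample: 0xE9 is "é" in Latin-1 but is not valid UTF-8 on its own.
$broken = "caf\xE9 extracted from a PDF";

// Without JSON_THROW_ON_ERROR, json_encode() returns false on invalid UTF-8.
$encodedRaw = json_encode(['content' => $broken]);
$rawError = json_last_error_msg();

// Workaround 1: scrub the text before it reaches the encoder.
$encodedClean = json_encode(['content' => toValidUtf8($broken)]);

// Workaround 2: let the encoder substitute invalid sequences itself.
$encodedSubstituted = json_encode(['content' => $broken], JSON_INVALID_UTF8_SUBSTITUTE);

var_dump($encodedRaw, $rawError, $encodedClean, $encodedSubstituted);
```

Either workaround loses the original byte, so if the PDF text is actually in a known legacy encoding, converting it properly with `mb_convert_encoding` would likely be a better choice than scrubbing.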

MuhammadAslam945 commented 1 month ago
if ($request->hasFile('file')) {
    $file = $request->file->getRealPath();
} else {
    return 'file not found';
}

$dataReader = new FileDataReader($file);
$documents = $dataReader->getDocuments();
$splittedDocuments = DocumentSplitter::splitDocuments($documents, 800);

// Laravel serializes this return value to JSON, which appears to be where the reported exception is raised.
return $splittedDocuments;

// Nothing below runs because of the return above; $text and $config are also never defined in this snippet.
$embeddingGenerator = new OpenAI3SmallEmbeddingGenerator($config);
$embeddedDocuments = $embeddingGenerator->embedText($text);

$memoryVectorStore = new MemoryVectorStore();
$memoryVectorStore->addDocuments($embeddedDocuments);

// Once the vector store is ready, you can use the QuestionAnswering class to answer questions
$qa = new QuestionAnswering(
    $memoryVectorStore,
    $embeddingGenerator,
    new OpenAIChat()
);

$answer = $qa->answerQuestion('what is the secret of Alice?');

// Control the behavior of the underlying vector store, see VectorStoreBase::similaritySearch
$answer = $qa->answerQuestion('Where does Alice live?', 2, ['type' => 'city']);
f-lombardo commented 1 month ago

"message": "Malformed UTF-8 characters, possibly incorrectly encoded", "exception": "InvalidArgumentException",

A fix related to wrong encoding in embedding texts has just been integrated. Could you please check if it solves your problem?

MuhammadAslam945 commented 4 weeks ago

I checked again and it still gives the error with PDF and docs files when using OpenAI. Also, if possible, could you share a detailed Doctrine PostgreSQL example? I tried the PlaceEntity example, but it always throws an error.

