run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
36.53k stars 5.22k forks

[Bug]: Ingestion from Scratch #10076

Closed — maherr13 closed this issue 6 months ago

maherr13 commented 9 months ago

Bug Description

I tried to follow the documentation for starting ingestion from scratch, using the guide at https://docs.llamaindex.ai/en/stable/examples/low_level/ingestion.html

Version

llama-index==0.9.21

Steps to Reproduce

from qdrant_client import QdrantClient
import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index import Document
from typing import Any
import pandas as pd
from models import lodestone # custom embedding

df = pd.read_csv('articles_test.csv')

nodes = [
    Document(
        text='title: ' + str(row['article_title']).lower() + '\nbody: ' + str(row['article_text']).lower(),
        metadata={
            "keyword": row['keyword'],
            "time": row['time'],
        },
    )
    for _, row in df.iterrows()
]

embed_model = lodestone()
# Define the Qdrant client
qdrant_client = QdrantClient("localhost", port=6333)
vector_store = QdrantVectorStore(client=qdrant_client, collection_name="db")

for node in nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding

vector_store.add(nodes)

Relevant Logs/Tracebacks

    vector_store.add(nodes)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/vector_stores/qdrant.py", line 214, in add
    points, ids = self._build_points(nodes)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/vector_stores/qdrant.py", line 191, in _build_points
    [
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/vector_stores/qdrant.py", line 192, in <listcomp>
    rest.PointStruct(id=node_id, payload=payload, vector=vector)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 2 validation errors for PointStruct
vector
  value is not a valid list (type=type_error.list)
vector
  value is not a valid dict (type=type_error.dict)
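The two validation errors mean the vector was neither a plain list nor a dict, the two shapes Pydantic accepts for this field. A minimal sketch of the mismatch, assuming numpy is available:

```python
import numpy as np

# A numpy array prints like a list but is neither a list nor a dict,
# so Pydantic's PointStruct validation rejects it with both errors above.
emb = np.array([0.1, 0.2, 0.3], dtype=np.float32)
print(isinstance(emb, list))   # False
print(isinstance(emb, dict))   # False

# Converting with .tolist() yields a plain list of Python floats.
fixed = emb.tolist()
print(isinstance(fixed, list), type(fixed[0]) is float)  # True True
```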
dosubot[bot] commented 9 months ago

🤖

Hello @maherr13! I'm Dosu, an AI bot here to assist you while we wait for a human maintainer to join us. I can help you crack bugs, answer your questions, or guide you on your journey as a contributor. I've taken a look at the issue you posted and will provide a detailed response shortly. Hang tight!

logan-markewich commented 9 months ago

@maherr13 your embedding model must return a List[float] when running embed_model.get_text_embedding()

It seems like your custom model is not doing that

maherr13 commented 9 months ago

I printed one of the nodes embedding and i got

[-3.64547828e-03  1.25213182e-02  1.75225399e-02 -3.63743529e-02
  1.61171518e-02  2.85976473e-03  3.01206000e-02 -4.00308892e-02
 -3.21327634e-02  3.37168016e-02 -6.01633266e-03  1.42307216e-02
 .....
 -7.64572918e-02  6.31988570e-02 -2.94980966e-02  3.05151716e-02]
maherr13 commented 9 months ago

Should I add an if condition to check that each embedding is a list[float]?

logan-markewich commented 9 months ago

That looks like a list of numpy floats to me 👀 Normal Python floats are not printed with that exponent syntax, and the missing commas in the printout are another sign it's a numpy array rather than a list

logan-markewich commented 9 months ago

You probably need embedding.tolist()
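The reason .tolist() matters (and why numpy scalars slipping through can later surface as a 400 "Format error in JSON body") is that np.float32 values are not JSON serializable, while the plain Python floats produced by .tolist() are. A small illustration, assuming the embeddings are float32:

```python
import json
import numpy as np

x = np.float32(0.5)

# A numpy float32 scalar cannot be serialized by the json module...
try:
    json.dumps([x])
except TypeError as err:
    print("TypeError:", err)

# ...but .item() (which .tolist() applies element-wise) converts it
# to a plain Python float, which serializes fine.
print(json.dumps([x.item()]))  # [0.5]
```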

maherr13 commented 9 months ago

Changing the code to

for node in nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding.tolist()

resulted in a new error:

Traceback (most recent call last):
  File "/home/administrator/nlp-deploy/projects/llm_db/ingest.py", line 142, in <module>
    vector_store.add(nodes)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/vector_stores/qdrant.py", line 216, in add
    self._client.upsert(
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/qdrant_client/qdrant_client.py", line 987, in upsert
    return self._client.upsert(
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/qdrant_client/qdrant_remote.py", line 1300, in upsert
    http_result = self.openapi_client.points_api.upsert_points(
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/qdrant_client/http/api/points_api.py", line 1439, in upsert_points
    return self._build_for_upsert_points(
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/qdrant_client/http/api/points_api.py", line 738, in _build_for_upsert_points
    return self.api_client.request(
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/qdrant_client/http/api_client.py", line 74, in request
    return self.send(request, type_)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/qdrant_client/http/api_client.py", line 97, in send
    raise UnexpectedResponse.for_response(response)
qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request)
Raw response content:
b'{"status":{"error":"Format error in JSON body: expected value at line 1 column 35919"},"time":0.0}'
logan-markewich commented 9 months ago

Is tolist() actually doing its job properly? That's a very hard error to debug 😓

logan-markewich commented 9 months ago

It could be related to the metadata on the node as well; I'm not sure whether row['time'] is a string/int or a datetime object
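If row['time'] is a datetime (or a pandas Timestamp), it is not JSON serializable either; converting it to a string before building the metadata sidesteps that. A sketch with a hypothetical value standing in for the CSV column:

```python
from datetime import datetime

row_time = datetime(2024, 1, 15, 9, 30)  # hypothetical stand-in for row['time']

# datetime objects are not JSON serializable; store an ISO-8601 string instead.
metadata = {
    "keyword": "example",
    "time": row_time.isoformat() if isinstance(row_time, datetime) else str(row_time),
}
print(metadata["time"])  # 2024-01-15T09:30:00
```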

maherr13 commented 9 months ago

Thanks for mentioning row['time']; it no longer gives me an error. One last question: after ingesting the data into Qdrant, how can I load the vector store from the Qdrant collection so that I can do some retrieval tasks?

logan-markewich commented 9 months ago

@maherr13 you'll want to do

from llama_index import ServiceContext, VectorStoreIndex

ctx = ServiceContext.from_defaults(llm=None, embed_model=embed_model)

index = VectorStoreIndex.from_vector_store(vector_store, service_context=ctx)
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("text")
logan-markewich commented 9 months ago

If you provide an LLM, you can also use as_query_engine() to perform full RAG

maherr13 commented 9 months ago

If I may reopen, as the issue is related to the topic: my data ingestion completed with no errors, and I made a query script as follows:

embedding_model = lodestone()
llm = llama2()

qdrant_client = QdrantClient("localhost", port=6333)
vector_store = QdrantVectorStore(client=qdrant_client, collection_name="test")

s_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embedding_model,

)

index = VectorStoreIndex.from_vector_store(vector_store, service_context=s_context)

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

response_synthesizer = get_response_synthesizer(
    response_mode="compact",
    service_context=s_context,
    use_async=False,
    streaming=False,
)

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.5)
    ]
) 

The embedding step, as mentioned:

for node in nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )

    node.embedding = np.array(node_embedding).tolist()

and I got the following error:

Traceback (most recent call last):
  File "/home/administrator/nlp-deploy/projects/llm_db/query.py", line 87, in <module>
    response = query_engine.query(query_data)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/core/base_query_engine.py", line 30, in query
    return self._query(str_or_query_bundle)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/query_engine/retriever_query_engine.py", line 170, in _query
    nodes = self.retrieve(query_bundle)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/query_engine/retriever_query_engine.py", line 126, in retrieve
    nodes = self._retriever.retrieve(query_bundle)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/core/base_retriever.py", line 54, in retrieve
    nodes = self._retrieve(query_bundle)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 88, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 164, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/llama_index/vector_stores/qdrant.py", line 468, in query
    response = self._client.search(
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/qdrant_client/qdrant_client.py", line 340, in search
    return self._client.search(
  File "/home/administrator/anaconda3/envs/llm_db/lib/python3.9/site-packages/qdrant_client/qdrant_remote.py", line 476, in search
    search_request=models.SearchRequest(
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 770 validation errors for SearchRequest
vector
  value is not a valid dict (type=type_error.dict)
vector
  value is not a valid dict (type=type_error.dict)
vector -> 0
  value is not a valid float (type=type_error.float)
vector -> 1
  value is not a valid float (type=type_error.float)
.....
vector -> 766
  value is not a valid float (type=type_error.float)
vector -> 767
  value is not a valid float (type=type_error.float)

You asked me about the embedding class; here it is:

class lodestone(BaseEmbedding):
    _model: INSTRUCTOR = PrivateAttr()
    _instruction: str = PrivateAttr()

    def __init__(
        self,
        instructor_model_name: str = "/home/administrator/llama2_vllm/lodestone-base-4096-v1",
        instruction: str = "",
        **kwargs: Any,
    ) -> None:
        self._model = INSTRUCTOR(instructor_model_name)
        self._instruction = instruction
        super().__init__(**kwargs)

    @classmethod
    def class_name(cls) -> str:
        return "instructor"

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_text_embedding(text)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]

    def _get_text_embedding(self, text: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, text]])
        return embeddings[0]

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        embeddings = self._model.encode(
            [[self._instruction, text] for text in texts]
        )
        return embeddings

I made sure the embedding output is List[float]; the ingest script finished with no errors and the collection was added to Qdrant.
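Note that the List[float] annotations on the embedding class are not enforced at runtime: encode() returns a numpy array, and indexing it yields another numpy array, whatever the signature says. A stripped-down illustration (np.random stands in for the model; no INSTRUCTOR call is made):

```python
from typing import List
import numpy as np

def get_text_embedding(text: str) -> List[float]:
    # Annotations are hints only; nothing stops this from returning an ndarray.
    embeddings = np.random.rand(1, 4).astype(np.float32)  # stand-in for model.encode(...)
    return embeddings[0]

emb = get_text_embedding("hello")
print(type(emb).__name__)  # ndarray, despite the List[float] annotation
```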

logan-markewich commented 9 months ago

I tried this and it worked fine. Note that I moved the tolist() into the embedding class instead, to cover all the methods:

from InstructorEmbedding import INSTRUCTOR
from typing import Any, List

from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
)
from llama_index.bridge.pydantic import PrivateAttr
from llama_index.embeddings.base import BaseEmbedding
from llama_index.node_parser import SentenceSplitter
from llama_index.retrievers import VectorIndexRetriever
from llama_index.response_synthesizers import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.postprocessor import SimilarityPostprocessor
from llama_index.vector_stores import QdrantVectorStore

class lodestone(BaseEmbedding):
    _model: INSTRUCTOR = PrivateAttr()
    _instruction: str = PrivateAttr()

    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent the text for retrieval:",
        **kwargs: Any,
    ) -> None:
        self._model = INSTRUCTOR(instructor_model_name)
        self._instruction = instruction
        super().__init__(**kwargs)

    @classmethod
    def class_name(cls) -> str:
        return "instructor"

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_text_embedding(text)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]]).tolist()
        return embeddings[0]

    def _get_text_embedding(self, text: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, text]]).tolist()
        return embeddings[0]

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        embeddings = self._model.encode(
            [[self._instruction, text] for text in texts]
        ).tolist()
        return embeddings

embedding_model = lodestone(embed_batch_size=2)
llm = None

from qdrant_client import QdrantClient

qdrant_client = QdrantClient(":memory:")
vector_store = QdrantVectorStore(client=qdrant_client, collection_name="test")

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embedding_model,
)

documents = SimpleDirectoryReader("./docs/examples/data/paul_graham").load_data()
nodes = SentenceSplitter(chunk_size=512)(documents)

texts = [node.text for node in nodes]
embeddings = embedding_model.get_text_embedding_batch(texts)
for i, node in enumerate(nodes):
    node.embedding = embeddings[i]

vector_store.add(nodes)

index = VectorStoreIndex.from_vector_store(
    vector_store,
    service_context=service_context,
)

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

response_synthesizer = get_response_synthesizer(
    response_mode="compact",
    service_context=service_context,
    use_async=False,
    streaming=False,
)

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)],
)

nodes = query_engine.retrieve("test query")
print(len(nodes))
dosubot[bot] commented 6 months ago

Hi, @maherr13,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. The issue you opened describes a bug encountered while trying to start ingestion from scratch using llama-index version 0.9.21. It seems that the bug occurs when attempting to add nodes to the Qdrant vector store, resulting in validation errors for the PointStruct. The issue has received comments from dosubot, logan-markewich, and yourself, discussing potential solutions and code changes to address the bug. The conversation also includes a detailed traceback of the error and code snippets illustrating the problem.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and cooperation. If you have any further questions or need assistance, feel free to reach out.