sai-krishna-msk closed 1 year ago
Hey @sai-krishna-msk, it looks like your dataset has no tensors. You can create tensors using ds.create_tensor. Do tell me if you need more help!
@FayazRahman, thank you for the swift response.
I'm sorry, but I have never worked with the deeplake package before, and I am still not sure what the issue is. Can you kindly tell me what I am missing? (When you say my dataset does not have tensors, do you mean the GitHub repo I am working with has no code?) When you have time, could you please elaborate on that and point me to where I need to modify the code?
Your help is much appreciated.
On a side note, I was able to make the code work.
First I tried with my private repo's code (let's call it repo-1), which threw the error I specified above. I then tried another, public repo (let's call it repo-2), but it still did not work. After some debugging I found that, despite my changing the URL to repo-2, the code was still working with repo-1. Once I deleted the gumroad directory (which the code creates to store repo files), the code started working with repo-2.
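For anyone hitting the same stale-checkout behaviour, here is a minimal sketch of the workaround: delete the leftover clone directory before pointing the script at a different repo. The helper name `clear_stale_clone` is hypothetical and is not part of the project; the only detail taken from this thread is that the script stores repo files in a local directory (here, `gumroad`).

```python
import shutil
from pathlib import Path


def clear_stale_clone(clone_dir: str) -> bool:
    """Delete a leftover clone directory.

    Returns True if a directory was removed, False if there was nothing to remove.
    `clone_dir` is whatever local directory the script clones the repo into.
    """
    path = Path(clone_dir)
    if path.is_dir():
        # Remove the directory and everything inside it.
        shutil.rmtree(path)
        return True
    return False


# Hypothetical usage, assuming the script clones into ./gumroad:
# clear_stale_clone("gumroad")
```

Calling this before each run with a new repo URL should force a fresh clone, at the cost of re-downloading the repo every time.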
Keeping that bug aside, I am still trying to figure out why the code did not work with repo-1.
I will post an update if I find out.
If anyone else figures it out, please let me know. Thank you in advance.
I ran this in a new script, and it worked:
import os
import deeplake
# read the Deep Lake API token from an environment variable
api_key = os.getenv("<deeplake_api>")
# create an empty "data store" on deeplake. overwrite=True so I could keep reusing it
ds = deeplake.empty('hub://<your organization from deeplake>/<whatever you want to call it>', token=api_key, overwrite=True)
# create tensors mimicking the output sample from github.py
ds.create_tensor("ids")
ds.create_tensor("metadata")
ds.create_tensor("embedding")
ds.create_tensor("text", htype="text")
IMO it's worth adding to the instructions, but I think what's going on here is that the github.py script outputs tensors in the following layout: ['ids', 'metadata', 'embedding', 'text'], so you need to mimic that structure in your deeplake datastore.
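If it helps, the same idea can be written as a small idempotent helper that creates only the tensors that are missing. This is a rough sketch, not the project's code: `ensure_tensors` and `EXPECTED_LAYOUT` are names I made up, and it assumes the dataset object exposes a `tensors` mapping and a `create_tensor(name, htype=...)` method, as the deeplake dataset used above does. The `FakeDataset` class is only a stand-in so the snippet runs without a Deep Lake account.

```python
from typing import Dict, List, Optional

# Layout reported in this thread: tensor name -> htype (None means default).
EXPECTED_LAYOUT: Dict[str, Optional[str]] = {
    "ids": None,
    "metadata": None,
    "embedding": None,
    "text": "text",
}


def ensure_tensors(ds, layout: Dict[str, Optional[str]]) -> List[str]:
    """Create every tensor in `layout` that `ds` lacks; return the names created."""
    existing = set(ds.tensors)  # assumes `ds.tensors` is a mapping keyed by name
    created = []
    for name, htype in layout.items():
        if name in existing:
            continue
        if htype is None:
            ds.create_tensor(name)
        else:
            ds.create_tensor(name, htype=htype)
        created.append(name)
    return created


class FakeDataset:
    """Stand-in for a real dataset, used only to demonstrate the helper."""

    def __init__(self) -> None:
        self.tensors: Dict[str, Optional[str]] = {}

    def create_tensor(self, name: str, htype: Optional[str] = None) -> None:
        self.tensors[name] = htype


if __name__ == "__main__":
    demo = FakeDataset()
    print(ensure_tensors(demo, EXPECTED_LAYOUT))
```

With a real dataset you would pass the `ds` returned by `deeplake.empty(...)` instead of `FakeDataset()`. Running the helper a second time creates nothing, so it should be safe to call before every ingestion.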
Thank you @sanchitram1, I think that should fix it.
I could not figure out the issue, but based on the error messages it was clearly a Deeplake issue, so I swapped out Deeplake as the vector database for Pinecone.
It is currently working with Pinecone, which I found much simpler to work with than Deeplake (although I am sure there are reasonable tradeoffs between the two).
Here is the working code for the same project, but with Pinecone: Pinecone version of Chat-with-Github
Note: Hi @peterw, I have credited you in my repo. Please let me know if that does not suffice and I'll make the necessary changes.
Following is the entire error thread
db.add_documents(texts)