Closed — msha1026 closed this issue 4 months ago.
Hi @dans-msft @Adarsh-Ramanathan, please take a look at this issue.

@msha1026, we currently only track prompt flow SDK/CLI issues here. For portal UI bugs, please create an OCV in the portal here:
@msha1026, Vector DB Lookup is on the deprecation path. Could you upgrade your flow to its replacement, the preview Index Lookup tool (not Vector Index Lookup, which is also on the same deprecation path), and let us know if the issue persists?
https://learn.microsoft.com/en-us/azure/ai-studio/how-to/prompt-flow-tools/index-lookup-tool
Thank you for the speedy replies!
I have tried the preview Index Lookup tool in AzureML studio, and that did resolve the issue. Thank you!
@Adarsh-Ramanathan Is this preview tool also supported in VSCode? If so, what packages need to be installed in order to also run this in the VSCode prompt flow extension?
For reference, this is my local dev environment:
VSCode prompt flow extension: 1.9.2
promptflow: 1.4.1
promptflow-tools: 1.1.0
promptflow-vectordb: 0.2.3
@msha1026, you might need to install pymongo as well - this should be an extra, but at present it is a required dependency. This is a known bug that will be fixed in the upcoming release. If you run pf tool list in your env, you should see a ModuleNotFoundError in the output complaining that pymongo wasn't found.
@Adarsh-Ramanathan is there a timeline on when the Vector DB Lookup tool will be officially deprecated? Is there also a timeline on the upcoming release for the dependency fix?
Also, thanks for the heads-up on the missing pymongo. I was also missing azureml-rag[search-documents]. This is the package list I had to install in order to get the tool working in VSCode, in case anyone has the same issues as I did:
promptflow[azure]
promptflow-tools
promptflow-vectordb
azureml-rag[cognitive-search]
pymongo
and my pip freeze resulted in the following:
aiohttp==3.9.1
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.2.0
async-timeout==4.0.3
attrs==23.2.0
azure-ai-ml==1.12.1
azure-common==1.1.28
azure-core==1.29.7
azure-identity==1.15.0
azure-mgmt-core==1.4.0
azure-search-documents==11.4.0b8
azure-storage-blob==12.19.0
azure-storage-file-datalake==12.14.0
azure-storage-file-share==12.15.0
azureml-dataprep==5.1.3
azureml-dataprep-native==41.0.0
azureml-dataprep-rslex==2.22.2
azureml-fsspec==1.3.0
azureml-rag==0.2.24.1
blinker==1.7.0
cachetools==5.3.2
cattrs==23.2.3
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==2.2.1
colorama==0.4.6
cryptography==41.0.7
dataclasses-json==0.6.3
distro==1.9.0
dnspython==2.5.0
docutils==0.20.1
exceptiongroup==1.2.0
faiss-cpu==1.7.4
filelock==3.13.1
filetype==1.2.0
Flask==3.0.1
frozenlist==1.4.1
fsspec==2023.12.2
gitdb==4.0.11
GitPython==3.1.41
google-api-core==2.15.0
google-auth==2.27.0
google-search-results==2.4.1
googleapis-common-protos==1.62.0
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.2
httpx==0.26.0
idna==3.6
importlib-metadata==7.0.1
isodate==0.6.1
itsdangerous==2.1.2
jaraco.classes==3.3.0
Jinja2==3.1.3
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
keyring==24.3.0
langchain==0.0.348
langchain-core==0.0.13
langsmith==0.0.83
MarkupSafe==2.1.4
marshmallow==3.20.2
mmh3==4.1.0
more-itertools==10.2.0
msal==1.26.0
msal-extensions==1.1.0
msrest==0.7.1
multidict==6.0.4
mypy-extensions==1.0.0
numpy==1.26.3
oauthlib==3.2.2
openai==1.10.0
opencensus==0.11.4
opencensus-context==0.1.3
opencensus-ext-azure==1.1.13
packaging==23.2
pandas==2.2.0
pillow==10.2.0
platformdirs==4.1.0
portalocker==2.8.2
promptflow==1.4.1
promptflow-tools==1.1.0
promptflow_vectordb==0.2.3
protobuf==4.25.2
psutil==5.9.8
pyarrow==14.0.2
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==2.5.3
pydantic_core==2.14.6
pydash==7.0.5
PyJWT==2.8.0
pymongo==4.6.1
python-dateutil==2.8.2
python-dotenv==1.0.1
pytz==2023.3.post1
pywin32==306
pywin32-ctypes==0.2.2
PyYAML==6.0.1
referencing==0.32.1
regex==2023.12.25
requests==2.31.0
requests-cache==1.1.1
requests-oauthlib==1.3.1
rpds-py==0.17.1
rsa==4.9
ruamel.yaml==0.18.5
ruamel.yaml.clib==0.2.8
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
SQLAlchemy==2.0.25
strictyaml==1.7.3
tabulate==0.9.0
tenacity==8.2.3
tiktoken==0.5.2
tqdm==4.66.1
typing-inspect==0.9.0
typing_extensions==4.9.0
tzdata==2023.4
url-normalize==1.4.3
urllib3==2.1.0
waitress==2.1.2
Werkzeug==3.0.1
yarl==1.9.4
zipp==3.17.0
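As a quick sanity check against a freeze dump like the one above, a small helper can report which of the required packages are absent from an environment. This is a hypothetical convenience, not part of promptflow; the normalization rules are assumptions (pip freeze may print `promptflow_vectordb` with an underscore, and extras like `[azure]` are not part of the distribution name).

```python
# Hypothetical helper (not part of promptflow): given the text of
# `pip freeze`, report which required packages are missing.
def _norm(name):
    # Drop extras like "[azure]", unify "_"/"-", and lowercase,
    # so "promptflow_vectordb" matches "promptflow-vectordb".
    return name.split("[")[0].replace("_", "-").lower()

def missing_packages(freeze_text, required):
    installed = {
        _norm(line.split("==")[0])
        for line in freeze_text.splitlines()
        if "==" in line
    }
    return [pkg for pkg in required if _norm(pkg) not in installed]
```

For example, running it over the freeze output above with `required=["promptflow", "promptflow-tools", "promptflow-vectordb", "pymongo", "azureml-rag"]` should report nothing missing.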
@msha1026, the tools will be marked as deprecated with the next package release, although I don't have an exact timeline on decommissioning them altogether. The next release should be out this week, and that'll include the fix for the pymongo dependency.
WRT the other dependencies, they're bundled with the azure extra for promptflow-vectordb; if you install that extra, you should get all of them as transitive requirements.
Awesome, thank you for the details @Adarsh-Ramanathan. And yes, you are right: the azure extra on promptflow-vectordb does install the transitive requirements. I will close out this issue then, since the preview Index Lookup tool works for us locally and in the studio.
Hi @msha1026, when I tried the new preview Index Lookup tool, I got an error like:
promptflow._utils.tool_utils.DynamicListError: Unable to display list of items due to 'Error when calling function promptflow_vectordb.tool.common_index_lookup_utils.list_available_query_types: tool_ui_callback.<locals>.wrapped() missing 3 required positional arguments: 'subscription_id', 'resource_group_name', and 'workspace_name''. Please contact the tool author/support team for troubleshooting assistance.
I already have a .azureml/config.json with the below content:
{
"subscription_id": "xxxxx",
"resource_group": "rg-test",
"workspace_name": "demo123"
}
I tried providing the inputs like below:
mlindex_content: azureml://subscriptions/xxx/resourcegroups/rg-test/providers/Microsoft.MachineLearningServices/demo123/jaymachinelearning/data/lime-yuca-33pkdyvm05/versions/1
query_type: Hybrid
top_k: 3
queries: ${embed_the_question.output}
Still the same error from the UI, but when I tried debugging:
2024-02-01 09:55:43 +0000 51435 execution.flow INFO Node modify_query_with_history completes.
2024-02-01 09:55:43 +0000 51435 execution.flow INFO Executing node embed_the_question. node run id: 8861a96f-9c35-463d-934e-c89a1eeae5b0_embed_the_question_0
2024-02-01 09:55:43 +0000 51435 execution.flow INFO Node embed_the_question completes.
2024-02-01 09:55:43 +0000 51435 execution.flow INFO Executing node search_question_from_indexed_docs. node run id: 8861a96f-9c35-463d-934e-c89a1eeae5b0_search_question_from_indexed_docs_0
2024-02-01 09:55:43 +0000 51435 execution ERROR Node search_question_from_indexed_docs in line 0 failed. Exception: Execution failure in 'search_question_from_indexed_docs': (AttributeError) 'str' object has no attribute 'get'.
Traceback (most recent call last):
File "/workspaces/demo/.venv/lib/python3.11/site-packages/promptflow/_core/flow_execution_context.py", line 194, in _invoke_tool_with_timer
return f(**kwargs)
^^^^^^^^^^^
File "/workspaces/demo/.venv/lib/python3.11/site-packages/promptflow/_core/tracer.py", line 220, in wrapped
output = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/demo/.venv/lib/python3.11/site-packages/promptflow_vectordb/tool/common_index_lookup.py", line 59, in search
index = MLIndex(mlindex_config=mlindex_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/demo/.venv/lib/python3.11/site-packages/azureml/rag/mlindex.py", line 111, in __init__
self.index_config = mlindex_config.get("index", {})
^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'
@jayendranarumugam Our flow.dag.yaml looks different from yours. This is how ours looks:
mlindex_content: >
embeddings:
api_base: https://<azure-openai-name>.api.cognitive.microsoft.com/
api_type: azure
api_version: 2023-07-01-preview
batch_size: '1'
connection:
id: /subscriptions/<sub-id>/resourceGroups/<rg-name>/providers/Microsoft.MachineLearningServices/workspaces/<azureml-wksp-name>/connections/<azure_openai_connection_name>
connection_type: workspace_connection
deployment: text-embedding-ada-002
dimension: 1536
kind: open_ai
model: text-embedding-ada-002
schema_version: '2'
index:
api_version: 2023-07-01-preview
connection:
id: /subscriptions/<sub-id>/resourceGroups/<rg-name>/providers/Microsoft.MachineLearningServices/workspaces/<azureml-wksp-name>/connections/<azure-search-connection-name>
connection_type: workspace_connection
endpoint: https://<azure-ai-search-name>.search.windows.net
engine: azure-sdk
field_mapping:
content: <name_of_field_in_search_for_document_contents>
embedding: <name_of_embedding_field_in_search>
metadata: <name_of_field_in_search_for_document_id>
index: <index_name>
kind: acs
semantic_configuration_name: null
queries: ${modify_query_with_history.output} #This is the unembedded query. This node should handle embedding the query for you
query_type: Hybrid (vector + keyword)
top_k: 3
@jayendranarumugam Did you ever manage to get this working? I am getting the exact same error when using an existing Prompt Flow in my VSCode environment. My YAML definition looks exactly like the one posted in the response above, and I'm not sure where else the problem might be.
@pgr-lopes, you'll need to configure a default subscription/resourcegroup/workspace in your shell before attempting to configure index lookup:
az login
az account set --subscription <subscription_id>
az configure --defaults group=<resource_group_name> workspace=<workspace_name>
@DaweiCai FYI - we should be fetching these values from the config.json instead of expecting users to configure the same information in two different ways.
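For reference, reading those defaults out of the config.json shown earlier in the thread takes only a few lines. This is a sketch of what such a lookup could do, not the extension's actual behavior; the path and key names are taken from the config.json posted above.

```python
import json
from pathlib import Path

# Sketch: read the workspace defaults from .azureml/config.json so they
# only need to be configured in one place. Keys match the config.json
# shown earlier in this thread.
def load_azureml_config(path=".azureml/config.json"):
    cfg = json.loads(Path(path).read_text())
    return cfg["subscription_id"], cfg["resource_group"], cfg["workspace_name"]
```

Until the tool does this itself, the az CLI defaults above remain the workaround.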
@pgr-lopes, this issue is closed; can you please open a new one for us to track?
@Adarsh-Ramanathan thank you, that makes sense. That's what was confusing me, since I was already specifying those variables in the config.json file and the OpenAI connections were working just fine.
I'll try that out using the command line shell and I'll open a new issue for tracking, thanks!
Describe the bug
In AzureML prompt flow studio, using the Vector DB Lookup tool and connecting to AI Search, I set the top_k input value to 1. However, after a run of the flow, the output of the Vector DB Lookup node shows 50 documents returned instead. I have tried various values for top_k but always receive 50 documents back.

How To Reproduce the bug
Steps to reproduce the behavior, and how frequently you can experience the bug: run a flow with an azure_ai_search_lookup node (with top_k set).

Expected behavior
Only top_k documents are returned from the azure_ai_search_lookup node.

Screenshots
Small snippet of the array of documents returned.
Running Information (please complete the following information):
Additional context: N/A