run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Documentation]: Entity Extraction should use CUDA if available or at least surface the option #8685

Closed nickknyc closed 9 months ago

nickknyc commented 1 year ago

Documentation Issue Description

During my exploration of metadata extractors, particularly the line:

EntityExtractor(prediction_threshold=0.5),

I first discovered that span_marker needs to be installed. After installing it, the extractor still ran on the CPU even though a GPU was available. That led me to the span_marker docs, which recommend explicitly moving the model to CUDA for faster inference:

# It is recommended to explicitly move the model to CUDA for faster inference, if possible
model.cuda()
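
For context, here is a minimal, self-contained sketch of what the span_marker docs describe. The model name is only an assumption for illustration, not something stated in this issue:

```python
import torch
from span_marker import SpanMarkerModel

# Illustrative checkpoint; substitute whichever span_marker model you actually load.
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-mbert-base-multinerd")

# Explicitly move the model to CUDA for faster inference, if a GPU is visible to PyTorch.
if torch.cuda.is_available():
    model.cuda()
```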

Somewhere, possibly in LlamaIndexBot, I found that the device parameter is where the GPU/CPU preference should be set. Copilot then provided a succinct check:

EntityExtractor(prediction_threshold=0.5, device="cuda" if cuda_available else "cpu"),
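
As a minimal, self-contained version of that check (assuming torch is installed and using the same import path as the pipeline later in this thread; newer releases may expose EntityExtractor elsewhere):

```python
import torch
from llama_index.node_parser.extractors import EntityExtractor

# Pick the device once and pass it through; falls back to CPU when no CUDA GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

entity_extractor = EntityExtractor(
    prediction_threshold=0.5,
    device=device,
)
```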

I propose adding an explicit section to both the setup procedure and the documentation so that enabling CUDA, when it is available, is straightforward.

Documentation Link

https://docs.llamaindex.ai/en/stable/examples/metadata_extraction/MetadataExtractionSEC.html

dosubot[bot] commented 1 year ago

🤖

Hello @nickknyc,

I'm Dosu, a friendly bot here to assist you with LlamaIndex! I can help you solve bugs, answer questions, and guide you in becoming a contributor. While we wait for a human maintainer, feel free to ask me anything. I'm here to help!

It seems like you've got this under control. If you want help or have specific questions, let me know what I can do for you!

nickknyc commented 1 year ago

What is the best way to interact with you? Also, can you review my code if I send it to you?


dosubot[bot] commented 1 year ago

🤖

Sure, you can definitely submit your code for review. The LlamaIndex project encourages contributions from the community. You can make a pull request (PR) on the LlamaIndex GitHub page. Here are the steps you can follow:

  1. Fork the LlamaIndex repository.
  2. Clone the forked repository to your local machine.
  3. Create a new Python virtual environment using poetry.
  4. Install all dependencies needed for most local development with poetry install --with dev,docs.
  5. Make your changes in the code.
  6. Format and lint your changes with make format; make lint.
  7. If your changes are significant, create a unit test in the tests folder and run all unit tests with pytest tests or make test.
  8. If your changes involve entirely new features, consider adding an example Jupyter notebook to showcase this feature in the examples folder.
  9. Commit your changes and push them to your forked repository.
  10. Open a pull request against the main LlamaIndex repository.

Please refer to the project's CONTRIBUTING.md file for the most accurate and up-to-date information.

Regarding your suggestion about an explicit section in both the setup procedure and documentation for setting up CUDA when available, it's a great idea. However, I couldn't find any specific information on setting up CUDA for faster inference in the LlamaIndex data framework in the context provided. You can refer to the official documentation of LlamaIndex for more detailed information and guidance.

As for a similar solved issue, you might find the discussion on [Feature Request]: Tools for tuning retrieval hyperparameters (https://github.com/run-llama/llama_index/issues/6531) helpful. It was last updated on September 27, 2023 and was referenced by [Discussion/Tracking]: Declarative Global Config (https://github.com/run-llama/llama_index/issues/6883), which was last updated on October 19, 2023.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

nickknyc commented 1 year ago

Please let me know if you see any issues with the following ingestion pipeline:

```python
import logging
import os
import sys

import pinecone
import torch

from llama_index import (ServiceContext, SimpleDirectoryReader, StorageContext,
                         VectorStoreIndex, set_global_service_context)
from llama_index.embeddings import OpenAIEmbedding
from llama_index.indices.prompt_helper import PromptHelper
from llama_index.llms import OpenAI
from llama_index.node_parser import SimpleNodeParser
from llama_index.node_parser.extractors import (EntityExtractor, KeywordExtractor,
                                                MetadataExtractor,
                                                QuestionsAnsweredExtractor,
                                                SummaryExtractor, TitleExtractor)
from llama_index.text_splitter import TokenTextSplitter
from llama_index.vector_stores import PineconeVectorStore

# Set up logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# Set up system tests
# Test whether CUDA is available
cuda_available = torch.cuda.is_available()

# These are user-defined variables - they should be moved into .env or something
DOC_PATH = r"D:\ObsidianVault\test"  # keep the r prefix so the path is a raw string
SPLITTER_CHUNK_SIZE = 1024
SPLITTER_CHUNK_OVERLAP = 20
NODE_CHUNK_SIZE = 100
NODE_CHUNK_OVERLAP = 20
LLM_MODEL_MODEL = "gpt-4"
LLM_MAX_TOKENS = 2000
EMBED_MODEL_MODEL = "text-embedding-ada-002"
CONTEXT_WINDOW = 512

# Metadata variables
PRODUCT_NAME = "myBot"
DATA_VERSION = "v0.0.1"
USERNAME = "username"
RELEASE_STATUS = "test"  # test, dev, prod

# Set up the OpenAI API
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

# Set up Pinecone - assumes you have an index already created
PINECONE_INDEX_NAME = "s13"
PINECONE_ENVIRONMENT = "gcp-starter"
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENVIRONMENT)
pinecone_index = pinecone.Index(PINECONE_INDEX_NAME)

# Set up the vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Set up the storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Set the LLM
llm = OpenAI(
    temperature=0.1,
    model=LLM_MODEL_MODEL,
    max_tokens=LLM_MAX_TOKENS,
    api_key=OPENAI_API_KEY,
)

# Define metadata
metadata = {
    "product_name": PRODUCT_NAME,
    "data_version": DATA_VERSION,
    "username": USERNAME,
    "release_status": RELEASE_STATUS,
}

# Set up the text splitter
text_splitter = TokenTextSplitter(
    separator=" ",
    chunk_size=SPLITTER_CHUNK_SIZE,
    chunk_overlap=SPLITTER_CHUNK_OVERLAP,
    backup_separators=["\n"],
)

# Set up the metadata extractor
# The MetadataExtractor class is used to extract metadata from TextNodes.
metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5, llm=llm),
        QuestionsAnsweredExtractor(questions=3, llm=llm),
        EntityExtractor(
            prediction_threshold=0.5,
            # Load the EntityExtractor on the GPU if it is available
            device="cuda" if cuda_available else "cpu",
            label_entities=True,
        ),
        SummaryExtractor(summaries=["prev", "self"], llm=llm),
        KeywordExtractor(keywords=10, llm=llm),
        # CustomExtractor()
    ],
)

# Set up the node parser
node_parser = SimpleNodeParser.from_defaults(
    chunk_size=NODE_CHUNK_SIZE,
    chunk_overlap=NODE_CHUNK_OVERLAP,
    text_splitter=text_splitter,
    include_metadata=True,
    include_prev_next_rel=True,
    metadata_extractor=metadata_extractor,
)

# Set up the embed model
embed_model = OpenAIEmbedding(
    api_key=OPENAI_API_KEY, temperature=0.1, model=EMBED_MODEL_MODEL
)

# Set up the prompt helper
prompt_helper = PromptHelper(
    context_window=CONTEXT_WINDOW,
)

# Set up the ServiceContext
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    node_parser=node_parser,
    prompt_helper=prompt_helper,
)

# Set the ServiceContext as global
set_global_service_context(service_context)

# Load the data
# The SimpleDirectoryReader class is used to load a directory of documents.
documents = SimpleDirectoryReader(DOC_PATH).load_data()

# Update the metadata for each document
for document in documents:
    document.metadata.update(metadata)

# Construct nodes from the document text
my_nodes = node_parser.get_nodes_from_documents(documents)

# Generate embeddings for each node
for node in my_nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding
    node.metadata.update(metadata)

storage_context.vector_store.add(my_nodes)

index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    show_progress=True,
    storage_context=storage_context,
)
```


dosubot[bot] commented 1 year ago

🤖

Upon reviewing your ingestion pipeline code for the LlamaIndex data framework, it appears to be well-structured and follows good practices. However, there are a few points that could be improved:

  1. Environment Variables: You are using environment variables for sensitive data like OPENAI_API_KEY and PINECONE_API_KEY, which is a good practice. However, it would be better to handle the case where these variables are not set. You can add a check to ensure these variables are set and, if not, raise an appropriate error (a minimal sketch follows this list).

  2. Error Handling: Your code currently lacks error handling. For instance, if the directory specified in DOC_PATH does not exist or is empty, or if there's an issue with the Pinecone index, the script might fail unexpectedly. It would be beneficial to add try/except blocks around these critical sections and provide meaningful error messages.

  3. Logging: While you have set up logging, it would be helpful to include more logging statements throughout your code. This can help you understand the flow and state of your program, especially if something goes wrong. It's generally a good idea to log any exceptions that are caught, as well as any critical actions or decisions made in your code.

  4. Code Comments: Your code could benefit from more comments explaining what each section of the code does. This can make it easier for others (or future you) to understand what's going on.

  5. Hardcoded Variables: There are several hardcoded variables in your script (like SPLITTER_CHUNK_SIZE, SPLITTER_CHUNK_OVERLAP, etc.). It might be a good idea to move these to a configuration file or environment variables, making them easier to modify without changing the script itself.

  6. Data Validation: Before processing the documents, you could add some validation to ensure the data is in the expected format. This could prevent potential errors during the execution.
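
As a minimal sketch of suggestions 1 and 2 (the helper name and path are hypothetical; adjust them to the variables used in the pipeline above):

```python
import logging
import os

logger = logging.getLogger(__name__)


def require_env(name: str) -> str:
    """Return a required environment variable, or raise a clear error if it is missing."""
    value = os.getenv(name)
    if not value:
        raise EnvironmentError(f"Required environment variable {name} is not set")
    return value


OPENAI_API_KEY = require_env("OPENAI_API_KEY")
PINECONE_API_KEY = require_env("PINECONE_API_KEY")

# Fail early, with a readable message, if the document directory is missing.
DOC_PATH = r"D:\ObsidianVault\test"
if not os.path.isdir(DOC_PATH):
    raise FileNotFoundError(f"Document directory does not exist: {DOC_PATH}")
logger.info("Loading documents from %s", DOC_PATH)
```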

Remember, these are just suggestions and the actual improvements might depend on the overall structure and usage of your project.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Disiok commented 1 year ago

Thanks for the suggestion! We should definitely make the documentation clearer on this.

@nickknyc Would you like to make a PR to update this directly? Should be super straightforward!

nickknyc commented 1 year ago

on my todo list

dosubot[bot] commented 9 months ago

Hi, @nickknyc,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, the issue raised emphasizes the need for clearer documentation on using CUDA for entity extraction. Dosu-bot provided guidance on contributing to the LlamaIndex project, suggested referencing similar solved issues for additional context, and provided feedback on improving the ingestion pipeline code structure and practices. Disiok expressed interest in updating the documentation and invited you to make a pull request for the update.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you!

nickknyc commented 9 months ago

close item please
