run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Enable completely offline operation #9343

Closed Pascal-So closed 5 months ago

Pascal-So commented 9 months ago

Feature Description

When using huggingface models and embeddings that have been downloaded in advance, it should be possible to run llama_index on a computer that is not connected to the internet at all, since in that case nothing depends on openai or any other online service.

Reason

Right now, even just this import line leads to a network timeout if the computer is not connected to the internet:

from llama_index.embeddings import LangchainEmbedding
Full error message:

```
  File "/app/server.py", line 8, in <module>
    from llama_index.embeddings import LangchainEmbedding
  File "/usr/local/lib/python3.11/site-packages/llama_index/__init__.py", line 21, in <module>
    from llama_index.indices import (
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/__init__.py", line 4, in <module>
    from llama_index.indices.composability.graph import ComposableGraph
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/composability/__init__.py", line 4, in <module>
    from llama_index.indices.composability.graph import ComposableGraph
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/composability/graph.py", line 7, in <module>
    from llama_index.indices.base import BaseIndex
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/base.py", line 6, in <module>
    from llama_index.chat_engine.types import BaseChatEngine, ChatMode
  File "/usr/local/lib/python3.11/site-packages/llama_index/chat_engine/__init__.py", line 1, in <module>
    from llama_index.chat_engine.condense_question import CondenseQuestionChatEngine
  File "/usr/local/lib/python3.11/site-packages/llama_index/chat_engine/condense_question.py", line 6, in <module>
    from llama_index.chat_engine.types import (
  File "/usr/local/lib/python3.11/site-packages/llama_index/chat_engine/types.py", line 11, in <module>
    from llama_index.memory import BaseMemory
  File "/usr/local/lib/python3.11/site-packages/llama_index/memory/__init__.py", line 1, in <module>
    from llama_index.memory.chat_memory_buffer import ChatMemoryBuffer
  File "/usr/local/lib/python3.11/site-packages/llama_index/memory/chat_memory_buffer.py", line 12, in <module>
    class ChatMemoryBuffer(BaseMemory):
  File "/usr/local/lib/python3.11/site-packages/llama_index/memory/chat_memory_buffer.py", line 18, in ChatMemoryBuffer
    default_factory=cast(Callable[[], Any], GlobalsHelper().tokenizer),
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/utils.py", line 55, in tokenizer
    enc = tiktoken.get_encoding("gpt2")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tiktoken/registry.py", line 73, in get_encoding
    enc = Encoding(**constructor())
                     ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tiktoken_ext/openai_public.py", line 11, in gpt2
    mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tiktoken/load.py", line 82, in data_gym_to_mergeable_bpe_ranks
    vocab_bpe_contents = read_file_cached(vocab_bpe_file).decode()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tiktoken/load.py", line 50, in read_file_cached
    contents = read_file(blobpath)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tiktoken/load.py", line 24, in read_file
    resp = requests.get(blobpath)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 507, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /gpt-2/encodings/main/vocab.bpe (Caused by ConnectTimeoutError(, 'Connection to openaipublic.blob.core.windows.net timed out. (connect timeout=None)'))
```

Since for this application I'm only using the embeddings and no chat, I managed to "fix" this by rewriting the GlobalsHelper.tokenizer method to just return None, so that tiktoken.get_encoding("gpt2") is no longer called. I also had to use set_global_tokenizer(..) to prevent another network timeout later on.
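For reference, the second workaround looks roughly like this (0.9.x-era API; the model path is just a placeholder for a tokenizer that is already on disk):

```python
# Sketch only: swap the default tiktoken-based tokenizer for a locally
# cached HuggingFace tokenizer so nothing is fetched from the network.
# The model path below is an assumption -- point it at your own download.
from transformers import AutoTokenizer
from llama_index import set_global_tokenizer

local_tokenizer = AutoTokenizer.from_pretrained(
    "/models/all-MiniLM-L6-v2",  # pre-downloaded model directory
    local_files_only=True,       # fail fast instead of calling the Hub
)
set_global_tokenizer(local_tokenizer.encode)
```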

It would be nice if this use case worked out of the box without me having to modify the library.

Value of Feature

Our use case is that we want to search across documents that shouldn't leave our internal network. While I don't believe that the llama_index library will leak any information on purpose, we believe it's best to add additional safety measures, just in case any of the libraries in the dependency tree suddenly hit a bug or receive a malicious update.

Being able to run completely offline would also enhance user trust in the library, and once the necessary changes have been made, it's something you could prominently advertise in the readme and on the website. From what I've seen, the required changes shouldn't be too massive.

dosubot[bot] commented 9 months ago

🤖

Hi @Pascal-So,

I'm Dosu, an AI bot here to assist you with bugs, answer your questions, and help you get started with contributing to our projects. While we wait for a human maintainer to respond, I'm here to provide some immediate support. Let's get started, shall we?

Thank you for your feature request and the detailed explanation of your use case. I understand the importance of being able to run LlamaIndex completely offline for security reasons and to enhance user trust in the library.

The issue you're experiencing seems to be related to the tiktoken library, which is used by LlamaIndex. The tiktoken library is trying to download a file from the internet, which is causing a network timeout when the computer is not connected to the internet.

To enable completely offline operation, you would need to modify the LlamaIndex code to use local files instead of downloading them from the internet. This would involve downloading all the necessary files once and storing them locally, then modifying the code to use these local files.

Here's a simplified example of how you can modify the get_dataset_info function to use local files:

```python
import json
import os
from pathlib import Path
from typing import Dict

# PATH_TYPE and _get_source_files_list are assumed to come from
# llama_index's download utilities.


def get_dataset_info(
    local_dir_path: PATH_TYPE,
    remote_dir_path: PATH_TYPE,
    remote_source_dir_path: PATH_TYPE,
    dataset_class: str,
    refresh_cache: bool = False,
    library_path: str = "library.json",
    source_files_path: str = "source_files",
    disable_library_cache: bool = False,
) -> Dict:
    """Get dataset info."""
    if isinstance(local_dir_path, str):
        local_dir_path = Path(local_dir_path)

    local_library_path = f"{local_dir_path}/{library_path}"
    dataset_id = None
    source_files = []

    # Check cache first
    if not refresh_cache and os.path.exists(local_library_path):
        with open(local_library_path) as f:
            library = json.load(f)
        if dataset_class in library:
            dataset_id = library[dataset_class]["id"]
            source_files = library[dataset_class].get("source_files", [])

    # Fetch up-to-date library from local repo if dataset_id not found
    if dataset_id is None:
        with open(f"{local_dir_path}/{library_path}") as f:
            library_raw_content = f.read()
        library = json.loads(library_raw_content)
        if dataset_class not in library:
            raise ValueError("Loader class name not found in library")

        dataset_id = library[dataset_class]["id"]
        source_files = _get_source_files_list(
            str(local_dir_path), f"/{dataset_id}/{source_files_path}"
        )

        # create cache dir if needed
        local_library_dir = os.path.dirname(local_library_path)
        if not disable_library_cache:
            if not os.path.exists(local_library_dir):
                os.makedirs(local_library_dir)

            # Update cache
            with open(local_library_path, "w") as f:
                f.write(library_raw_content)

    if dataset_id is None:
        raise ValueError("Dataset class name not found in library")

    return {
        "dataset_id": dataset_id,
        "source_files": source_files,
    }
```

Please note that this is a simplified example and you might need to modify other parts of the code as well to make the entire LlamaIndex work offline.

I found a similar solved issue in the LlamaIndex repository: [Feature Request]: Local-model defaults for non-predictor sensitive experimentation. This issue was closed by a pull request: feat(UX): provide all-MiniLM-L6-v2 as local embed_model. This might provide some insights into how to implement offline operation in LlamaIndex.

I hope this helps! If you have any further questions or need more clarification, please don't hesitate to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

gardberg commented 9 months ago

I would also love this feature, and am open to working on it.

Should "online-mode" be the default, and an attempt to load a local model be done if the http request fails?

Or should offline mode be "opt-in" and able to be set via a parameter?

Pascal-So commented 9 months ago

Hi @gardberg, I think when talking about online- or offline-by-default it makes sense to look at what the user is doing.

In my opinion, a single import statement should never try to connect to the internet. Users generally don't expect any side effects from an import, so there I'm definitely a fan of offline-by-default. Note that in the case I encountered, the import statement itself already failed in my offline environment.

For any of the code after the imports I don't have very strong opinions. I suspect that many users enjoy the automatic downloading features, and I realize I'm probably in the minority in trying to run llama_index air-gapped. When instantiating the ServiceContext or HuggingFaceLLM classes, it makes sense for them to automatically download what they need. As long as there is always a way to provide offline models, such as the cache_folder argument of HuggingFaceEmbeddings, I don't think the interface really needs to change.
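To illustrate the kind of explicit wiring I mean, something along these lines (0.9.x-era API; the model name and cache path are placeholders, not recommendations):

```python
# Sketch only: build a ServiceContext that never talks to OpenAI and reads
# embedding weights from a local cache. Paths and model names are assumptions.
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import ServiceContext
from llama_index.embeddings import LangchainEmbedding

lc_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    cache_folder="/models/hf_cache",  # weights already downloaded here
)
service_context = ServiceContext.from_defaults(
    llm=None,                                   # disable the default OpenAI LLM
    embed_model=LangchainEmbedding(lc_embeddings),
)
```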

The online or offline default would in this case be decided on a case-by-case basis for classes/functions where it's relevant. This would also allow us to fulfill this feature-request gradually, by always prioritizing the places that make it most difficult for the user to run the library offline. As long as the user can provide a simple, well-documented argument to the function/class in question, and get it to work that way, I think we're fine.

The downside of this approach is that the user can't trust the library to act completely offline without manually checking basically the full library source and making sure they really understand in which cases a local or online resource is used. I don't think this is such a big issue: any user who really needs this guarantee will just run the system air-gapped anyway, and then fix errors as they pop up.

In my opinion, online-by-default is fine for anything other than import statements. The deciding factor for this feature will be the documentation: a user who encounters a network error should quickly be able to figure out which additional argument to add to which function, or which environment variables to configure. It might even be possible to catch network exceptions and automatically print a helpful message that points the user to the relevant documentation (see the sketch below).
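Purely as an illustration of that last point (this isn't an existing llama_index API), a small hypothetical wrapper could translate raw connection errors into an actionable hint:

```python
# Hypothetical helper, not part of llama_index: run a callable and, if it
# fails with a network error, point the user at the offline-configuration docs.
import requests


def run_with_offline_hint(fn, docs_url="https://docs.llamaindex.ai"):
    try:
        return fn()
    except requests.exceptions.ConnectionError as err:
        raise RuntimeError(
            f"A network request failed ({err!r}). If this machine is offline, "
            f"see {docs_url} for how to point this component at local models and caches."
        ) from err
```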

Sorry for the wall of text, I hope this makes sense, feel free to ask if anything is unclear!

ChrisDelClea commented 8 months ago

Also got an error when trying to use llama_index behind a firewall. It obviously cannot load tiktoken.get_encoding("gpt2") from openaipublic.blob.core.windows.net. So an opt-in offline mode, or a way to pre-set the tiktoken path and the other required dependencies, would be great.
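One workaround along those lines (a sketch; the cache path is just an example) is to prime tiktoken's cache once on a machine with internet access and copy the directory behind the firewall:

```python
# Sketch: run this once on an internet-connected machine to populate a
# tiktoken cache directory, then copy that directory behind the firewall.
import os

os.environ["TIKTOKEN_CACHE_DIR"] = "/opt/offline_caches/tiktoken"

import tiktoken

tiktoken.get_encoding("gpt2")  # downloads the gpt2 BPE files into the cache dir

# On the restricted machine, set TIKTOKEN_CACHE_DIR to the copied directory
# before llama_index first calls tiktoken, and no download is attempted.
```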

gardberg commented 8 months ago

Took a first look at this:

When importing via import llama_index, we directly try to download the corpora/stopwords and tokenizers/punkt packages via nltk; see the GlobalsHelper class. The download directory is defined by the NLTK_DATA env variable, which defaults to __file__/_static/nltk_data if not already set.

Do we need these packages for running locally? If so, they will have to be placed in e.g. llama_index/_static/nltk_data manually.
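Pre-seeding them could look roughly like this (the staging path is just an example, and the download step has to run on a machine with internet access):

```python
# Sketch: fetch the nltk packages llama_index expects into a staging
# directory that can then be copied to the offline machine.
import nltk

staging_dir = "/opt/offline_caches/nltk_data"
nltk.download("stopwords", download_dir=staging_dir)
nltk.download("punkt", download_dir=staging_dir)

# On the offline machine, either copy the contents into
# llama_index/_static/nltk_data or export NLTK_DATA=<copied directory>
# before the first import of llama_index.
```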

Like @ChrisDelClea mentioned, an attempt to download a tokenizer via tiktoken is also made here. That download location also seems to be _static/tiktoken, defined by TIKTOKEN_CACHE_DIR.

The first idea of how to resolve this would probably be to set a global tokenizer manually, which should be hinted at when the tiktoken download fails. It should also be clear what needs to be specified in the ServiceContext to disable calls to e.g. OpenAI (llm=None, embed_model='local').

draft PR, which I aim to extend :)

logan-markewich commented 8 months ago

Hey @gardberg @ChrisDelClea @Pascal-So the latest versions of llama-index from pypi should be coming with nltk and tiktoken pre-downloaded.

Is this not working as intended if you install the newest version? https://github.com/run-llama/llama_index/blob/22a037294a9900e6bc6e4a56941ba709041e6394/.github/workflows/publish_release.yml#L32

gardberg commented 8 months ago

Hey @logan-markewich, just tried doing a fresh install of llama_index using pip install llama_index, and the packages are indeed pre-downloaded under _static, great!

I suspect I did not have them downloaded since I installed via the development guidelines, which supposedly skipped the caching.

I assume then that we can treat the package as generally being installed via pip, and therefore treat the nltk and tiktoken data as already cached, so that step won't need to be considered, which is good :)

martgra commented 1 week ago

It's true that llama-index comes with some cached files for nltk. However, on import it seems like it still needs some internet connection to validate the cache? Am I missing something, as this is the output:

```
>>> import llama_index.core
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/user/.cache/pypoetry/virtualenvs/data-
[nltk_data]     processing-NouaBZn--py3.11/lib/python3.11/site-
[nltk_data]     packages/llama_index/core/_static/nltk_cache...
[nltk_data]   Package punkt_tab is already up-to-date!
>>>
```

We froze our version of llama-index to llama-index = "0.10.13.post1". If this is somehow fixed in a newer version, we might consider updating.