run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: the time of async is same as sync #12407

Closed by lambda7xx 4 months ago

lambda7xx commented 6 months ago

Question

According to the async docs, async execution can give a 2x speedup, but in my test the async run takes about as long as the sync run. My code is below:

from llama_index.core import SimpleDirectoryReader
import torch
# from transformers import BitsAndBytesConfig

from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.prompts import PromptTemplate

from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.response.notebook_utils import display_response

import llama_index.core 
import time 
import os
import torch.cuda.nvtx as nvtx

from llama_index.core.query_pipeline import (
    QueryPipeline,
    InputComponent,
    ArgPackComponent,
)
from typing import Dict, Any, List, Optional
from llama_index.core.llama_pack import BaseLlamaPack
from llama_index.core.llms import LLM
from llama_index.llms.openai import OpenAI
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.schema import NodeWithScore, TextNode
from llama_index.core.node_parser import SentenceSplitter

# reader = SimpleDirectoryReader(input_files=["pg_essay.txt"])
# documents = reader.load_data("../data/paul_graham/paul_graham_essay.txt")
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
llama_debug = LlamaDebugHandler(print_trace_on_end=True)

callback_manager = CallbackManager([llama_debug])

documents = SimpleDirectoryReader("../data/paul_graham/").load_data() #data/llmama2_paper.json ./data/survery/llm_survery_paper.json

llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.1",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.1",
    query_wrapper_prompt=PromptTemplate("<s>[INST] {query_str} [/INST] </s>\n"),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"torch_dtype": torch.bfloat16},
    # tokenizer_kwargs={},
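    # NOTE: per the transformers warnings in the log below, temperature, top_k and
    # top_p are ignored unless do_sample=True is also set.
    generate_kwargs={"temperature": 0.2, "top_k": 5, "top_p": 0.95},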
    device_map="auto",
)

Settings.llm = llm
embeddings = "local:BAAI/bge-small-en-v1.5"

Settings.embed_model = embeddings

chunk_sizes = [128, 256, 512, 1024]

# chunk_sizes = [1024]
query_engines = {}
for chunk_size in chunk_sizes:
    splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=0)
    nodes = splitter.get_nodes_from_documents(documents)
    vector_index = VectorStoreIndex(nodes,  callback_manager=callback_manager)
    query_engines[str(chunk_size)] = vector_index.as_query_engine(llm=llm)

# construct query pipeline
p = QueryPipeline(verbose=True)
module_dict = {
    **query_engines,
    "input": InputComponent(),
    "summarizer": TreeSummarize(),
    "join": ArgPackComponent(
        convert_fn=lambda x: NodeWithScore(node=TextNode(text=str(x)))
    ),
}
p.add_modules(module_dict)
# add links from input to query engine (id'ed by chunk_size)
for chunk_size in chunk_sizes:
    p.add_link("input", str(chunk_size))
    p.add_link(str(chunk_size), "join", dest_key=str(chunk_size))
p.add_link("join", "summarizer", dest_key="nodes")
p.add_link("input", "summarizer", dest_key="query_str")

import time

async def some_async_function():
    start_time = time.time()
    response = await p.arun(input="What did the author do during his time in YC?")
    print(str(response))
    end_time = time.time()
    print(f"async Time taken: {end_time - start_time}")

import asyncio
asyncio.run(some_async_function())

# start_time = time.time()
# response = await p.arun(input="What did the author do during his time in YC?")
# print(str(response))
# end_time = time.time()
# print(f"Time taken: {end_time - start_time}")

# compare with sync method

start_time = time.time()
response = p.run(input="What did the author do during his time in YC?")
print(str(response))
end_time = time.time()
print(f"Time taken: {end_time - start_time}")
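One thing worth ruling out when comparing the two timings is whether the awaited calls ever yield to the event loop. The sketch below is a toy illustration only: `blocking_work`, `fake_async_blocking`, `truly_async` and `timed_gather` are hypothetical stand-ins, not LlamaIndex or transformers APIs, and it assumes (without checking the library source) that a local `model.generate` call holds the thread the way `time.sleep` does.

```python
import asyncio
import time


def blocking_work(seconds: float) -> None:
    """Stand-in for a call that holds the thread (assumed similar to a local generate)."""
    time.sleep(seconds)


async def fake_async_blocking(seconds: float) -> None:
    """A coroutine that never awaits, so it blocks the event loop while it runs."""
    blocking_work(seconds)


async def truly_async(seconds: float) -> None:
    """Stand-in for a call that yields to the event loop, such as a network request."""
    await asyncio.sleep(seconds)


async def timed_gather(factory, n: int, seconds: float) -> float:
    """Run n coroutines with asyncio.gather and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(factory(seconds) for _ in range(n)))
    return time.perf_counter() - start


blocked = asyncio.run(timed_gather(fake_async_blocking, 4, 0.05))
overlapped = asyncio.run(timed_gather(truly_async, 4, 0.05))
print(f"blocking coroutines: {blocked:.2f}s, awaiting coroutines: {overlapped:.2f}s")
```

In this toy setup the four blocking coroutines take about four times as long as the four awaiting ones, even though both are launched with `asyncio.gather`. Whether the same applies to the pipeline here depends on how the LLM class implements its async methods.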

My log is below:


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:01<00:01,  1.33s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.05it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.01s/it]
**********
Trace: index_construction
**********
**********
Trace: index_construction
**********
**********
Trace: index_construction
**********
**********
Trace: index_construction
**********
/opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:492: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.2` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:497: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.95` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
/opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:509: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `5` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
> Running modules and inputs in parallel: 
Module key: input. Input: 
input: What did the author do during his time in YC?

> Running modules and inputs in parallel: 
Module key: 128. Input: 
input: What did the author do during his time in YC?

Module key: 256. Input: 
input: What did the author do during his time in YC?

Module key: 512. Input: 
input: What did the author do during his time in YC?

Module key: 1024. Input: 
input: What did the author do during his time in YC?

1 complete, len(prompt): 178, 
 prompt:<s>[INST] Context information is below.
---------------------
file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

I had not originally intended YC to be a full-time job. I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited about it, it started to take up a lot more than a third of my attention. But for the first few years I was still able to work on other things.

file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

At the time I didn't understand what he meant, but gradually it dawned on me that he was saying I should quit. This seemed strange advice, because YC was doing great. But if there was one thing rarer than Rtm offering advice, it was Rtm being wrong. So this set me thinking. It was true that on my current trajectory, YC would be the last thing I did, because it was only taking up more of my attention.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>

2 complete, the time of tokenizer:  0.001026153564453125
3 complete, the time of model.generate:  3.3410751819610596
4 complete, len(completion):  63  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, originally intended Y Combinator (YC) to be a full-time job, but he also planned to hack, write essays, and work on YC. As YC grew and he became more excited about it, it started to take up more than a third of his attention. However, during the first few years, he was still able to work on other things. 

************finish complete function************

1 complete, len(prompt): 390, 
 prompt:<s>[INST] Context information is below.
---------------------
file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

It was true that on my current trajectory, YC would be the last thing I did, because it was only taking up more of my attention. It had already eaten Arc, and was in the process of eating essays too. Either YC was my life's work or I'd have to leave eventually. And it wasn't, so I would.

In the summer of 2012 my mother had a stroke, and the cause turned out to be a blood clot caused by colon cancer. The stroke destroyed her balance, and she was put in a nursing home, but she really wanted to get out of it and back to her house, and my sister and I were determined to help her do it. I used to fly up to Oregon to visit her regularly, and I had a lot of time to think on those flights. On one of them I realized I was ready to hand YC over to someone else.

I asked Jessica if she wanted to be president, but she didn't, so we decided we'd try to recruit Sam Altman.

file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited about it, it started to take up a lot more than a third of my attention. But for the first few years I was still able to work on other things.

In the summer of 2006, Robert and I started working on a new version of Arc. This one was reasonably fast, because it was compiled into Scheme. To test this new Arc, I wrote Hacker News in it. It was originally meant to be a news aggregator for startup founders and was called Startup News, but after a few months I got tired of reading about nothing but startups. Plus it wasn't startup founders we wanted to reach. It was future startup founders. So I changed the name to Hacker News and the topic to whatever engaged one's intellectual curiosity.

HN was no doubt good for YC, but it was also by far the biggest source of stress for me.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>

2 complete, the time of tokenizer:  0.0015420913696289062
3 complete, the time of model.generate:  2.4877939224243164
4 complete, len(completion):  50  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually decided to hand YC over to someone else and focus on other things. 

************finish complete function************

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
1 complete, len(prompt): 794, 
 prompt:<s>[INST] Context information is below.
---------------------
file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

But if there was one thing rarer than Rtm offering advice, it was Rtm being wrong. So this set me thinking. It was true that on my current trajectory, YC would be the last thing I did, because it was only taking up more of my attention. It had already eaten Arc, and was in the process of eating essays too. Either YC was my life's work or I'd have to leave eventually. And it wasn't, so I would.

In the summer of 2012 my mother had a stroke, and the cause turned out to be a blood clot caused by colon cancer. The stroke destroyed her balance, and she was put in a nursing home, but she really wanted to get out of it and back to her house, and my sister and I were determined to help her do it. I used to fly up to Oregon to visit her regularly, and I had a lot of time to think on those flights. On one of them I realized I was ready to hand YC over to someone else.

I asked Jessica if she wanted to be president, but she didn't, so we decided we'd try to recruit Sam Altman. We talked to Robert and Trevor and we agreed to make it a complete changing of the guard. Up till that point YC had been controlled by the original LLC we four had started. But we wanted YC to last for a long time, and to do that it couldn't be controlled by the founders. So if Sam said yes, we'd let him reorganize YC. Robert and I would retire, and Jessica and Trevor would become ordinary partners.

When we asked Sam if he wanted to be president of YC, initially he said no. He wanted to start a startup to make nuclear reactors. But I kept at it, and in October 2013 he finally agreed. We decided he'd take over starting with the winter 2014 batch. For the rest of 2013 I left running YC more and more to Sam, partly so he could learn the job, and partly because I was focused on my mother, whose cancer had returned.

She died on January 15, 2014. We knew this was coming, but it was still hard when it did.

file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

The alumni became a tight community, dedicated to helping one another, and especially the current batch, whose shoes they remembered being in. We also noticed that the startups were becoming one another's customers. We used to refer jokingly to the "YC GDP," but as YC grows this becomes less and less of a joke. Now lots of startups get their initial set of customers almost entirely from among their batchmates.

I had not originally intended YC to be a full-time job. I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited about it, it started to take up a lot more than a third of my attention. But for the first few years I was still able to work on other things.

In the summer of 2006, Robert and I started working on a new version of Arc. This one was reasonably fast, because it was compiled into Scheme. To test this new Arc, I wrote Hacker News in it. It was originally meant to be a news aggregator for startup founders and was called Startup News, but after a few months I got tired of reading about nothing but startups. Plus it wasn't startup founders we wanted to reach. It was future startup founders. So I changed the name to Hacker News and the topic to whatever engaged one's intellectual curiosity.

HN was no doubt good for YC, but it was also by far the biggest source of stress for me. If all I'd had to do was select and help founders, life would have been so easy. And that implies that HN was a mistake. Surely the biggest source of stress in one's work should at least be something close to the core of the work. Whereas I was like someone who was in pain while running a marathon not from the exertion of running, but because I had a blister from an ill-fitting shoe. When I was dealing with some urgent problem during YC, there was about a 60% chance it had to do with HN, and a 40% chance it had do with everything else combined. [17]

As well as HN, I wrote all of YC's internal software in Arc.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>

2 complete, the time of tokenizer:  0.0027608871459960938
3 complete, the time of model.generate:  2.7717671394348145
4 complete, len(completion):  44  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, founded Y Combinator (YC) and served as its president. He also wrote essays and worked on other projects, such as Arc and Hacker News. However, he eventually handed over YC to someone else and focused more on his mother's health. 

************finish complete function************

1 complete, len(prompt): 1577, 
 prompt:<s>[INST] Context information is below.
---------------------
file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

Robert and I would retire, and Jessica and Trevor would become ordinary partners.

When we asked Sam if he wanted to be president of YC, initially he said no. He wanted to start a startup to make nuclear reactors. But I kept at it, and in October 2013 he finally agreed. We decided he'd take over starting with the winter 2014 batch. For the rest of 2013 I left running YC more and more to Sam, partly so he could learn the job, and partly because I was focused on my mother, whose cancer had returned.

She died on January 15, 2014. We knew this was coming, but it was still hard when it did.

I kept working on YC till March, to help get that batch of startups through Demo Day, then I checked out pretty completely. (I still talk to alumni and to new startups working on things I'm interested in, but that only takes a few hours a week.)

What should I do next? Rtm's advice hadn't included anything about that. I wanted to do something completely different, so I decided I'd paint. I wanted to see how good I could get if I really focused on it. So the day after I stopped working on YC, I started painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging. [18]

I spent most of the rest of 2014 painting. I'd never been able to work so uninterruptedly before, and I got to be better than I had been. Not good enough, but better. Then in November, right in the middle of a painting, I ran out of steam. Up till that point I'd always been curious to see how the painting I was working on would turn out, but suddenly finishing this one seemed like a chore. So I stopped working on it and cleaned my brushes and haven't painted since. So far anyway.

I realize that sounds rather wimpy. But attention is a zero sum game. If you can choose what to work on, and you choose a project that's not the best one (or at least a good one) for you, then it's getting in the way of another project that is. And at 50 there was some opportunity cost to screwing around.

I started writing essays again, and wrote a bunch of new ones over the next few months. I even wrote a couple that weren't about startups. Then in March 2015 I started working on Lisp again.

The distinctive thing about Lisp is that its core is a language defined by writing an interpreter in itself. It wasn't originally intended as a programming language in the ordinary sense. It was meant to be a formal model of computation, an alternative to the Turing machine. If you want to write an interpreter for a language in itself, what's the minimum set of predefined operators you need? The Lisp that John McCarthy invented, or more accurately discovered, is an answer to that question. [19]

McCarthy didn't realize this Lisp could even be used to program computers till his grad student Steve Russell suggested it. Russell translated McCarthy's interpreter into IBM 704 machine language, and from that point Lisp started also to be a programming language in the ordinary sense. But its origins as a model of computation gave it a power and elegance that other languages couldn't match. It was this that attracted me in college, though I didn't understand why at the time.

McCarthy's 1960 Lisp did nothing more than interpret Lisp expressions. It was missing a lot of things you'd want in a programming language. So these had to be added, and when they were, they weren't defined using McCarthy's original axiomatic approach. That wouldn't have been feasible at the time. McCarthy tested his interpreter by hand-simulating the execution of programs. But it was already getting close to the limit of interpreters you could test that way — indeed, there was a bug in it that McCarthy had overlooked. To test a more complicated interpreter, you'd have had to run it, and computers then weren't powerful enough.

Now they are, though. Now you could continue using McCarthy's axiomatic approach till you'd defined a complete programming language. And as long as every change you made to McCarthy's Lisp was a discoveredness-preserving transformation, you could, in principle, end up with a complete language that had this quality. Harder to do than to talk about, of course, but if it was possible in principle, why not try? So I decided to take a shot at it.

file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it. The result would ordinarily be to print something on the spectacularly loud printer.

I was puzzled by the 1401. I couldn't figure out what to do with it. And in retrospect there's not much I could have done with it. The only form of input to programs was data stored on punched cards, and I didn't have any data stored on punched cards. The only other option was to do things that didn't rely on any input, like calculate approximations of pi, but I didn't know enough math to do anything interesting of that type. So I'm not surprised I can't remember any programs I wrote, because they can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the data center manager's expression made clear.

With microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]

The first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.

Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he'd write 2 pages at a time and then print them out, but it was a lot better than a typewriter.

Though I liked programming, I didn't plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge. What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn't much left for these supposed ultimate truths. All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.

I couldn't have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. So I decided to switch to AI.

AI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU. I haven't tried rereading The Moon is a Harsh Mistress, so I don't know how well it has aged, but when I read it I was drawn entirely into its world.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>
 Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

2 complete, the time of tokenizer:  0.005410909652709961
3 complete, the time of model.generate:  6.851969957351685
4 complete, len(completion):  101  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, served as president of Y Combinator (YC) from October 2013 to March 2014. He took over the role from Sam, who had initially declined the position and wanted to start a startup to make nuclear reactors. During his time in YC, Graham focused on learning the job and helping to get the batch of startups through Demo Day. He also spent most of the rest of 2013 painting, and in November 2014, he ran out of steam and stopped working on his painting project. After that, he started writing essays again and working on Lisp programming language. 

************finish complete function************

> Running modules and inputs in parallel: 
Module key: join. Input: 
128: 
The author, Paul Graham, originally intended Y Combinator (YC) to be a full-time job, but he also planned to hack, write essays, and work on YC. As YC grew and he became more excited about it, it sta...
256: 
The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually...
512: 
The author, Paul Graham, founded Y Combinator (YC) and served as its president. He also wrote essays and worked on other projects, such as Arc and Hacker News. However, he eventually handed over YC t...
1024: 
The author, Paul Graham, served as president of Y Combinator (YC) from October 2013 to March 2014. He took over the role from Sam, who had initially declined the position and wanted to start a startu...

> Running modules and inputs in parallel: 
Module key: summarizer. Input: 
query_str: What did the author do during his time in YC?
nodes: [NodeWithScore(node=TextNode(id_='01b27f8f-bf1b-4b69-a08d-dbf22f93e582', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='\nThe auth...

1 complete, len(prompt): 295, 
 prompt:<s>[INST] Context information from multiple sources is below.
---------------------
The author, Paul Graham, originally intended Y Combinator (YC) to be a full-time job, but he also planned to hack, write essays, and work on YC. As YC grew and he became more excited about it, it started to take up more than a third of his attention. However, during the first few years, he was still able to work on other things.

The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually decided to hand YC over to someone else and focus on other things.

The author, Paul Graham, founded Y Combinator (YC) and served as its president. He also wrote essays and worked on other projects, such as Arc and Hacker News. However, he eventually handed over YC to someone else and focused more on his mother's health.

The author, Paul Graham, served as president of Y Combinator (YC) from October 2013 to March 2014. He took over the role from Sam, who had initially declined the position and wanted to start a startup to make nuclear reactors. During his time in YC, Graham focused on learning the job and helping to get the batch of startups through Demo Day. He also spent most of the rest of 2013 painting, and in November 2014, he ran out of steam and stopped working on his painting project. After that, he started writing essays again and working on Lisp programming language.
---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>

2 complete, the time of tokenizer:  0.0012831687927246094
3 complete, the time of model.generate:  2.416733741760254
4 complete, len(completion):  50  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually decided to hand YC over to someone else and focus on other things. 

************finish complete function************

The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually decided to hand YC over to someone else and focus on other things.
async Time taken: 17.954928159713745
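
For reference, the five model.generate timings logged in the async run above can be added up and compared with the reported wall time. This is only arithmetic on the numbers already in this log; the variable names are made up for illustration.

```python
# model.generate timings copied from the async run in the log above
# (chunk sizes 128, 256, 512, 1024, then the summarizer call).
generate_times = [
    3.3410751819610596,
    2.4877939224243164,
    2.7717671394348145,
    6.851969957351685,
    2.416733741760254,
]
async_wall_time = 17.954928159713745  # "async Time taken" from the log

total_generate = sum(generate_times)
share = total_generate / async_wall_time

# Rough wall time if the four query engines had fully overlapped:
# the slowest of the four, followed by the summarizer call.
ideal_overlap = max(generate_times[:4]) + generate_times[4]

print(f"sum of generate calls: {total_generate:.2f}s")
print(f"share of async wall time: {share:.1%}")
print(f"rough estimate with full overlap: {ideal_overlap:.2f}s")
```

The generate calls add up to nearly all of the async wall time, which is consistent with them running one after another rather than concurrently.
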
> Running module input with input: 
input: What did the author do during his time in YC?

QueryPipeline::_run_multi, Time taken for input is 3.814697265625e-06
> Running module 128 with input: 
input: What did the author do during his time in YC?

1 complete, len(prompt): 178, 
 prompt:<s>[INST] Context information is below.
---------------------
file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

I had not originally intended YC to be a full-time job. I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited about it, it started to take up a lot more than a third of my attention. But for the first few years I was still able to work on other things.

file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

At the time I didn't understand what he meant, but gradually it dawned on me that he was saying I should quit. This seemed strange advice, because YC was doing great. But if there was one thing rarer than Rtm offering advice, it was Rtm being wrong. So this set me thinking. It was true that on my current trajectory, YC would be the last thing I did, because it was only taking up more of my attention.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>

2 complete, the time of tokenizer:  0.0005886554718017578
3 complete, the time of model.generate:  3.2246968746185303
4 complete, len(completion):  63  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, originally intended Y Combinator (YC) to be a full-time job, but he also planned to hack, write essays, and work on YC. As YC grew and he became more excited about it, it started to take up more than a third of his attention. However, during the first few years, he was still able to work on other things. 

************finish complete function************

QueryPipeline::_run_multi, Time taken for 128 is 3.2436001300811768
> Running module 256 with input: 
input: What did the author do during his time in YC?

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
1 complete, len(prompt): 390, 
 prompt:<s>[INST] Context information is below.
---------------------
file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

It was true that on my current trajectory, YC would be the last thing I did, because it was only taking up more of my attention. It had already eaten Arc, and was in the process of eating essays too. Either YC was my life's work or I'd have to leave eventually. And it wasn't, so I would.

In the summer of 2012 my mother had a stroke, and the cause turned out to be a blood clot caused by colon cancer. The stroke destroyed her balance, and she was put in a nursing home, but she really wanted to get out of it and back to her house, and my sister and I were determined to help her do it. I used to fly up to Oregon to visit her regularly, and I had a lot of time to think on those flights. On one of them I realized I was ready to hand YC over to someone else.

I asked Jessica if she wanted to be president, but she didn't, so we decided we'd try to recruit Sam Altman.

file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited about it, it started to take up a lot more than a third of my attention. But for the first few years I was still able to work on other things.

In the summer of 2006, Robert and I started working on a new version of Arc. This one was reasonably fast, because it was compiled into Scheme. To test this new Arc, I wrote Hacker News in it. It was originally meant to be a news aggregator for startup founders and was called Startup News, but after a few months I got tired of reading about nothing but startups. Plus it wasn't startup founders we wanted to reach. It was future startup founders. So I changed the name to Hacker News and the topic to whatever engaged one's intellectual curiosity.

HN was no doubt good for YC, but it was also by far the biggest source of stress for me.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>

2 complete, the time of tokenizer:  0.0008769035339355469
3 complete, the time of model.generate:  2.486938953399658
4 complete, len(completion):  50  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually decided to hand YC over to someone else and focus on other things. 

************finish complete function************

QueryPipeline::_run_multi, Time taken for 256 is 2.5043890476226807
> Running module 512 with input: 
input: What did the author do during his time in YC?

1 complete, len(prompt): 794, 
 prompt:<s>[INST] Context information is below.
---------------------
file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

But if there was one thing rarer than Rtm offering advice, it was Rtm being wrong. So this set me thinking. It was true that on my current trajectory, YC would be the last thing I did, because it was only taking up more of my attention. It had already eaten Arc, and was in the process of eating essays too. Either YC was my life's work or I'd have to leave eventually. And it wasn't, so I would.

In the summer of 2012 my mother had a stroke, and the cause turned out to be a blood clot caused by colon cancer. The stroke destroyed her balance, and she was put in a nursing home, but she really wanted to get out of it and back to her house, and my sister and I were determined to help her do it. I used to fly up to Oregon to visit her regularly, and I had a lot of time to think on those flights. On one of them I realized I was ready to hand YC over to someone else.

I asked Jessica if she wanted to be president, but she didn't, so we decided we'd try to recruit Sam Altman. We talked to Robert and Trevor and we agreed to make it a complete changing of the guard. Up till that point YC had been controlled by the original LLC we four had started. But we wanted YC to last for a long time, and to do that it couldn't be controlled by the founders. So if Sam said yes, we'd let him reorganize YC. Robert and I would retire, and Jessica and Trevor would become ordinary partners.

When we asked Sam if he wanted to be president of YC, initially he said no. He wanted to start a startup to make nuclear reactors. But I kept at it, and in October 2013 he finally agreed. We decided he'd take over starting with the winter 2014 batch. For the rest of 2013 I left running YC more and more to Sam, partly so he could learn the job, and partly because I was focused on my mother, whose cancer had returned.

She died on January 15, 2014. We knew this was coming, but it was still hard when it did.

file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

The alumni became a tight community, dedicated to helping one another, and especially the current batch, whose shoes they remembered being in. We also noticed that the startups were becoming one another's customers. We used to refer jokingly to the "YC GDP," but as YC grows this becomes less and less of a joke. Now lots of startups get their initial set of customers almost entirely from among their batchmates.

I had not originally intended YC to be a full-time job. I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited about it, it started to take up a lot more than a third of my attention. But for the first few years I was still able to work on other things.

In the summer of 2006, Robert and I started working on a new version of Arc. This one was reasonably fast, because it was compiled into Scheme. To test this new Arc, I wrote Hacker News in it. It was originally meant to be a news aggregator for startup founders and was called Startup News, but after a few months I got tired of reading about nothing but startups. Plus it wasn't startup founders we wanted to reach. It was future startup founders. So I changed the name to Hacker News and the topic to whatever engaged one's intellectual curiosity.

HN was no doubt good for YC, but it was also by far the biggest source of stress for me. If all I'd had to do was select and help founders, life would have been so easy. And that implies that HN was a mistake. Surely the biggest source of stress in one's work should at least be something close to the core of the work. Whereas I was like someone who was in pain while running a marathon not from the exertion of running, but because I had a blister from an ill-fitting shoe. When I was dealing with some urgent problem during YC, there was about a 60% chance it had to do with HN, and a 40% chance it had do with everything else combined. [17]

As well as HN, I wrote all of YC's internal software in Arc.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>

2 complete, the time of tokenizer:  0.0014390945434570312
3 complete, the time of model.generate:  2.770507574081421
4 complete, len(completion):  44  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, founded Y Combinator (YC) and served as its president. He also wrote essays and worked on other projects, such as Arc and Hacker News. However, he eventually handed over YC to someone else and focused more on his mother's health. 

************finish complete function************

QueryPipeline::_run_multi, Time taken for 512 is 2.788543939590454
> Running module 1024 with input: 
input: What did the author do during his time in YC?

1 complete, len(prompt): 1577, 
 prompt:<s>[INST] Context information is below.
---------------------
file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

Robert and I would retire, and Jessica and Trevor would become ordinary partners.

When we asked Sam if he wanted to be president of YC, initially he said no. He wanted to start a startup to make nuclear reactors. But I kept at it, and in October 2013 he finally agreed. We decided he'd take over starting with the winter 2014 batch. For the rest of 2013 I left running YC more and more to Sam, partly so he could learn the job, and partly because I was focused on my mother, whose cancer had returned.

She died on January 15, 2014. We knew this was coming, but it was still hard when it did.

I kept working on YC till March, to help get that batch of startups through Demo Day, then I checked out pretty completely. (I still talk to alumni and to new startups working on things I'm interested in, but that only takes a few hours a week.)

What should I do next? Rtm's advice hadn't included anything about that. I wanted to do something completely different, so I decided I'd paint. I wanted to see how good I could get if I really focused on it. So the day after I stopped working on YC, I started painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging. [18]

I spent most of the rest of 2014 painting. I'd never been able to work so uninterruptedly before, and I got to be better than I had been. Not good enough, but better. Then in November, right in the middle of a painting, I ran out of steam. Up till that point I'd always been curious to see how the painting I was working on would turn out, but suddenly finishing this one seemed like a chore. So I stopped working on it and cleaned my brushes and haven't painted since. So far anyway.

I realize that sounds rather wimpy. But attention is a zero sum game. If you can choose what to work on, and you choose a project that's not the best one (or at least a good one) for you, then it's getting in the way of another project that is. And at 50 there was some opportunity cost to screwing around.

I started writing essays again, and wrote a bunch of new ones over the next few months. I even wrote a couple that weren't about startups. Then in March 2015 I started working on Lisp again.

The distinctive thing about Lisp is that its core is a language defined by writing an interpreter in itself. It wasn't originally intended as a programming language in the ordinary sense. It was meant to be a formal model of computation, an alternative to the Turing machine. If you want to write an interpreter for a language in itself, what's the minimum set of predefined operators you need? The Lisp that John McCarthy invented, or more accurately discovered, is an answer to that question. [19]

McCarthy didn't realize this Lisp could even be used to program computers till his grad student Steve Russell suggested it. Russell translated McCarthy's interpreter into IBM 704 machine language, and from that point Lisp started also to be a programming language in the ordinary sense. But its origins as a model of computation gave it a power and elegance that other languages couldn't match. It was this that attracted me in college, though I didn't understand why at the time.

McCarthy's 1960 Lisp did nothing more than interpret Lisp expressions. It was missing a lot of things you'd want in a programming language. So these had to be added, and when they were, they weren't defined using McCarthy's original axiomatic approach. That wouldn't have been feasible at the time. McCarthy tested his interpreter by hand-simulating the execution of programs. But it was already getting close to the limit of interpreters you could test that way — indeed, there was a bug in it that McCarthy had overlooked. To test a more complicated interpreter, you'd have had to run it, and computers then weren't powerful enough.

Now they are, though. Now you could continue using McCarthy's axiomatic approach till you'd defined a complete programming language. And as long as every change you made to McCarthy's Lisp was a discoveredness-preserving transformation, you could, in principle, end up with a complete language that had this quality. Harder to do than to talk about, of course, but if it was possible in principle, why not try? So I decided to take a shot at it.

file_path: /home/ubuntu/uw-llama/query_rewrite/../data/paul_graham/paul_graham_essay.txt

What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it. The result would ordinarily be to print something on the spectacularly loud printer.

I was puzzled by the 1401. I couldn't figure out what to do with it. And in retrospect there's not much I could have done with it. The only form of input to programs was data stored on punched cards, and I didn't have any data stored on punched cards. The only other option was to do things that didn't rely on any input, like calculate approximations of pi, but I didn't know enough math to do anything interesting of that type. So I'm not surprised I can't remember any programs I wrote, because they can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the data center manager's expression made clear.

With microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]

The first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.

Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he'd write 2 pages at a time and then print them out, but it was a lot better than a typewriter.

Though I liked programming, I didn't plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge. What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn't much left for these supposed ultimate truths. All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.

I couldn't have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. So I decided to switch to AI.

AI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU. I haven't tried rereading The Moon is a Harsh Mistress, so I don't know how well it has aged, but when I read it I was drawn entirely into its world.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>
 Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

2 complete, the time of tokenizer:  0.0026307106018066406
3 complete, the time of model.generate:  6.851429462432861
4 complete, len(completion):  101  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, served as president of Y Combinator (YC) from October 2013 to March 2014. He took over the role from Sam, who had initially declined the position and wanted to start a startup to make nuclear reactors. During his time in YC, Graham focused on learning the job and helping to get the batch of startups through Demo Day. He also spent most of the rest of 2013 painting, and in November 2014, he ran out of steam and stopped working on his painting project. After that, he started writing essays again and working on Lisp programming language. 

************finish complete function************

QueryPipeline::_run_multi, Time taken for 1024 is 6.872500419616699
> Running module join with input: 
128: 
The author, Paul Graham, originally intended Y Combinator (YC) to be a full-time job, but he also planned to hack, write essays, and work on YC. As YC grew and he became more excited about it, it sta...
256: 
The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually...
512: 
The author, Paul Graham, founded Y Combinator (YC) and served as its president. He also wrote essays and worked on other projects, such as Arc and Hacker News. However, he eventually handed over YC t...
1024: 
The author, Paul Graham, served as president of Y Combinator (YC) from October 2013 to March 2014. He took over the role from Sam, who had initially declined the position and wanted to start a startu...

QueryPipeline::_run_multi, Time taken for join is 0.0001316070556640625
> Running module summarizer with input: 
query_str: What did the author do during his time in YC?
nodes: [NodeWithScore(node=TextNode(id_='4b3597c5-88c3-40db-a0ab-9d9f53de89e8', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='\nThe auth...

1 complete, len(prompt): 295, 
 prompt:<s>[INST] Context information from multiple sources is below.
---------------------
The author, Paul Graham, originally intended Y Combinator (YC) to be a full-time job, but he also planned to hack, write essays, and work on YC. As YC grew and he became more excited about it, it started to take up more than a third of his attention. However, during the first few years, he was still able to work on other things.

The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually decided to hand YC over to someone else and focus on other things.

The author, Paul Graham, founded Y Combinator (YC) and served as its president. He also wrote essays and worked on other projects, such as Arc and Hacker News. However, he eventually handed over YC to someone else and focused more on his mother's health.

The author, Paul Graham, served as president of Y Combinator (YC) from October 2013 to March 2014. He took over the role from Sam, who had initially declined the position and wanted to start a startup to make nuclear reactors. During his time in YC, Graham focused on learning the job and helping to get the batch of startups through Demo Day. He also spent most of the rest of 2013 painting, and in November 2014, he ran out of steam and stopped working on his painting project. After that, he started writing essays again and working on Lisp programming language.
---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query: What did the author do during his time in YC?
Answer:  [/INST] </s>

2 complete, the time of tokenizer:  0.0006818771362304688
3 complete, the time of model.generate:  2.4160642623901367
4 complete, len(completion):  50  and type(completion):  <class 'str'> 
 the complete:  
The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually decided to hand YC over to someone else and focus on other things. 

************finish complete function************

QueryPipeline::_run_multi, Time taken for summarizer is 2.418837547302246
end QueryPipeline::_run_multi*******

The author, Paul Graham, during his time in YC, worked on it and it took up a lot of his attention. He was also involved in other activities such as hacking and writing essays. However, he eventually decided to hand YC over to someone else and focus on other things.
Time taken: 17.828704118728638

The time taken by the async and sync runs is the same.
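As a quick check on that observation, the per-module timings printed in the log above can be summed and compared with the reported total. This is only an arithmetic sketch: the numbers are copied from the log, while the dictionary and variable names are made up for illustration.

```python
# Per-module wall-clock times copied from the "Time taken for ..." log lines above.
module_seconds = {
    "128": 3.2436001300811768,
    "256": 2.5043890476226807,
    "512": 2.788543939590454,
    "1024": 6.872500419616699,
    "join": 0.0001316070556640625,
    "summarizer": 2.418837547302246,
}

# Total reported by the "Time taken:" line at the end of the log.
total_reported = 17.828704118728638

# If the modules ran one after another, the total should be close to the sum.
sequential_sum = sum(module_seconds.values())
gap = total_reported - sequential_sum

print(f"sum of modules: {sequential_sum:.3f} s")
print(f"reported total: {total_reported:.3f} s")
print(f"unaccounted:    {gap:.4f} s")
```

The sum accounts for nearly all of the reported total, which suggests the four query modules ran one after another rather than overlapping.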

dosubot[bot] commented 6 months ago

Hey @lambda7xx! Great to see you here again. 😊 I'm diving into your issue and will get back to you with a thorough answer shortly! 🚀

lambda7xx commented 6 months ago

Also, from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler does not work as described in the doc https://docs.llamaindex.ai/en/stable/examples/callbacks/LlamaDebugHandler/

dosubot[bot] commented 6 months ago

The async method might not be faster than the sync method due to the nature of the tasks being performed. Async methods are beneficial when there are IO-bound tasks, such as network requests or file operations, where the program has to wait for the operation to complete. However, if the tasks are CPU-bound, like in most data processing tasks, async methods might not provide a speedup and can sometimes even be slower due to the overhead of context switching. In your case, it seems like the tasks are CPU-bound, which might be why you're not seeing a speedup with the async method.

Regarding the warnings related to do_sample, temperature, and top_p settings, these are due to the way the HuggingFaceLLM class is being instantiated. To resolve these warnings, you should set do_sample=True in the generate_kwargs dictionary when instantiating the HuggingFaceLLM class.

For the import issue with CallbackManager and LlamaDebugHandler, it might be due to a version mismatch. The classes might not be available in the version of the LlamaIndex library you're using. Check the version of the library and update it if necessary. If the issue persists, it might be a bug in the library, and reporting it to the maintainers would be the next step.

lambda7xx commented 6 months ago

I used nsys to profile llama-index. It seems retrieval runs on the CPU side and the LLM call runs on the GPU side. Are there other things that run on the CPU side?

logan-markewich commented 6 months ago

@lambda7xx when running a model locally like you are, there is no advantage to async, since it is all compute bound. Async only makes sense for

a) Running LLMs over an API, so that responses can be properly awaited
b) Running multiple requests at once (async on its own does not speed things up, but allows you to run things concurrently)
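The difference can be sketched with plain asyncio. This is a toy illustration, not LlamaIndex code: `fake_api_llm_call` and `fake_local_llm_call` are hypothetical stand-ins, with `asyncio.sleep` assumed to behave like waiting on a remote API and `time.sleep` assumed to behave like a blocking local `model.generate` call.

```python
import asyncio
import time


async def fake_api_llm_call(delay: float) -> str:
    # Hypothetical stand-in for a remote LLM API call: awaiting hands control
    # back to the event loop, so other calls can make progress meanwhile.
    await asyncio.sleep(delay)
    return "api"


async def fake_local_llm_call(delay: float) -> str:
    # Hypothetical stand-in for a local, compute-bound generate call:
    # time.sleep blocks the event loop, so nothing else can run.
    time.sleep(delay)
    return "local"


async def timed_gather(call, n: int, delay: float) -> float:
    # Launch n calls "concurrently" and return the elapsed wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(call(delay) for _ in range(n)))
    return time.perf_counter() - start


async def main():
    api_elapsed = await timed_gather(fake_api_llm_call, 4, 0.2)
    local_elapsed = await timed_gather(fake_local_llm_call, 4, 0.2)
    return api_elapsed, local_elapsed


api_elapsed, local_elapsed = asyncio.run(main())
print(f"awaitable calls: {api_elapsed:.2f} s")
print(f"blocking calls:  {local_elapsed:.2f} s")
```

The four awaitable calls overlap and finish in roughly the time of one, while the four blocking calls take roughly four times as long, even though both go through `asyncio.gather`.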

lambda7xx commented 6 months ago

> @lambda7xx when running a model locally like you are, there is no advantage to async, since it is all compute bound. Async only makes sense for
>
> a) Running LLMs over an API, so that responses can be properly awaited
> b) running multiple requests at once (async on its own does not speed things up, but allows you to run things concurrently)

@logan-markewich Hi, if I want to run multiple requests at once to test throughput, how should I modify my code? I want to deploy LlamaIndex as a web server (https://github.com/run-llama/llama_index/issues/12396) and run multiple requests at once.

VJlaxmi commented 6 months ago

Hi @lambda7xx

The similar timings between async and sync methods are expected in CPU-bound tasks like yours, where async doesn't inherently speed up the process but allows for concurrent IO operations. Your profiling is correct: retrieval is CPU-bound and LLM calls are GPU-bound, which explains the observed performance. For CPU-bound tasks, consider parallel processing to enhance efficiency.

logan-markewich commented 6 months ago

I would also consider deploying your models as dedicated servers (TEI, TGI, vLLM, etc.)
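Once the model sits behind a server, a throughput test could look roughly like the sketch below. Everything here is an assumption for illustration: `send_request` is a hypothetical placeholder that just sleeps, and would need to be replaced with a real async call to whatever server is deployed; `run_load_test` and `max_concurrency` are made-up names as well.

```python
import asyncio
import time


async def send_request(query: str) -> str:
    # Hypothetical placeholder for one request to a model server.
    # Replace the sleep with a real async client call.
    await asyncio.sleep(0.1)
    return f"answer to: {query}"


async def run_load_test(queries, max_concurrency: int):
    # The semaphore caps how many requests are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(query: str) -> str:
        async with semaphore:
            return await send_request(query)

    start = time.perf_counter()
    answers = await asyncio.gather(*(bounded(q) for q in queries))
    elapsed = time.perf_counter() - start
    # Throughput in requests per second over the whole batch.
    return answers, len(queries) / elapsed


queries = [f"question {i}" for i in range(8)]
answers, requests_per_second = asyncio.run(
    run_load_test(queries, max_concurrency=4)
)
print(f"{len(answers)} answers, {requests_per_second:.1f} requests/s")
```

Varying `max_concurrency` and watching where requests per second stops improving gives a rough picture of how much concurrency the server can absorb.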