chain.arun() for summarization no faster than chain() #8391

Closed pseudotensor closed 11 months ago

pseudotensor commented 11 months ago

System Info

(h2ogpt) jon@pseudotensor:~/h2ogpt$ pip freeze | grep langchain
langchain==0.0.235
langchainplus-sdk==0.0.20

Python 3.10

(h2ogpt) jon@pseudotensor:~/h2ogpt$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal

Who can help?



Related Components


The above shows how I introduced async where the code previously had none.

The text generation inference server is configured for high concurrency, but it shows the requests arriving back-to-back rather than concurrently.

Expected behavior

I expect the summarization part to run in parallel, as stated here:

But perhaps I misunderstand something. Or perhaps it's not really parallel:

There's lots of discussion w.r.t. hitting rate limit with OpenAI:

So I presume this works, but I'm not seeing it. In the OpenAI case it seems to be done via batching, which is possible in the HF TGI server but not implemented. But I would have thought that all the reduction tasks could have run in parallel with asyncio.

I understand that you're experiencing an issue where the chain.arun() method for summarization in LangChain is not running faster than the standard chain() method. You're expecting the chain.arun() method to run the summarization process in parallel, as indicated in the code comments, but you're not observing this behavior.

The chain.arun() method is indeed designed to run the summarization process in parallel. However, the speed of the operation can be influenced by several factors. These include the number of documents being processed, the complexity of the documents, the specific language model being used, and the system resources available.

If you're not seeing a speed improvement when using chain.arun(), it could be due to one or more of these factors. For instance, if the number of documents is small, the overhead of setting up the parallel processing might outweigh the benefits. Alternatively, if your system resources, such as CPU or memory, are already heavily utilized, parallel processing might not provide a speedup.

In the code, self.llm_chain.aapply() is the method that applies the language model to each document in parallel. If this isn't providing the expected speedup, it might be worth investigating the factors I mentioned earlier.

Please note that this is just one potential explanation for the behavior you're observing. There could be other factors at play that are specific to your setup or use case.

If you find this explanation accurate and it's not covered in the existing documentation, I encourage you to make a pull request to update the documentation. This could be beneficial for other users who might encounter the same issue.


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

pseudotensor commented 11 months ago

Here's verbose output:

> Finished chain.

> Entering new LLMChain chain...
Prompt after formatting:
In order to write a concise single-paragraph or bulleted list summary, pay attention to the following text:
 Sure! Here's a condensed and concise summary of the key results based on the provided text:

• Robust Speech Recognition via Large-Scale Weak Supervision:

▪ Training speech processing systems on large amounts of unlabeled audio data (680,000 hours) leads to models that generalize well to standard benchmarks and are competitive with fully supervised results.

▪ The models approach human accuracy and robustness when compared to human transcriptions.

▪ The approach does not require fine-tuning, making it more practical and accessible for real-world applications.

▪ The models are robust and generalize well to held-out datasets, outperforming models trained on a single source.

▪ The use of large-scale weak supervision can help overcome the limitations of small-scale supervised datasets and improve the robustness of speech recognition systems.

 Sure! Here is a condensed and concise summary of the key results based on the provided text:

• Robust speech recognition performance under natural noise environments:

▪ Whisper models outperform zero-shot performance under low noise levels (40 dB SNR) but degrade quickly under more intensive noise (<10 dB SNR).
▪ Whisper models demonstrate robustness to natural distribution shifts, such as pub noise.

• Long-form transcription capabilities:

▪ Whisper models can perform buffered transcription of long audio by consecutively transcribing 30-second segments and shifting the window based on timestamps predicted by the model.
▪ Beam search and temperature scheduling based on repetitiveness and log probability of model predictions are crucial for reliable long-form transcription.
▪ Whisper competes with state-of-the-art commercial and open-source ASR systems in long-form transcription, with the best performance on all datasets and in most cases.

 Sure! Here is a condensed and concise summary of the key results based on the provided text:

• Robust Speech Recognition via Large-Scale Weak Supervision:
▪ Developed a simple data augmentation method called SpecAugment that improves the robustness of automatic speech recognition (ASR) models to unseen speakers and environments.
▪ Demonstrated the effectiveness of SpecAugment on several benchmark datasets, achieving state-of-the-art performance in robust ASR.

• On the Difficulty of Training Recurrent Neural Networks:
▪ Analyzed the difficulty of training recurrent neural networks (RNNs) and proposed a new measure of training difficulty based on the energy of the loss landscape.
▪ Showed that RNNs are more difficult to train than other neural network architectures, and that this difficulty can be mitigated by using better initialization methods and regularization techniques.

• Pytorch: An Imperative Style, High-Performance Deep Learning Library:
▪ Introduced PyTorch, an imperative-style deep learning library that provides a dynamic computation graph and automatic differentiation.
▪ Demonstrated the performance and flexibility of PyTorch on several benchmark tasks, including image classification and natural language processing.

• Scikit-learn: Machine Learning in Python:
▪ Presented scikit-learn, an open-source machine learning library for Python that provides a wide range of algorithms for classification, regression, clustering, and other tasks.
▪ Discussed the design principles and implementation of scikit-learn, and demonstrated its use on several benchmark datasets.

• Acceleration of Stochastic Approximation by Averaging:
▪ Proposed a method for accelerating stochastic approximation using averaging, which can improve the convergence rate of optimization algorithms.
▪ Analyzed the convergence properties of the proposed method and demonstrated its effectiveness on several benchmark problems.

• Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters:
▪ Introduced a massively multilingual speech recognition system that uses a single model to recognize 50 languages, with over 1 billion parameters.
▪ Demonstrated the effectiveness of the proposed system on several benchmark datasets, achieving state-of-the-art performance in multilingual ASR.

• Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer:
▪ Investigated the limits of transfer learning with a unified text-to-text transformer, which can perform a variety of natural language processing tasks without task-specific fine-tuning.
▪ Demonstrated the effectiveness of the proposed approach on several benchmark datasets, achieving state-of-the-art performance in a variety of tasks.
Using only the text above, write a condensed and concise summary of key results (preferably as bullet points):

> Finished chain.

> Finished chain.

Actual final answer:

Sure! Here is a condensed and concise summary of the key results based on the provided text:

• Robust Speech Recognition via Large-Scale Weak Supervision:

▪ Trained speech processing systems on large amounts of unlabeled audio data, leading to models that generalize well to standard benchmarks and are competitive with fully supervised results.

▪ Models approach human accuracy and robustness when compared to human transcriptions.

▪ No fine-tuning required, making the approach more practical and accessible for real-world applications.

▪ Models generalize well to held-out datasets, outperforming models trained on a single source.

• Robust speech recognition performance under natural noise environments:

▪ Whisper models outperform zero-shot performance under low noise levels but degrade quickly under more intensive noise.

▪ Whisper models demonstrate robustness to natural distribution shifts, such as pub noise.

• Long-form transcription capabilities:

▪ Whisper models can perform buffered transcription of long audio by consecutively transcribing 30-second segments and shifting the window based on timestamps predicted by the model.

▪ Beam search and temperature scheduling are crucial for reliable long-form transcription.

▪ Whisper competes with state-of-the-art commercial and open-source ASR systems in long-form transcription.

• Other key results:

▪ Developed a simple data augmentation method called SpecAugment that improves the robustness of ASR models to unseen speakers and environments.

▪ Analyzed the difficulty of training RNNs and proposed a new measure of training difficulty based on the energy of the loss landscape.

▪ Introduced PyTorch, an imperative-style deep learning library that provides a dynamic computation graph and automatic differentiation.

▪ Presented scikit-learn, an open-source machine learning library for Python that provides a wide range of algorithms for classification, regression, clustering, and other tasks.

▪ Proposed a method for accelerating stochastic approximation using averaging, which can improve the convergence rate of optimization algorithms.

▪ Introduced a massively multilingual speech recognition system that uses a single model to recognize 50 languages, with over 1 billion parameters.

▪ Investigated the limits of transfer learning with a unified text-to-text transformer, which can perform a variety of natural language processing tasks without task-specific fine-tuning.

pseudotensor commented 11 months ago

I confirmed in PyCharm that len(docs)==3 when acombine_docs() is reached, so there should be work to parallelize.

pseudotensor commented 11 months ago

If I add debug output on entering and calling _acall(), I see:

> Finished chain.

> Entering new LLMChain chain...
Prompt after formatting:
In order to write a concise single-paragraph or bulleted list summary, pay attention to the following text:

Please note that the original text contains several references to external sources, which I've removed for brevity. Also, please keep in mind that the text may contain some technical jargon or specific terms related to the field of speech recognition, which I'll try my best to simplify or exclude.

* Whisper achieves robust speech recognition performance even under adverse noise conditions, outperforming other models in certain scenarios.
* Whisper models are trained on 30-second audio chunks and can be extended to long-form transcription using a buffered transcription approach.
* The long-form transcription performance of Whisper is competitive with state-of-the-art commercial and open-source ASR systems, with a distribution of word error rates ranging from 4% to 17%.
* Whisper outperforms the best open-source model (NVIDIA STT) on all datasets and in most cases, commercial ASR systems as well.

• Robust speech recognition via large-scale weak supervision
• Simple data augmentation method for automatic speech recognition
• On the difficulty of training recurrent neural networks
• High-performance deep learning library for imperative style
• Machine learning in Python for various tasks
• Acceleration of stochastic approximation by averaging
• Massively multilingual ASR with 50 languages and 1 billion parameters
• Large-scale multilingual dataset for speech research
• Unsupervised multitask learners using language models
• Transferable visual models from natural language supervision
• Exploring the limits of transfer learning with a unified text-to-text transformer
• General-purpose speech toolkit for various tasks
• Do ImageNet classifiers generalize to ImageNet?
• Imagenet large scale visual recognition challenge
• Feature engineering in context-dependent deep neural networks for conversational speech transcription
• Neural machine translation of rare words with subword units
• Simple way to prevent neural networks from overfitting
• Sequence to sequence learning with neural networks
• Measuring robustness to natural distribution shifts in image classification.
Using only the text above, write a condensed and concise summary of key results (preferably as bullet points):

> Finished chain.

> Finished chain.

i.e. this block is a problem:

It's not doing these in parallel. I have no callbacks, just the default stdout callback. So I don't know what is wrong.
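
One way to check whether the calls overlap at all is to log entry and exit around each async call and look at the order of the events. The following is only a sketch: `traced`, `fake_acall`, and `overlapped` are hypothetical names, and `asyncio.sleep` stands in for the real request to the inference server.

```python
import asyncio

# Shared event log; each traced call appends "enter <name>" and "exit <name>".
events = []


def traced(fn):
    """Wrap an async function so that entry and exit are recorded in `events`."""
    async def wrapper(name):
        events.append(f"enter {name}")
        try:
            return await fn(name)
        finally:
            events.append(f"exit {name}")
    return wrapper


@traced
async def fake_acall(name):
    # Hypothetical stand-in for the real LLM call; it only sleeps.
    await asyncio.sleep(0.05)
    return name


def overlapped(log):
    """Return True if any call was entered before the previous one exited."""
    depth = 0
    for event in log:
        depth += 1 if event.startswith("enter") else -1
        if depth > 1:
            return True
    return False


async def run_in_loop(names):
    # Sequential: each await completes before the next call begins.
    for name in names:
        await fake_acall(name)


async def run_gathered(names):
    # Concurrent: all calls are scheduled before any of them finishes.
    await asyncio.gather(*(fake_acall(name) for name in names))
```

If `overlapped(events)` is False after a run, the calls were issued strictly one after another, which matches requests arriving back-to-back at the server.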

pseudotensor commented 11 months ago

I don't see how this can work:

        for prompt in prompts:
            print("prompt", flush=True)  # debug print added while investigating
            text = (
                await self._acall(prompt, stop=stop, run_manager=run_manager, **kwargs)
                if new_arg_supported
                else await self._acall(prompt, stop=stop, **kwargs)
            )
            generations.append([Generation(text=text)])
        return LLMResult(generations=generations)

This will await in a simple loop, each await blocking the next call.
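
To make the difference concrete, here is a minimal timing sketch comparing a loop of awaits with `asyncio.gather`. It does not use LangChain: `fake_acall` is a hypothetical stand-in for `_acall()` that only sleeps, and the function names and the 0.2 s delay are assumptions for illustration.

```python
import asyncio
import time


async def fake_acall(prompt: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for an LLM's _acall(); sleeps instead of calling a server."""
    await asyncio.sleep(delay)
    return f"summary of {prompt}"


async def generate_sequential(prompts):
    # Same shape as the loop above: each await finishes before the next call starts.
    return [await fake_acall(p) for p in prompts]


async def generate_concurrent(prompts):
    # All coroutines are scheduled together; gather returns results in input order.
    return await asyncio.gather(*(fake_acall(p) for p in prompts))


def timed(coro_fn, prompts):
    """Run coro_fn(prompts) in a fresh event loop and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(prompts))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    docs = ["doc1", "doc2", "doc3"]
    _, seq_seconds = timed(generate_sequential, docs)
    _, con_seconds = timed(generate_concurrent, docs)
    print(f"sequential: {seq_seconds:.2f}s, concurrent: {con_seconds:.2f}s")
```

With three prompts, the sequential version should take roughly three times the per-call delay, while the gathered version should take roughly one.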

Needs to be like:


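# wrap each coroutine in a task first; asyncio.wait() should not be given bare coroutines
tasks = [asyncio.ensure_future(self.generate_url(url)) for url in urls]
await asyncio.wait(tasks)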

and probably need to control degree of concurrency like:


import asyncio

sem = asyncio.Semaphore(3)

async def safe_download(i):
    async with sem:  # semaphore limits num of simultaneous downloads
        return await download(i)  # download() is a placeholder coroutine, not defined here

async def main():
    tasks = [
        asyncio.ensure_future(safe_download(i))  # creating task starts coroutine
        for i
        in range(9)
    ]
    await asyncio.gather(*tasks)  # await moment all downloads done
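
Applied to the prompt loop, the same idea might look like the sketch below. This is not LangChain's implementation: `agenerate_bounded`, `fake_acall`, and the `max_concurrency` parameter are hypothetical names, and the real `_agenerate()` would additionally need to pass `stop`, `run_manager`, and `**kwargs` and build an `LLMResult`.

```python
import asyncio


async def fake_acall(prompt: str) -> str:
    """Hypothetical stand-in for self._acall(); sleeps instead of calling a server."""
    await asyncio.sleep(0.05)
    return prompt.upper()


async def agenerate_bounded(prompts, acall, max_concurrency=3):
    """Run acall(prompt) for every prompt, with at most max_concurrency in flight."""
    # Created inside the coroutine so the semaphore belongs to the running loop.
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with sem:  # blocks while max_concurrency calls are already running
            return await acall(prompt)

    # gather returns results in the same order as the prompts.
    return await asyncio.gather(*(one(p) for p in prompts))


if __name__ == "__main__":
    texts = asyncio.run(agenerate_bounded(["a", "b", "c", "d"], fake_acall))
    print(texts)
```

Bounding the concurrency keeps a long list of prompts from flooding the server while still letting up to `max_concurrency` requests overlap.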

pseudotensor commented 11 months ago

@agola11 It seems you made the related commits:

pseudotensor commented 11 months ago

After fixing this, I now get what I expect:

Enter acall
begin gen_text
Enter acall
begin gen_text
Enter acall
begin gen_text
end gen_text
end acall
end gen_text
end acall
end gen_text
end acall

pseudotensor commented 11 months ago

One can see fixes here in h2oGPT:

