lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Hitting token limit causes part of the previous response and the prompt "USER:" fields to be displayed as chatbot response #1788

Open cyberjoey opened 1 year ago

cyberjoey commented 1 year ago

During a conversation on the Single Model page (or when self-deploying gradio_web_server.py), the prompt is truncated if it exceeds the model's max input limit.

I was able to reproduce this every time by talking to the chatbot in a way that yields verbose responses. Increasing INPUT_CHAR_LEN_LIMIT or FASTCHAT_INPUT_CHAR_LEN_LIMIT doesn't help. I initially suspected the "text = text[:INPUT_CHAR_LEN_LIMIT] # Hard cut-off" line, but the prompt never gets anywhere near the character limit before the issue appears, and the behavior is not what you would expect from the appended USER text being truncated. When I sent the same full conversation to the openai_api.py endpoints, I got an error:

This model's maximum context length is 2048 tokens. However, you requested 2469 tokens (1445 in the messages, 1024 in the completion). Please reduce the length of the messages or completion.
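
For reference, the API call looked roughly like this (a sketch against FastChat's OpenAI-compatible server; the port, model name, and placeholder conversation are from my setup, not defaults):

```python
import openai

# Point the openai client at FastChat's openai_api_server
# (port and model name below are assumptions from my local setup).
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

full_conversation = [
    {"role": "user", "content": "How can I develop a robust Semantic Search Engine?"},
    # ... the rest of the multi-turn conversation shown below ...
]

response = openai.ChatCompletion.create(
    model="vicuna-7b-v1.3",
    messages=full_conversation,
    max_tokens=1024,  # matches the "1024 in the completion" in the error above
)
```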

The error makes it clear that the token limit is being hit. Ideally the UI should check the input token length the way the API does.
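
A minimal sketch of what such a pre-check could look like (hypothetical helper; tokenizer, context_len, and max_new_tokens would come from the serving code, and none of these names are FastChat's actual API):

```python
def check_prompt_length(tokenizer, prompt: str, context_len: int, max_new_tokens: int) -> None:
    """Reject over-long prompts before generation, mirroring the openai_api.py check."""
    input_len = len(tokenizer(prompt).input_ids)
    requested = input_len + max_new_tokens
    if requested > context_len:
        raise ValueError(
            f"This model's maximum context length is {context_len} tokens. "
            f"However, you requested {requested} tokens ({input_len} in the "
            f"messages, {max_new_tokens} in the completion). Please reduce "
            f"the length of the messages or completion."
        )
```

The web UI could then surface this as a visible error (or drop the oldest turns) instead of silently truncating the prompt.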

You can reproduce this by turning the temperature down to 0 and having a conversation like this with Vicuna:

How can I develop a robust Semantic Search Engine?


Developing a robust Semantic Search Engine requires a combination of natural language processing, machine learning, and information retrieval techniques. Here are some steps you can take to develop a semantic search engine:

1. Data Collection: Collect relevant data from various sources such as websites, databases, and social media platforms. The data should be cleaned and preprocessed to remove noise and irrelevant information.
2. Data Preprocessing: Perform data preprocessing techniques such as tokenization, stemming, and lemmatization to convert text into a format that can be easily processed by the search engine.
3. Natural Language Processing: Use natural language processing techniques such as part-of-speech tagging, named entity recognition, and sentiment analysis to extract relevant information from the text.
4. Machine Learning: Use machine learning algorithms such as clustering, classification, and regression to build a model that can understand the meaning of the text.
5. Information Retrieval: Use information retrieval techniques such as vector space modeling, probabilistic modeling, and semantic search algorithms to retrieve relevant results from the search engine.
6. Evaluation: Evaluate the performance of the semantic search engine using metrics such as precision, recall, and F1 score.
7. Continuous Improvement: Continuously improve the search engine by incorporating user feedback and updating the machine learning model with new data.

Overall, developing a robust semantic search engine requires a lot of expertise in natural language processing, machine learning, and information retrieval. It is a complex task, but with the right approach and tools, it is possible to build a search engine that can understand the meaning of text and provide relevant results to users.


How can I deploy a Grafana + InfluxDB instance in a scaleable way?


Deploying a Grafana + InfluxDB instance in a scaleable way requires careful planning and consideration of several factors. Here are some steps you can take to deploy a scalable Grafana + InfluxDB instance:

1. Choose a cloud provider: Choose a cloud provider that offers scalable infrastructure and services. Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure are popular cloud providers that offer scalable infrastructure and services.
2. Choose the right instance type: Choose the right instance type for your Grafana + InfluxDB instance. The instance type should be able to handle the expected traffic and data volume.
3. Set up InfluxDB: Set up InfluxDB on your chosen cloud provider. You can choose to use a managed InfluxDB instance or set up your own InfluxDB instance.
4. Backup and replicate InfluxDB: Set up backup and replication for InfluxDB to ensure high availability and data durability.
5. Set up Grafana: Set up Grafana to connect to InfluxDB and visualize data. You can use Grafana's built-in alerting feature to set up alerts based on data thresholds.
6. Scale horizontally: Scale Grafana + InfluxDB horizontally by adding more instances to handle increased traffic and data volume. You can use autoscaling features offered by the cloud provider to automatically scale up or down based on traffic and data volume.
7. Monitor and optimize: Monitor the performance of your Grafana + InfluxDB instance and optimize it for better performance. You can use Grafana's built-in monitoring and optimization features to monitor and optimize the instance.

Overall, deploying a scalable Grafana + InfluxDB instance requires careful planning and consideration of several factors. It is important to choose the right cloud provider, instance type, and backup and replication strategy to ensure high availability and data durability. You should also monitor and optimize the instance for better performance and scalability.


How can I leverage CloudFormation templates to do this with Infrastructure as Code? Provide verbose examples please


To leverage CloudFormation templates for deploying a scalable Grafana + InfluxDB instance with Infrastructure as Code (IaC), you can follow the steps below:

Define your infrastructure resources in a CloudFormation template:

```yaml
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      InstanceTenancy: default

  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: us-east-1a

  PrivateSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: us-east-1a

  InternetGateway:
    Type: AWS::EC2::InternetGateway

  VPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  PublicInstance:
    Type: AWS::EC2::Instance
    Properties:
      CidrBlock: 10.0.0.0/24
      InstanceType: t2.micro
      AvailabilityZone: us-east-1a
      SubnetId: !Ref PublicSubnet
      KeyName: my-key-pair
      EnableMonitoring: true

  PrivateInstance:
    Type: AWS::EC2::Instance
    Properties:
      CidrBlock: 10.0.0.0/24
      InstanceType: t2.micro
      AvailabilityZone: us-east-1a
      SubnetId: !Ref PrivateSubnet
      KeyName: my-key-pair
      EnableMonitoring: true

  InfluxDBCluster:
    Type: AWS::ElasticInference::InferenceAccelerator
    Properties:
```


Can you describe what each block in that cloudformation template does?


Sure! Here's what each block in the CloudFormation template does:

1. VPC: Creates a new VPC with a CIDR block of 10.0.0.0/16. The VPC is set to default tenancy and has DNS support and DNS hostnames enabled.
2. PublicSubnet: Creates a new public subnet within the VPC, with a CIDR block of 10.0.0.0/24. The subnet is created in the us-east-1a availability zone.
3. PrivateSubnet: Creates a new private subnet within the VPC, with a CIDR block of 10.0.0.0/24. The subnet is created in the us-east-1a availability zone.
4. InternetGateway: Creates a new internet gateway.
5. VPCGatewayAttachment: Attaches the internet gateway to the VPC.
6. PublicInstance: Creates a new EC2 instance in the PublicSubnet. The instance type is set to t2.micro, and it has a key pair named my-key-pair. Enables monitoring for the instance.
7. PrivateInstance: Creates a new EC2 instance in the PrivateSubnet. The instance type is set to t2.micro, and it has a key pair named my-key-pair. Enables monitoring for the instance.
8. InfluxDBCluster: Creates a new InfluxDB cluster using the InfluxDB managed service. The cluster has four nodes by default, but you can specify the number of nodes in the NumNodes parameter. The subnets associated with the InfluxDB nodes are automatically selected from within the VPC.

I hope that helps! Let me know if you have any other questions.


Write me a long poem about Grafana and InfluxDB


: Creates a new public subnet within the VPC, with a CIDR block of 10.0.0.0/24. The subnet is created in the us-east-1aavailability zone. 3.PrivateSubnet: Creates a new private subnet within the VPC, with a CIDR block of 10.0.0.0/24. The subnet is created in the us-east-1aavailability zone. 4.InternetGateway: Creates a new internet gateway. 5. VPCGatewayAttachment: Attaches the internet gateway to the VPC. 6. PublicInstance: Creates a new EC2 instance in the PublicSubnet. The instance type is set to t2.micro, and it has a key pair named my-key-pair. Enables monitoring for the instance. 7. PrivateInstance: Creates a new EC2 instance in the PrivateSubnet. The instance type is set to t2.micro, and it has a key pair named my-key-pair. Enables monitoring for the instance. 8. InfluxDBCluster: Creates a new InfluxDB cluster using the InfluxDB managed service. The cluster has four nodes by default, but you can specify the number of nodes in the NumNodes` parameter. The subnets associated with the InfluxDB nodes are automatically selected from within the VPC.

I hope that helps! Let me know if you have any other questions. USER: Write me a long poem about Grafana and InfluxDB ASSISTANT: In the land of data, where numbers reign, There lived two tools, Grafana and InfluxDB. Grafana, a visualizer so grand, InfluxDB, a time-series data bank.

With Grafana, you could view your data in a way, That made sense to you, with visuals to play. Charts, graphs, and dashboards, oh my! You could see your data in just a few clicks, you'd say.

And InfluxDB, it stored your data with care, Time-series data, it cared, without compare. With high performance, it never slowed, And reliability, it always showed.

Together, they were a perfect pair, Grafana and InfluxDB, beyond compare. With InfluxDB as your data bank, And Grafana as your visual frame, You could see your data in a way that made sense, And make decisions that were not too dense.

So if you're looking for a duo, To help you with your data needs, Grafana and InfluxDB, the best, In the land of data, they'll defeat.

This last response shows the issue: instead of starting with the poem, the reply begins with the tail of the previous answer and then echoes the raw "USER: Write me a long poem about Grafana and InfluxDB ASSISTANT:" prompt fields as if they were model output.

qizzzh commented 1 year ago

Hitting the same issue

qizzzh commented 1 year ago

seems related to input_echo_len
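
If it is input_echo_len, the failure mode might look roughly like this. This is a self-contained sketch of the suspected mismatch, with names that loosely mirror fastchat/serve/inference.py; it is a guess at the mechanism, not the actual code:

```python
# Stand-in for the tokenized 1445-token conversation from the error above.
prompt_tokens = list(range(1445))
context_len, max_new_tokens = 2048, 1024
max_src_len = context_len - max_new_tokens - 8  # 1016

input_ids = prompt_tokens[-max_src_len:]  # prompt silently truncated from the left
input_echo_len = len(input_ids)           # 1016, measured AFTER truncation

# Suppose the echo buffer still holds the FULL prompt plus the newly generated tokens.
output_ids = prompt_tokens + [9001, 9002]
response_ids = output_ids[input_echo_len:]

# response_ids starts at token 1016 of the ORIGINAL prompt, so decoding it yields
# the tail of the previous answer plus the "USER: ... ASSISTANT:" fields, followed
# by the real completion -- exactly what the last response above shows.
print(len(response_ids))  # 431 = 429 leaked prompt tokens + 2 generated tokens
```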