Open richardsliu opened 1 week ago
Hi richard,
I tested Llama-2 7B with run_server_with_ray.py (--batch_size=32). Instead of sending requests one by one, I used the benchmark script to send 200 requests and got 198 responses back. I checked the responses and they are accurate and correct. Here is one example:
{
"prompt": "Can you explain the benefit of Neo4J and why I should use it over a more generic relational database", "original_output": "Neo4j is a graph database, which is a type of database that is designed to store and process data as a graph. This means that it is particularly well-suited for storing data that is naturally represented as a network, such as the connections between people in a social network or the relationships between different pieces of data in a recommendation system.\n\nOne of the key benefits of using a graph database like Neo4j is that it allows you to easily represent complex relationships between data, and to quickly and efficiently query the database to find data that meets certain criteria. For example, in a social network, you might use Neo4j to find the shortest path between two people, or to identify clusters of friends within the network.\n\nAnother key benefit of Neo4j is that it is highly scalable, which means that it can handle large amounts of data and support high levels of concurrency. This makes it a good choice for applications that require real-time data processing or that need to support a large number of users.\n\nIn general, Neo4j is a good choice if you have data that is naturally represented as a graph, and you need to perform complex queries on that data to find relationships or patterns. It may be less appropriate for applications that require highly structured, tabular data, or that need to support transactions and other ACID properties.", "generated_text": "like MySQL?\n\nNeo4j is a graph database, which means it stores data as a network of interconnected nodes or vertices, rather than as a traditional table-based relational database like MySQL. Here are some benefits of using Neo4j over a more generic relational database like MySQL:\n\n\n1. 
Handling complex relationships : Neo4j is designed to handle complex, hierarchical, and networked data structures, which are difficult or impossible to model in a traditional relational database. For example, in a social network, each user can have many followers, and each follower can follow many users. In a graph database, these relationships can be represented as a single node for each user, with edges connecting the nodes to represent the follow and follower relationships.\n\n\n2. Flexible schema : In a traditional relational database, the schema must be defined upfront and can't be changed easily. In contrast, Neo4j has a flexible schema that can be evolved over time as the data and use cases change. This makes it easier to adapt to new use cases and to incorporate new data sources.\n\n\n3. Real-time querying : Neo4j is designed for real-time querying and can handle complex, graph-based queries much faster than a traditional rel", "success": true, "latency": 250.83245306299068, "prompt_len": 23, "sample_idx": 39736 },
Are you running it on GKE? Can you use the latest code on the main branch and run it with a benchmark test?
When sending multiple prompts to the server, only the first prompt returns any results. Every request after the first one returns an empty response.
I've tried 3 different ways to bring up the server (all using interleave singlehost on a TPU v4):
No issues.
No issues.
This reproduces the problem above. Debugging the code further, it seems like the model returned the stop token:
This only repros with run_server_with_ray, and only if the batch_size is set to greater than 1.