vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: `truncate_prompt_tokens` in SamplingParams only available for openai entrypoints, not for offline vLLM engine #4507

Open YuWang916 opened 3 months ago

YuWang916 commented 3 months ago

Your current environment

Collecting environment information...
PyTorch version: 2.2.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: CBL-Mariner/Linux (x86_64)
GCC version: (GCC) 11.2.0
Clang version: Could not collect
CMake version: version 3.21.4
Libc version: glibc-2.35

Python version: 3.10.2 (main, Feb 22 2024, 00:00:03) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.138.1-4.cm2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB

Nvidia driver version: 525.85.12
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.9.5
/usr/lib/libcudnn_adv_infer.so.8.9.5
/usr/lib/libcudnn_adv_train.so.8.9.5
/usr/lib/libcudnn_cnn_infer.so.8.9.5
/usr/lib/libcudnn_cnn_train.so.8.9.5
/usr/lib/libcudnn_ops_infer.so.8.9.5
/usr/lib/libcudnn_ops_train.so.8.9.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7763 64-Core Processor
CPU family: 25
Model: 1
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
Stepping: 1
Frequency boost: enabled
CPU max MHz: 3529.0520
CPU min MHz: 1500.0000
BogoMIPS: 4899.80
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca
Virtualization: AMD-V
L1d cache: 4 MiB (128 instances)
L1i cache: 4 MiB (128 instances)
L2 cache: 64 MiB (128 instances)
L3 cache: 512 MiB (16 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-15,128-143
NUMA node1 CPU(s): 16-31,144-159
NUMA node2 CPU(s): 32-47,160-175
NUMA node3 CPU(s): 48-63,176-191
NUMA node4 CPU(s): 64-79,192-207
NUMA node5 CPU(s): 80-95,208-223
NUMA node6 CPU(s): 96-111,224-239
NUMA node7 CPU(s): 112-127,240-255
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] flake8==4.0.1.1
[pip3] flake8-annotations-complexity==0.0.6.2
[pip3] flake8-bugbear==20.1.4
[pip3] flake8-builtins==1.4.2
[pip3] flake8-pie==0.5.0.1
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.24.3
[pip3] nvidia-nccl-cu11==2.19.3
[pip3] pytorch-lightning==2.2.3
[pip3] torch==2.2.1+cu118
[pip3] torch-lib==0.1.25
[pip3] torchmetrics==1.3.1
[pip3] triton==2.2.0
[pip3] vllm-nccl-cu11==2.18.1.0.4.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
GPU0    X       NV12    NV12    NV12    48-63,176-191   3
GPU1    NV12    X       NV12    NV12    48-63,176-191   3
GPU2    NV12    NV12    X       NV12    16-31,144-159   1
GPU3    NV12    NV12    NV12    X       80-95,208-223   5

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

Below is the code running on 1 node with 4 A100 GPUs:

from vllm import LLM, SamplingParams

# A long prompt that needs truncation (token length larger than model limit)
prompt = '''The California scrub jay (Aphelocoma californica) is a species of scrub jay native to western North America. It ranges from southern British Columbia throughout California and western Nevada near Reno to west of the Sierra Nevada. The California scrub jay was once lumped with Woodhouse's scrub jay and collectively called the western scrub jay. The group was also lumped with the island scrub jay and the Florida scrub jay; the taxon was then called simply scrub jay.[2] The California scrub jay is nonmigratory and can be found in urban areas, where it can become tame and will come to bird feeders. While many refer to scrub jays as "blue jays", the blue jay is a different species of bird entirely. Etymology The generic name, Aphelocoma, derives from Latinized Ancient Greek apheles- (from ἀφελής-) "simple" + Latin coma (from Greek kome κόμη) "hair", in reference to the lack of striped or banded feathers in this genus, compared to other jays. The species name, californica, is Latin for "from California". Description The California scrub jay is a medium-sized bird, approximately 27–31 cm (11–12 in) in length (including its tail), with a 39 cm (15 in) wingspan, and about 80 g (2.8 oz) in weight. In general, this species has a blue head, wings, and tail; a gray-brown back; grayish underparts; and white eyebrows. The throat is whitish with a blue necklace. The call or "screech" is described as "harsh and scratchy". Behavior This section needs additional citations for verification. Please help improve this article by adding citations to reliable sources in this section. Unsourced material may be challenged and removed. (June 2015) (Learn how and when to remove this message) Habitat True to its name, the California scrub jay inhabits areas of low scrub, preferring pinon-juniper forests, oak woods, and edges of mixed evergreen forests. It also inhabits suburban gardens. 
Foraging California scrub jays usually forage in pairs, family groups, or small non-kin groups, outside of the breeding season. They feed on small animals, such as frogs and lizards, eggs and young of other birds, insects, and (particularly in winter) grains, nuts, and berries. They will also eat fruit and vegetables growing in backyards. Food storing California scrub jays, like many other corvids, exploit ephemeral surpluses by storing food in scattered caches within their territories. They rely on highly accurate and complex memories to recover the hidden caches, often after long periods of time.[3] In the process of collecting and storing this food, they have shown an ability to plan ahead in choosing cache sites to provide adequate food volume and variety for the future.[4] Western scrub jays are also able to rely on their accurate observational spatial memories to steal food from caches made by conspecifics. Food-storing birds implement a number of strategies to protect their caches from potential 'pilferers.'[5][6] Anecdotally, scrub jays – and corvids more generally – are known for an attraction to, and thievery of, brightly colored objects. Recent research debunks, or at least casts doubt, on this idea.[7][8][9] Corvids do, however, have a mischievous streak, and scrub jays are not above outright theft. They have been observed stealing acorns from acorn woodpecker caches. Some scrub jays snatch acorns from the hiding places of other jays. When these birds go to hide their own acorns, they check first that no other jays are watching. 
Other protection methods include moving the cache in the presence of an observer, storing inedible decoys like small stones instead of food, and hiding the cache once a scavenging bird is no longer watching;[10] these behaviors are thought to vary based on the presence or absence of potential pilferers (like other corvids) as well as what kind of animal might pilfer the cache, implying strategic and socially complex motives behind different kinds of caching behavior. [10][11] Intelligence Main article: Bird intelligence Recent research has suggested that western scrub jays, along with several other corvids, are among the most intelligent of animals. The brain-to-body mass ratio of adult scrub jays rivals that of chimpanzees and cetaceans, and is dwarfed only by that of humans. Scrub jays are also the only non-primate or non-dolphin shown to plan ahead for the future (known as metacognition), which was previously thought of as a uniquely human trait.[12] Other studies have shown that they can remember locations of over 200 food caches, as well as the food item in each cache and its rate of decay.[13] To protect their caches from pilfering conspecifics, scrub jays will choose locations out of sight of their competitors, or re-cache caches once they are alone, suggesting that they can take into account the perspective of others.[5] Jays are able to mimic raptors like red-tailed and red-shouldered hawks with such accuracy that is can be difficult to distinguish between species using calls alone; possible explanations for this behavior include warning other jays about the presence of a predator or trying to deter birds (like cache-pilfering corvids) from a given area.[14] However, jays have been observed employing raptor-mimicking calls without the presence of other birds, making the precise adaptive reason for this behavior unknown, though it may be two-fold.[15] California scrub jays also summon others to screech over the body of a dead jay, according to research from 
the University of California, Davis. The birds' cacophonous "funerals" can last for up to half an hour.[16][17] Nesting Juvenile in California, USA Nests are built low in trees or bushes, 1–10 m (3.3–32.8 ft) above the ground, primarily by the female, while the male guards her efforts. The nests are sturdy, with an outside diameter of 33–58 cm (13–23 in), constructed on a platform of twigs with moss and dry grasses lined with fine roots and hair. Four to six eggs are laid from March through July, with some regional variations. There are two common shell color variations: pale green with irregular, olive-colored spots or markings; and pale grayish-white to green with reddish-brown spots. The female incubates the eggs for about 16 days. The young leave the nest about 18 days after hatching. Life span The life span of wild California scrub jays is approximately 9 years. The oldest known western scrub jay was found in Castaic, California, in 1991 and raised in captivity. "Aaron" lived to be 19 years, and 8 months old. Diseases Populations are being adversely affected by the West Nile virus, particularly in California's Central Valley.
Phylogeny California scrub jay showing the well-marked breast band of the coastal races Note bright white plume breaking the breast band. Prominent markings in eye region are typical of male birds. California scrub jay fledgling being fed California scrub jay in flight Woodhouse's, California, Island, and Florida scrub jay were once considered subspecies of a single "scrub jay" species. They are now believed to be distinct.[2][18][19] Beyond the close relationship of the "California" and island scrub jays, resolution of their evolutionary history has proven very difficult. Woodhouse's scrub jay differ in plumage (paler blue above, with an indistinct and usually incomplete breast band) from California scrub jay which is darker blue above with a strongly defined – but not necessarily complete – blue breast band. The following subspecies are recognized:[2] Aphelocoma californica immanis Grinnell, 1901 – Interior scrub jay From Puget Sound through the Willamette Valley to Douglas County, Oregon A large subspecies. Somewhat duller and lighter in color than californica due to gene flow from inland populations. Blue of head and neck less purplish than in woodhouseii group. Back usually quite brownish, underside and especially breast quite whitish, undertail coverts usually tinged pale blue or gray in males. Bill strong, wings and tail fairly short. Aphelocoma californica caurina Pitelka, 1951 Coastal SW Oregon from Rogue River valley south to Napa and Sonoma Counties; eastern limit the inner California Coast Ranges. Similar to californica, but head and back more intensely colored, with bright purplish tinge to blue of head. Color similar to nominate, thus darker than immanis and most oocleptica. Relative to nominate californica, blue areas more purplish and brighter, breast darker than rest of underside. Aphelocoma californica oocleptica Swarth, 1918 – Nicasio scrub jay (includes A. c. 
superciliosa) From Jackson, Klamath, and Lake Counties, Oregon, through Sacramento and San Joaquin Valleys and surrounding mountains to Kern County, San Francisco Bay area, and Alpine County. Eastwards to Inyo County and Virginia Mountains (Washoe County, Nevada), where it intergrades with nevadae of the woodhouseii group. Quite variable according to the extent of gene flow between this taxon and nevadae. Generally similar to californica but larger; color of head and neck varies in lightness and amount of purplish hue. Back grayish; undertail coverts usually white. Bill usually heavy but variable according to habitat type (less heavy in birds of pinyon woodland). Aphelocoma californica californica (Vigors, 1839) – California scrub jay California Coast Ranges from San Mateo County and SE Alameda County to SW Ventura County. Blue of head usually strongly tinged purple. Back bluish-brownish gray, bluer towards the rump. Incomplete bluish-violet breast band. Underside greyish white, darker on the breast. Undertail coverts white tinged with blue. Thighs gray. Rectrices and remiges dark blue, the larger feathers duller. Bill heavy, tip strongly hooked. Aphelocoma californica obscura Anthony, 1889 – Belding's scrub jay Coastal SW California, east to Little San Bernardino Mountains, some isolated mountain ranges in W Mojave Desert, and Whale Peak (San Diego County). Southwards through N Baja California, Mexico (Sierra de Juárez, Sierra San Pedro Mártir) to Todos Santos Bay Smaller and darker than californica, with more intense purplish and brown coloration on head and back, respectively; prominent gray streaking on throat and distinct breast collar. Belly with smoky gray wash, lighter in the middle. Generally more intense coloration overall. Bill heavy. Aphelocoma californica cana Pitelka, 1951 – Eagle Mountain scrub jay Only occurs in single-leaf pinyon woods on Eagle Mountain, Joshua Tree National Park. Smaller, lighter and grayer than californica. 
Bill not as heavy. Apparently an isolate of hybrid origin between A. c. obscura and nevadae of the woodhouseii group.
References BirdLife International (2017). "Aphelocoma californica". IUCN Red List of Threatened Species. 2017: e.T103727785A112293863. doi:10.2305/IUCN.UK.2017-1.RLTS.T103727785A112293863.en. Retrieved 12 November 2021. Curry, Robert L.; Peterson, A. Townsend & Langen, T.A. (2002): Western Scrub Jay (Aphelocoma californica). In: Poole, A. & Gill, F. (eds.): The Birds of North America 712. Academy of Natural Sciences, Philadelphia, PA & American Ornithologists' Union, Washington, D.C. Online version, retrieved 25 February 2007. doi:10.2173/bna.712 Clayton, N. S.; Bussey, T. J. & Dickinson, A. (2003). "Mental Time Travel: Can animals recall the past and plan for the future?" (PDF). Nature Reviews. Neuroscience. 4 (8): 685–91. doi:10.1038/nrn1180. PMID 12894243. S2CID 11064341. Raby, C. R.; D. M. Alexis; A. Dickinson; N. S. Clayton (22 February 2007). "Planning for the future by western scrub-jays". Nature. 445 (7130): 919–921. Bibcode:2007Natur.445..919R. doi:10.1038/nature05575. PMID 17314979. S2CID 4405897. Clayton, N. S.; Dally, J. M. & Emery, N. J. (2007). "Social cognition by food-caching corvids. The western scrub-jay as a natural psychologist". Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 362 (1480): 507–22. doi:10.1098/rstb.2006.1992. PMC 2346514. PMID 17309867. Dally, J. M.; Emery, N. J. & Clayton, N. S. (2006). "Food-caching western scrub-jays keep track of who was watching when". Science. 312 (5780): 1662–5. Bibcode:2006Sci...312.1662D. doi:10.1126/science.1126539. PMID 16709747. S2CID 21976318. Shephard, T. V.; Lea, S. E. G.; Hempel De Ibarra, N. (2015). "'The thieving magpie'? No evidence for attraction to shiny objects". Animal Cognition. 18 (1): 393–397. doi:10.1007/s10071-014-0794-4. hdl:10871/16723. PMID 25123853. S2CID 717341. Do crows collect shiny objects? https://www.birds.cornell.edu/crows/crowfaq.htm#shiny Kevin J. McGowan, Cornell Lab of Ornithology. Retrieved 2021-08-01. 
Crow curiosities: Do crows collect shiny objects? Kaeli Swift. CorvidResearch.blog. 4 December 2015. Retrieved on 2021-08-01.'''

sampling_params = SamplingParams(
    max_tokens=10,
    skip_special_tokens=True,
    truncate_prompt_tokens=100,
    use_beam_search=False,
    temperature=0,
)

model_path = "/shared/public/models/falcon-40b-instruct"

llm_engine = LLM(model=model_path,
    max_model_len=2048,
    trust_remote_code=False, 
    tensor_parallel_size=4, 
    tokenizer=model_path,
)

# Generate text
outputs = llm_engine.generate([prompt], sampling_params)

Output:

Token indices sequence length is longer than the specified maximum sequence length for this model (3270 > 2048). Running this sequence through the model will result in indexing errors
Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s]
WARNING 04-30 21:39:58 scheduler.py:619] Input prompt (3270 tokens) is too long and exceeds limit of 2048
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 3336.76it/s]
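
The warning above suggests that the over-long prompt was rejected by the scheduler rather than truncated. One way to catch this before calling the engine is a pre-flight length check. The following is a minimal sketch that assumes the prompts have already been tokenized into lists of ids; `split_by_length` is a hypothetical helper and not part of vLLM.

```python
from typing import List, Tuple


def split_by_length(
    token_id_lists: List[List[int]], max_model_len: int
) -> Tuple[List[int], List[int]]:
    """Return (indices of prompts that fit, indices of prompts that are too long).

    A prompt is treated as too long when it has more than `max_model_len`
    tokens, mirroring the comparison printed in the warning (3270 > 2048).
    """
    fits: List[int] = []
    too_long: List[int] = []
    for index, token_ids in enumerate(token_id_lists):
        if len(token_ids) > max_model_len:
            too_long.append(index)
        else:
            fits.append(index)
    return fits, too_long
```

Prompts reported in the second list can then be truncated or dropped explicitly instead of relying on the engine's warning.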

I also searched the vLLM codebase, and truncate_prompt_tokens does not appear to be used anywhere in the vLLM engine code.

Is there a workaround for this? Do I have to truncate manually before running the vLLM engine?

Thanks in advance!

simon-mo commented 3 months ago

This is a good issue to work on!

tdoublep commented 3 months ago

@YuWang916 I had actually implemented support in the engine for this at some point, but wasn't sure if there was any interest in this feature beyond the support in the OpenAI entrypoint. I can try to dig out those commits - probably the code on main has changed a bit in the meantime (😄) so might need some work to rebase it.

YuWang916 commented 3 months ago

@tdoublep Thank you! My current workaround is to tokenize and truncate the prompt beforehand and pass the result via the prompt_token_ids parameter. It would still be nice to have this functionality in the vLLM engine!
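
The workaround described above might look roughly like the following. This is a minimal sketch, not the code actually used in the thread: `truncate_prompt_ids` is a hypothetical helper, and it assumes left truncation (keeping only the last k tokens), which is how the OpenAI entrypoint's `truncate_prompt_tokens` is documented to behave.

```python
from typing import List, Optional


def truncate_prompt_ids(
    token_ids: List[int], truncate_prompt_tokens: Optional[int]
) -> List[int]:
    """Keep only the last `truncate_prompt_tokens` ids (left truncation).

    `None` disables truncation. Prompts that are already short enough are
    returned unchanged.
    """
    if truncate_prompt_tokens is None:
        return list(token_ids)
    if truncate_prompt_tokens < 1:
        raise ValueError("truncate_prompt_tokens must be at least 1")
    return list(token_ids[-truncate_prompt_tokens:])


# Hypothetical usage with the objects from the issue (not executed here;
# assumes the vLLM 0.4.x `LLM.generate` signature, which may differ in
# other versions):
#
#   tokenizer = llm_engine.get_tokenizer()
#   ids = tokenizer.encode(prompt)
#   outputs = llm_engine.generate(
#       prompt_token_ids=[truncate_prompt_ids(ids, 100)],
#       sampling_params=sampling_params,
#   )
```

Keeping the tail of the prompt rather than the head is a design choice that suits chat-style inputs, where the most recent text matters most; slice with `token_ids[:k]` instead if the beginning of the prompt should be kept.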

DarkLight1337 commented 3 months ago

I think #3512 should make this easier by using the same tokenizer for LLMEngine and OpenAIServing.