Closed (mnuppnau closed this 2 weeks ago)
I had the same problem and I figured out how to fix it. The issue is that, for some odd reason, LangChain has hardcoded default values of rope_freq_scale=1.0 and rope_freq_base=10000 and does not allow llama.cpp to automatically set the appropriate rope values based on the model metadata. Simply set rope_freq_base=500000 and Llama 3 will shine again. Now I am trying to figure out how to prevent LangChain from altering these settings at all.
I've tried to update the script above with various settings including:
model_kwargs = {'rope_freq_base': 50000}

llm = LlamaCpp(
    model_path="./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=n_ctx,
    model_kwargs=model_kwargs,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)
And the output is changed slightly but is still nonsensical. Am I updating the incorrect parameter?
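One way to rule out the LangChain wrapper entirely is to load the same GGUF with llama-cpp-python directly and pass rope_freq_base there. The sketch below assumes llama-cpp-python is installed; the prompt is a placeholder:

# Sketch: load the model with llama-cpp-python directly, bypassing LangChain,
# to check whether rope_freq_base=500000 gives coherent output on long prompts.
from llama_cpp import Llama

raw_llm = Llama(
    model_path="./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",
    n_gpu_layers=-1,          # offload all layers to the GPUs
    n_ctx=8192,               # Llama 3 training context length
    rope_freq_base=500000.0,  # the value stored in the GGUF metadata
)

out = raw_llm("Summarize the following text: ...", max_tokens=128)  # placeholder prompt
print(out["choices"][0]["text"])

If this produces sensible text while the LangChain call does not, the problem is in how the wrapper forwards the parameter rather than in the model or llama.cpp itself.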
Oops, sorry, it should be 500,000, not 50,000 as I incorrectly said before. But that's for the standard ctx size; for a different ctx_size we probably need to recalculate the rope parameters.
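As a rough rule of thumb (an assumption, not something stated in this thread): with llama.cpp's linear rope scaling, the scale factor is usually the training context divided by the target context, while the base frequency stays at the value from the model metadata. A small sketch:

# Rough rule of thumb for linear RoPE scaling (assumption, not from this thread):
# compress positions so the extended context maps back onto the trained range.
n_ctx_train = 8192    # Llama 3 training context (from the model metadata)
n_ctx_target = 16384  # desired runtime context

rope_freq_scale = n_ctx_train / n_ctx_target  # 0.5 for a 2x extension
rope_freq_base = 500000                       # keep the value from the GGUF metadata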
It appears that the rope_freq_base was already set to 500,000. If you look at my output above, it shows 'llama.rope.freq_base': '500000.000000'. Here is additional output information:
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = hub
llama_model_loader: - kv 2: llama.vocab_size u32 = 128256
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 8192
llama_model_loader: - kv 5: llama.block_count u32 = 80
llama_model_loader: - kv 6: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 7: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 8: llama.attention.head_count u32 = 64
llama_model_loader: - kv 9: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 11: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 12: general.file_type u32 = 15
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 15: tokenizer.ggml.scores arr[f32,128256] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 161 tensors
llama_model_loader: - type q4_K: 441 tensors
llama_model_loader: - type q5_K: 40 tensors
llama_model_loader: - type q6_K: 81 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 28672
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 70.55 B
llm_load_print_meta: model size = 39.59 GiB (4.82 BPW)
llm_load_print_meta: general.name = hub
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
llm_load_tensors: ggml ctx size = 1.10 MiB
llm_load_tensors: offloading 80 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 81/81 layers to GPU
llm_load_tensors: CPU buffer size = 563.62 MiB
llm_load_tensors: CUDA0 buffer size = 20038.81 MiB
llm_load_tensors: CUDA1 buffer size = 19940.67 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 1024
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
It seems that no matter how I adjust rope-freq-base, rope_freq_base, or freq_base, these printed values do not change.
I also see the correct value in the output, but when I set rope_freq_base in the LlamaCpp constructor (directly, not via kwargs), the behavior of the model changes, while in the console I still see the same value of 500000. The documentation doesn't label this parameter as optional, and it mentions a default value (which is not suitable for Llama 3): https://api.python.langchain.com/en/latest/llms/langchain_community.llms.llamacpp.LlamaCpp.html#langchain_community.llms.llamacpp.LlamaCpp.rope_freq_base
The following update works now:
rope_freq_base = 500000
max_tokens = 1024

llm = LlamaCpp(
    model_path="./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=n_ctx,
    rope_freq_base=rope_freq_base,
    max_tokens=max_tokens,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)
Thanks!
I just realized that when I set rope-freq-base and rope_freq_base to zero, it lets llama.cpp automatically set it based on the model metadata :smile:
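A minimal sketch of that approach, assuming a LangChain and llama.cpp version where 0.0 is interpreted as "read from the model metadata":

from langchain_community.llms import LlamaCpp

# Sketch: passing 0.0 asks llama.cpp to derive rope_freq_base from the GGUF
# metadata instead of using LangChain's hardcoded default of 10000.
llm = LlamaCpp(
    model_path="./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",
    n_ctx=8192,
    rope_freq_base=0.0,
    verbose=True,
)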
Thanks for this discussion! Setting rope_freq_base significantly helped me. The model is more compliant with the system prompt than when it is not set.
Hi, @mnuppnau. I'm helping the LangChain team manage their backlog and am marking this issue as stale.
The issue you raised regarding the Llama 3 model producing nonsensical outputs with long context lengths has been addressed. User strnad identified that the problem was linked to hardcoded default values for rope_freq_scale and rope_freq_base, and suggested setting rope_freq_base to 500,000, which has been confirmed by you and others to improve model performance.
Could you please let the LangChain team know if this issue is still relevant to the latest version of the repository? If it is, feel free to comment here. Otherwise, you can close the issue yourself, or it will be automatically closed in 7 days. Thank you!
Checked other resources
Example Code
The following code demonstrates the issue:
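A minimal sketch of the kind of script that reproduces the issue, based on the snippets shown elsewhere in the thread; the callback setup and the long prompt are assumptions:

from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    model_path="./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",
    n_gpu_layers=-1,  # the log shows all 81 layers offloaded across two RTX 3090s
    n_batch=1024,
    n_ctx=8192,
    callback_manager=callback_manager,
    verbose=True,
    # Note: rope_freq_base is not set here, so LangChain's default of 10000 is used,
    # which is what triggers the nonsensical output on long prompts.
)

long_prompt = "..."  # roughly 8k tokens of input text (placeholder)
print(llm.invoke(long_prompt))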
Error Message and Stack Trace (if applicable)
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
Model metadata: {'tokenizer.chat_template': "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}", 'tokenizer.ggml.eos_token_id': '128001', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'gpt2', 'general.architecture': 'llama', 'llama.rope.freq_base': '500000.000000', 'llama.context_length': '8192', 'general.name': 'hub', 'llama.vocab_size': '128256', 'general.file_type': '15', 'llama.embedding_length': '8192', 'llama.feed_forward_length': '28672', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '128000', 'llama.attention.head_count': '64', 'llama.block_count': '80', 'llama.attention.head_count_kv': '8'}
Using gguf chat template: {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>
' }}
Using chat eos_token: <|end_of_text|>
Using chat bos_token: <|begin_of_text|>
the a for the " example: this, an or not . ( the more every which the what, example the the other that. each that about the the the a
The to _ more so an in the this to for and an this all a any a a the in the and the to a a such the, to and a all that
llama_print_timings: load time = 1044.99 ms
llama_print_timings: sample time = 26.47 ms / 70 runs ( 0.38 ms per token, 2644.50 tokens per second)
llama_print_timings: prompt eval time = 21789.86 ms / 8122 tokens ( 2.68 ms per token, 372.74 tokens per second)
llama_print_timings: eval time = 4749.77 ms / 69 runs ( 68.84 ms per token, 14.53 tokens per second)
llama_print_timings: total time = 26954.58 ms / 8191 tokens
Description
I'm trying to use LangChain, LlamaCpp, and LLMChain to generate output from Meta's new Llama 3 models. I've tried various types of models, all with the same issue. The models perform well on text of around 3k tokens or less. When the token length is increased, the output becomes nonsensical. I am able to successfully run the llama.cpp main command in interactive mode and get meaningful output when pasting 8k tokens into the terminal.
System Info
I've tried this on various systems; here is one:
System Information
Package Information