Lawma-8b output weird, seems not right

Yixing-Li commented 1 month ago

Actually, I'm not sure whether the lawma-8b model on Hugging Face is the correct version. The data preparation stage failed (the data was not available), so I could not run the evaluation. Instead, I wrote a simple script that loads the model as a Llama checkpoint and tried inference.

And I got the following output:

Your work seems great, but the repo is just not that easy to reproduce : )

Yixing-Li commented 1 month ago

My simple script is as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/data/intern/yixing/llm-exp/interaction/law_lawma/model/lawma-8b"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

def generate_response(input_text):
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model.generate(inputs.input_ids, max_length=150, do_sample=True, temperature=0.7)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

input_text = "Suggest a dispute resolution clause for an international sales contract that minimizes litigation risks and favors arbitration.\n"
output = generate_response(input_text)

print("Model Input:", input_text)
print("Model Output:", output)

RicardoDominguez commented 1 month ago

Hi! This model was fine-tuned for legal classification tasks. It has been fine-tuned only on multiple-choice questions, not on general instructions. This is why it only outputs multiple-choice letters (e.g., A, B, C) or numbers.

See the disclaimer on the model page:

What are the Lawma models useful for? We recommend using the Lawma models only for the legal classification tasks that the models were fine-tuned on. The main take-away of our paper is that specializing models leads to large improvements in performance. Therefore, we strongly recommend that practitioners further fine-tune Lawma on the actual tasks that the models will be used for. Relatively few examples (i.e., dozens or hundreds) may already lead to large gains in performance.

Yixing-Li commented 1 month ago

Oh, thanks for your explanation. So it is indeed correct for the model to output A, B, C. I'm working on a paper and will cite your work. I'll wait until you update the dataset (since there is a problem downloading the eval data). Thanks again for your attention.

RicardoDominguez commented 1 month ago

Here is a minimal example using your code template:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "ricdomolm/lawma-8b"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

def generate_response(input_text):
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model.generate(inputs.input_ids, max_length=2048, do_sample=False)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

input_text = """Mr. Justice Stevens
delivered the opinion of the Court.
On September 2, 1974, following almost a decade of studying the Nation’s private pension plans, Congress enacted the Employee Retirement Income Security Act of 1974 (ERISA), 88 Stat. 829, 29 U. S. C. § 1001 et seq. As a predicate for this comprehensive and reticulated statute, Congress made detailed findings which recited, in part, “that the continued well-being and security of millions of employees and their dependents are directly affected by these plans; [and] that owing to the termination of plans before requisite funds have been accumulated, employees and their beneficiaries have been deprived of anticipated benefits....” ERISA § 2 (a), 29 U. S. C. § 1001 (a). As one of the means of protecting the interests of beneficiaries, Title IY of ERISA created a plan termination insurance program that became effective in successive stages. The question in this case is whether former employees of petitioner with vested interests in a plan that terminated the day before much of ERISA became fully effective are covered by the insurance program notwithstanding a provision in the plan limiting their benefits to the assets in the pension fund.
Stated in statutory terms, the question is whether a plan provision that limits otherwise defined, vested benefits to the amounts that can be provided by the assets of the fund prevents such benefits from being characterized as “nonforfeitable” within the meaning of § 4022 (a) of ERISA, 29 U. S. C. § 1322 (a). If the benefits are “nonforfeitable,” they are insured by the Pension Benefit Guaranty Corporation (PBGC) under Title IV. And if insurance is payable to the former employees, the PBGC has a statutory right under § 4062 (b) to reimbursement from the employer. It was petitioner’s interest in avoiding liability for such reimbursement that gave rise to this action for declaratory and injunc-tive relief.
The relevant facts are undisputed. In 1960, pursuant to a collective-bargaining agreement, petitioner established a pension plan covering employees represented by the respondent union at its

Question: What is the issue area of the decision?
A. Criminal Procedure
B. Civil Rights
C. First Amendment
D. Due Process
E. Privacy
F. Attorneys
G. Unions
H. Economic Activity
I. Judicial Power
J. Federalism
K. Interstate Relations
L. Federal Taxation
M. Miscellaneous
N. Private Action
Answer:"""

output = generate_response(input_text)
print(output)
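
Note that `generate` returns the prompt tokens followed by the newly generated ones, so the `print(output)` above shows the whole opinion plus the answer. Here is a minimal sketch for pulling out just the answer, assuming the prompt ends with `Answer:` as in the example. `extract_answer` is a hypothetical helper, not part of the Lawma repo:

```python
def extract_answer(decoded: str, marker: str = "Answer:") -> str:
    """Return the text that follows the last occurrence of `marker`."""
    # rpartition splits on the LAST marker, so an earlier "Answer:" that
    # happens to appear inside the opinion text is ignored.
    _, found, tail = decoded.rpartition(marker)
    if not found:
        raise ValueError(f"marker {marker!r} not found in decoded text")
    return tail.strip()


# Stand-in for the decoded model output (prompt followed by the prediction).
decoded = (
    "Question: What is the issue area of the decision?\n"
    "A. Criminal Procedure\n"
    "B. Civil Rights\n"
    "Answer: B"
)
print(extract_answer(decoded))  # prints "B"
```

With the real model you would call `extract_answer(generate_response(input_text))` instead of using the stand-in string.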

Yixing-Li commented 1 month ago

Got it, thank you so much. By the way, is there any possibility that Lawma could complete question-answering tasks like this:

Prompt:  blabla. What is the issue area of the decision?
Output: Criminal Procedure

I know it's probably impossible because you trained Lawma on classification tasks, but I wondered whether you might have trained another version of Lawma, or whether there are other possibilities. Thanks!

RicardoDominguez commented 1 month ago

Does each of your questions of interest have a fixed set of possible answer choices, or are they open-ended questions?

If they are open-ended questions, then I am afraid this is currently not possible.

Yixing-Li commented 1 month ago

Yeah, the questions have a set of possible answer choices, just like in your example ("Criminal Procedure", "Civil Rights", "First Amendment", etc.). But I would prefer the model to answer directly with words like "Criminal Procedure", not "A". Is there any possibility?

(It seems weird, but this is necessary lol. Thanks for your attention.)

RicardoDominguez commented 1 month ago

No, open-ended generation is currently not possible. However, if you have a set of possible answer choices, you can easily turn it into a multiple-choice question and get the answer from the model. Here is an example:

def generate_response(input_text):
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model.generate(inputs.input_ids, max_length=2048, do_sample=False)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

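def process_multiple_choice(question, options):
    # Letters A-Z label the options, so at most 26 choices are supported.
    assert len(options) <= 26, "Too many options"
    text_input = 'Question: ' + question + '\n'
    for i, option in enumerate(options):
        text_input += f'{chr(65+i)}. {option}\n'  # chr(65) == 'A'
    text_input += 'Answer:'

    # This assumes the decoded text ends with the predicted letter.
    response = generate_response(text_input)[-1]
    # Map the letter back to its option (ord('A') == 65).
    answer = options[ord(response) - 65]
    return answer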

question = "Answer Civil Rights"
options = ['Criminal Procedure', 'Civil Rights', 'First Amendment']
output = process_multiple_choice(question, options)

print('Question:', question)
print('Answer:', output)

which leads to

Question: Answer Civil Rights
Answer: Civil Rights
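
If the model ever returns something other than a valid option letter (for instance a letter beyond the last option, or some other character), indexing with the last character can raise an error or silently pick the wrong option. Below is a slightly more defensive sketch of the same prompt-building and letter-mapping steps, in plain Python with no model call. `build_prompt` and `letter_to_option` are hypothetical names used only for illustration:

```python
import string


def build_prompt(question, options):
    """Format a question and its options in the 'A. ...' layout used above."""
    if len(options) > len(string.ascii_uppercase):
        raise ValueError("Too many options")
    lines = [f"Question: {question}"]
    lines += [f"{letter}. {option}"
              for letter, option in zip(string.ascii_uppercase, options)]
    lines.append("Answer:")
    return "\n".join(lines)


def letter_to_option(raw, options):
    """Map the model's raw answer (e.g. ' B') back to the option text."""
    letter = raw.strip()[:1].upper()
    # str.find("") returns 0, so an empty answer is handled explicitly.
    index = string.ascii_uppercase.find(letter) if letter else -1
    if not 0 <= index < len(options):
        raise ValueError(f"unexpected model answer: {raw!r}")
    return options[index]


options = ['Criminal Procedure', 'Civil Rights', 'First Amendment']
print(build_prompt("Answer Civil Rights", options))
print(letter_to_option(" B", options))  # prints "Civil Rights"
```

In practice `raw` would be whatever text the model produced after the final `Answer:`. Raising on an unexpected value makes a malformed prediction visible instead of mapping it to an arbitrary option.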

Yixing-Li commented 1 month ago

Yeah, thanks so much for your code and time. But I didn't mean converting option IDs like "A" into "Civil Rights"; I meant having the Llama model directly output "Civil Rights". This might also be considered open-ended generation, even though the choices are limited.

Thanks for your attention. As you said, it is indeed not really possible to have the Lawma model directly output words like "Civil Rights". Unfortunately, this is strictly required by the paper I'm working on lol.

Thanks for your attention and I'm closing the issue.

socialfoundations / lawma

Lawma-8b output weird, seems not right #3