neulab / code-bert-score

CodeBERTScore: an automatic metric for code generation, based on BERTScore
MIT License

Extremely long time taken for comparison of codes #10

Open jackswl opened 2 months ago

jackswl commented 2 months ago

Hi all,

Thanks for the wonderful work.

I am currently running code_bert_score to evaluate the similarity between generated code and 'correct' code. However, it just takes way too long locally. Is there a way to speed it up (e.g. by using a GPU) on macOS? Can you let me know where I can optimize the code, or whether there are specific settings I should change to make it run faster? Thanks!

import code_bert_score
import pandas as pd

rp_values = [1]

for rp in rp_values:
    CSV_PATH = f'xxx'
    codebertdf = pd.read_csv(CSV_PATH)

    codebertdf['generated_output'] = codebertdf['generated_output'].str.strip()

    predictions = codebertdf['generated_output'].tolist()
    refs = codebertdf['actual_output'].tolist()

    # Calculate CodeBERTScore (returned in the order precision, recall, F1, F3)
    P, R, F1, F3 = code_bert_score.score(cands=predictions, refs=refs, lang='python')

    # Add scores to DataFrame
    codebertdf['P'] = P
    codebertdf['R'] = R
    codebertdf['F3'] = F3
    codebertdf['F1'] = F1

    # Export DataFrame
    codebertdf.to_csv(f'/xxx', index=False)
urialon commented 2 months ago

Hi Jack, Thank you for your interest in our work!

Yes, a GPU will definitely speed this up. You can also use a Google Colab notebook with a GPU.

Best, Uri
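[Editor's note] For readers hitting the same slowdown, here is a minimal, hedged sketch of selecting the fastest available PyTorch backend and passing it to the scorer. It assumes `code_bert_score.score` forwards `device` and `batch_size` keyword arguments to the underlying bert_score machinery, which may not hold for every version; check the installed package's signature before relying on this.

```python
# Hypothetical helper: pick the fastest available PyTorch device.
def pick_device():
    """Return 'mps' on Apple silicon, 'cuda' on NVIDIA, else 'cpu'."""
    try:
        import torch
        if torch.backends.mps.is_available():
            return "mps"
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # no PyTorch installed: scoring would fall back to CPU anyway
    return "cpu"

# Usage sketch (parameter names are assumptions, not confirmed API):
# P, R, F1, F3 = code_bert_score.score(
#     cands=predictions, refs=refs, lang='python',
#     device=pick_device(), batch_size=64)
```

Increasing the batch size (if supported) is often as important as the device choice, since the per-batch model overhead dominates for short code snippets.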


jackswl commented 2 months ago

Hi @urialon, great work again.

Yeah, using Google Colab (with a CUDA GPU) will speed it up. However, I am intending to run it locally on my MacBook.

Are there any settings I should change in my code above so that code_bert_score can tap into my MacBook's M3 chip? The time taken to compute under 'mps' and 'cpu' is the same. Or is running the comparison not possible on Apple silicon?

Thanks!
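[Editor's note] One thing worth checking before concluding that 'mps' and 'cpu' are genuinely the same speed (a suggestion, not from the thread): on some PyTorch builds MPS is unavailable or operations silently fall back to CPU. A quick sanity check that MPS is actually usable:

```python
# Sanity-check MPS availability; guarded so it also runs without PyTorch.
try:
    import torch
    mps_ok = torch.backends.mps.is_available()
except ImportError:
    mps_ok = False  # no PyTorch: everything runs on CPU

if mps_ok:
    # A tensor created with device="mps" should report an mps device.
    x = torch.ones(3, device="mps")
    print("MPS tensor lives on:", x.device)
else:
    print("MPS not available; scoring will run on CPU")
```

If `mps_ok` is False, identical timings under 'mps' and 'cpu' are expected, because both runs are actually on the CPU.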