hassonlab / 247-encoding

Contains Python scripts for performing encoding on 247 data.

clean up code snippet #48

Closed by hvgazula 1 year ago

hvgazula commented 1 year ago

https://github.com/hassonlab/247-encoding/blob/03e73d281600e34e8a584025f0895b0e2aa93d69/scripts/tfsenc_config.py#L28-L29

I suggest replacing the above two lines with

def clean_lm_model_name(item):
    """Remove unnecessary parts from the language model name.

    Args:
        item (str/list): full model name from HF Hub

    Returns:
        (str/list): pretty model name
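    """
    if isinstance(item, str):
        # Keep only the part after the last "/" (drops the HF Hub organization prefix).
        return item.split("/")[-1]

    if isinstance(item, list):
        # Clean each entry of the list recursively.
        return [clean_lm_model_name(i) for i in item]

    # Fail loudly instead of printing a message and implicitly returning None.
    raise TypeError(f"Expected str or list, got {type(item).__name__}")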

args.emb_type = clean_lm_model_name(args.emb_type)
args.align_with = clean_lm_model_name(args.align_with)

Sure, it is more than two lines, but at least it is clean. Thoughts, @zkokaja?

zkokaja commented 1 year ago

It looks good. Do we need to use it elsewhere too? Can you provide examples of what the model names look like after running this function?

hvgazula commented 1 year ago

Yes, the same cleanup applies in pickling: https://github.com/hassonlab/247-pickling/blob/main/scripts/tfsemb_LMBase.py#L30

zkokaja commented 1 year ago

Great. For example, EleutherAI/gpt-neo-1.3B becomes gpt-neo-1.3B. Ready for a PR.
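
For reference, here is a minimal sketch of the behavior discussed in this thread, covering both a single model name and a list of names. It restates the function in compact form and raises TypeError on unsupported input. The Namespace object and the specific values assigned to emb_type and align_with are hypothetical, chosen only for illustration, and are not taken from tfsenc_config.py.

```python
from argparse import Namespace


def clean_lm_model_name(item):
    """Strip the HF Hub organization prefix from a model name or list of names."""
    if isinstance(item, str):
        # "org/model" -> "model"; a name without "/" is returned unchanged.
        return item.split("/")[-1]
    if isinstance(item, list):
        return [clean_lm_model_name(i) for i in item]
    raise TypeError(f"Expected str or list, got {type(item).__name__}")


# Hypothetical argument values, for illustration only.
args = Namespace(
    emb_type="EleutherAI/gpt-neo-1.3B",
    align_with=["gpt2-xl", "facebook/blenderbot_small-90M"],
)

args.emb_type = clean_lm_model_name(args.emb_type)
args.align_with = clean_lm_model_name(args.align_with)

print(args.emb_type)    # gpt-neo-1.3B
print(args.align_with)  # ['gpt2-xl', 'blenderbot_small-90M']
```

Names without an organization prefix, such as gpt2-xl, pass through unchanged, so applying the function to already-clean values is harmless.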