tensorflow / models

Models and examples built with TensorFlow

How to reduce CPU usage? #11064

Open VlaTal1 opened 1 year ago

VlaTal1 commented 1 year ago

I use this code to load the model:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model = 'WizardLM/WizardCoder-15B-V1.0'

def load_model(model=model):
    # Load the tokenizer and the 8-bit quantized model
    # (device_map is assumed to be defined elsewhere, e.g. "auto")
    tokenizer = AutoTokenizer.from_pretrained(model)
    model = AutoModelForCausalLM.from_pretrained(model, device_map=device_map, load_in_8bit=True)
    return tokenizer, model

tokenizer, model = load_model(model)
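
For reference, a minimal check (assuming PyTorch is available and the model loaded as above) to confirm CUDA is visible and where the weights actually ended up:

import torch

# Sanity check: is CUDA visible, and on which devices do the loaded weights live?
print(torch.cuda.is_available())                  # should print True
print({p.device for p in model.parameters()})     # e.g. {device(type='cuda', index=0)}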

And this code to generate:

generation_config = GenerationConfig(
    temperature=0.0,
    top_p=0.95,
    top_k=50,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

prompt_template = f'''
Below is an instruction that describes a task. Write a response that appropriately completes the request

### Instruction: {prompt}

### Response:'''

inputs = tokenizer(prompt_template, return_tensors="pt").to("cuda")
generated_ids = model.generate(**inputs, generation_config=generation_config, max_new_tokens=6000)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

This model fits entirely into my GPU, but for some reason the GPU is not even used (it does not heat up while generating), while processor usage is at 100%. What is wrong with my code, or is the problem in the model?
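
One thing worth checking (an assumption, not something confirmed here): with load_in_8bit and a device_map, accelerate may offload some layers to the CPU, which would explain the 100% processor usage. A minimal sketch to inspect the placement via the hf_device_map attribute that accelerate attaches to the model:

# If any entry maps to "cpu" or "disk", those layers run on the CPU
# and generation will peg the processor instead of the GPU.
placement = getattr(model, "hf_device_map", None)
print(placement)

if placement is not None:
    offloaded = {name: dev for name, dev in placement.items() if dev in ("cpu", "disk")}
    print("Layers not on the GPU:", offloaded or "none")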

eashatirrazia commented 1 year ago

I am facing the same issue with a Faster R-CNN model: the system monitor shows 100% usage on only one of the eight CPU cores, while nvidia-smi shows GPU utilization of only about 0% (a single digit). Also, the process is killed after the shuffle buffer is filled, because all 24 GB of memory get used.
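
Not from the original comment, but a minimal sketch that may help narrow this down, assuming a tf.data input pipeline like the one the Object Detection API builds: it checks whether TensorFlow sees the GPU at all and uses a smaller shuffle buffer so the pipeline does not exhaust RAM (the file name and buffer size are illustrative):

import tensorflow as tf

# An empty list here means TensorFlow is running CPU-only.
print(tf.config.list_physical_devices('GPU'))

# Illustrative tf.data pipeline: a smaller shuffle buffer keeps fewer records in RAM.
# In the Object Detection API the corresponding option is typically shuffle_buffer_size
# in the train_input_reader section of the pipeline config.
dataset = tf.data.TFRecordDataset(['train.record'])   # hypothetical record file
dataset = dataset.shuffle(buffer_size=256)            # much smaller than the default
dataset = dataset.batch(8).prefetch(tf.data.AUTOTUNE)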