(Major) Changes
Added an option for probabilistic (sampled) decoding (currently only temperature = 1 is set, but further parameters can now easily be added); see the sketch below.
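A minimal sketch of what temperature-based sampling looks like with Hugging Face transformers, assuming generation goes through `model.generate`; the model name and the parameter plumbing here are illustrative assumptions, not the repo's actual code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; the checkpoint actually used in this repo is an assumption.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,    # switch from greedy to probabilistic decoding
    temperature=1.0,   # the only value currently set; top_p, top_k, etc. could be added later
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```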
Minor Changes
Streamlined generate/ (removed pipeline.py and integrated its functions into run_pipeline.py)
Things to consider:
Quantized models still seem heavy to run inference with; further testing is needed on how hardware, batch size, bit size, etc. affect this. For bit size, we are currently running the 4-bit version (the main revision), but TheBloke provides other compressed versions. We should also look into alternatives, e.g. Mixtral-8x7B or vLLM (both suggested by Kenneth); a loading sketch follows below.
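For reference, a hedged sketch of loading a 4-bit GPTQ checkpoint from TheBloke with transformers (requires `optimum` and `auto-gptq` installed); the repo id is an illustrative example, not necessarily the model used here, with a commented vLLM alternative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative TheBloke checkpoint; the actual model used in this repo is an assumption.
repo_id = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    revision="main",            # 4-bit branch; other bit sizes live on other revisions
    device_map="auto",          # let accelerate place layers across the available hardware
    torch_dtype=torch.float16,
)

# Alternative with vLLM (one of the suggestions above); also a sketch:
# from vllm import LLM, SamplingParams
# llm = LLM(model=repo_id, quantization="gptq")
# outputs = llm.generate(["prompt"], SamplingParams(temperature=1.0, max_tokens=50))
```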