Build Dataset Script Fails

kyegomez / Andromeda

An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast

https://discord.gg/qUtxnK2NMf

GNU General Public License v3.0


Build Dataset Script Fails #2

Closed: evannorstrand-mp closed this issue 1 year ago

evannorstrand-mp commented 1 year ago

python3 Andromeda/build_dataset.py --seed 42 --seq_len 8192 --hf_account "" --tokenizer "EleutherAI/gpt-neox-20b" --dataset_name "EleutherAI/the_pile_deduplicated"

Traceback (most recent call last):
  File "/home/ubuntu/Andromeda/Andromeda/build_dataset.py", line 70, in <module>
    built_dataset(args)
  File "/home/ubuntu/Andromeda/Andromeda/build_dataset.py", line 17, in built_dataset
    tokenizer = AutoTokenizer.from_pretrained(CFG.Tokenizer)
AttributeError: type object 'CFG' has no attribute 'Tokenizer'

evannorstrand-mp commented 1 year ago

On line 17 of build_dataset.py, CFG.Tokenizer should be CFG.TOKENIZER. Python attribute names are case-sensitive, so the lookup fails.
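
For anyone hitting the same error, here is a minimal sketch of the cause and the fix. The CFG class below is a hypothetical stand-in, not the repo's actual config: only the attribute name TOKENIZER comes from this issue, and the value is copied from the --tokenizer argument in the command above. The helper get_tokenizer_name is also made up for illustration.

```python
class CFG:
    # Hypothetical stand-in for the config class in build_dataset.py.
    # Only the attribute name TOKENIZER is taken from the issue.
    TOKENIZER: str = "EleutherAI/gpt-neox-20b"


def get_tokenizer_name(cfg: type) -> str:
    # Corrected lookup: the attribute is upper case.
    # In the real script this value would be passed on as
    # AutoTokenizer.from_pretrained(CFG.TOKENIZER).
    return cfg.TOKENIZER


# The original lookup uses the wrong case and raises AttributeError.
error_message = ""
try:
    CFG.Tokenizer
except AttributeError as err:
    error_message = str(err)

tokenizer_name = get_tokenizer_name(CFG)
print(error_message)
print(tokenizer_name)
```

If the real config class stores the tokenizer under a different name, the same check applies: the name used on line 17 has to match the class attribute exactly, including case.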