leondz / garak

LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0

huggingface models with no predefined maximum length. Default to no truncation. #551

Closed Trailingslashes closed 4 weeks ago

Trailingslashes commented 6 months ago

Python 3.12
garak --version
garak LLM security probe v0.9.0.12 ( https://github.com/leondz/garak ) at 2024-03-13T16:54:09.098209

Running any model from garak results in an error:

 garak --model_type huggingface --model_name "mistralai/Mistral-7B-v0.1" --probes all --detectors all --parallel_requests 5 --parallel_attempts 5
garak LLM security probe v0.9.0.12 ( https://github.com/leondz/garak ) at 2024-03-13T16:42:27.422029
📜 reporting to garak_runs/garak.344f2112-2d4e-4957-8c4c-7f9956f08ead.report.jsonl
🦜 loading generator: Hugging Face 🤗 pipeline: mistralai/Mistral-7B-v0.1
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 571/571 [00:00<00:00, 598kB/s]
model.safetensors.index.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25.1k/25.1k [00:00<00:00, 39.1MB/s]
model-00001-of-00002.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.94G/9.94G [06:03<00:00, 27.3MB/s]
model-00002-of-00002.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.54G/4.54G [02:39<00:00, 28.4MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [08:44<00:00, 262.17s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:37<00:00, 18.97s/it]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 728kB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 967/967 [00:00<00:00, 846kB/s]
tokenizer.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 31.9MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:00<00:00, 5.97MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 72.0/72.0 [00:00<00:00, 342kB/s]
🕵️  queue of probes: atkgen.Tox, continuation.ContinueSlursReclaimedSlurs80, dan.AntiDAN, dan.AutoDANProbe, dan.ChatGPT_Developer_Mode_RANTI, dan.ChatGPT_Developer_Mode_v2, dan.ChatGPT_Image_Markdown, dan.DAN_Jailbreak, dan.DUDE, dan.Dan_10_0, dan.Dan_11_0, dan.Dan_6_0, dan.Dan_6_2, dan.Dan_7_0, dan.Dan_8_0, dan.Dan_9_0, dan.STAN, encoding.InjectAscii85, encoding.InjectBase16, encoding.InjectBase2048, encoding.InjectBase32, encoding.InjectBase64, encoding.InjectBraille, encoding.InjectEcoji, encoding.InjectHex, encoding.InjectMorse, encoding.InjectNato, encoding.InjectROT13, encoding.InjectUU, encoding.InjectZalgo, gcg.GCG_Probe, glitch.Glitch100, goodside.Davidjl, goodside.Tag, goodside.ThreatenJSON, goodside.WhoIsRiley, knownbadsignatures.EICAR, knownbadsignatures.GTUBE, knownbadsignatures.GTphish, leakreplay.GuardianCloze, leakreplay.GuardianComplete, leakreplay.LiteratureCloze80, leakreplay.LiteratureComplete80, leakreplay.NYTCloze, leakreplay.NYTComplete, lmrc.Anthropomorphisation, lmrc.Bullying, lmrc.Deadnaming, lmrc.Profanity, lmrc.QuackMedicine, lmrc.SexualContent, lmrc.Sexualisation, lmrc.SlurUsage, malwaregen.Evasion, malwaregen.Payload, malwaregen.SubFunctions, malwaregen.TopLevel, misleading.FalseAssertion50, packagehallucination.Python, promptinject.HijackHateHumansMini, promptinject.HijackKillHumansMini, promptinject.HijackLongPromptMini, realtoxicityprompts.RTPBlank, realtoxicityprompts.RTPFlirtation, realtoxicityprompts.RTPIdentity_Attack, realtoxicityprompts.RTPInsult, realtoxicityprompts.RTPProfanity, realtoxicityprompts.RTPSevere_Toxicity, realtoxicityprompts.RTPSexually_Explicit, realtoxicityprompts.RTPThreat, replay.Repeat, snowball.GraphConnectivityMini, snowball.PrimesMini, snowball.SenatorsMini, tap.TAPCached, xss.MarkdownImageExfil
🔎 queue of detectors: always.Fail, always.Pass, continuation.Continuation, dan.AntiDAN, dan.DAN, dan.DANJailbreak, dan.DUDE, dan.DevMode, dan.MarkdownLink, dan.STAN, encoding.DecodeApprox, encoding.DecodeMatch, goodside.Glitch, goodside.PlainJSON, goodside.RileyIsnt, knownbadsignatures.EICAR, knownbadsignatures.GTUBE, knownbadsignatures.GTphish, leakreplay.StartsWith, lmrc.Anthro, lmrc.QuackMedicine, malwaregen.AnyCode, misleading.MustRefuteClaimKW, misleading.MustRefuteClaimModel, mitigation.MitigationBypass, packagehallucination.PythonPypi, promptinject.AttackRogueString, replay.RepeatDiverges, riskywords.LDNOOBW, riskywords.OfcomOffensiveDiscriminatory, riskywords.OfcomOffensiveGeneral, riskywords.OfcomOffensiveMentalHealth, riskywords.OfcomOffensiveRaceEthnic, riskywords.OfcomOffensiveSexGender, riskywords.OfcomOffensiveSexual, riskywords.SurgeProfanityAnimalReferences, riskywords.SurgeProfanityBodilyFluids, riskywords.SurgeProfanityMentalDisability, riskywords.SurgeProfanityPhysicalAttributes, riskywords.SurgeProfanityPhysicalDisability, riskywords.SurgeProfanityPolitical, riskywords.SurgeProfanityRacialEthnic, riskywords.SurgeProfanityReligious, riskywords.SurgeProfanitySexual, riskywords.SurgeProfanitySexualOrientationGender, snowball.DontStartNo, snowball.DontStartYes, specialwords.SlursReclaimedSlurs, toxicity.ToxicCommentModel, xss.MarkdownExfilBasic, xss.MarkdownExfilContent
🔴🪖  🦜 loading generator: Hugging Face 🤗 pipeline: leondz/artgpt2tox
probes.atkgen.Tox:   0%|                                                                                                                                                                                                                                            | 0/10 [00:00<?, ?it/sAsking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation. 
Trailingslashes commented 6 months ago

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
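
For anyone reading along: as far as I can tell, this message is logged by the Hugging Face transformers tokenizer when truncation is requested, no max_length is passed, and the tokenizer's model_max_length is still the very large "not set" placeholder. The sketch below is a simplified stand-in for that decision, not the library's actual code; resolve_truncation and the LARGE_INTEGER threshold value are assumptions made for illustration.

```python
# Simplified model of the truncation decision. The names below are
# hypothetical and do not belong to the transformers API.
LARGE_INTEGER = int(1e20)  # assumed threshold: larger limits count as "not set"

WARNING = (
    "Asking to truncate to max_length but no maximum length is provided "
    "and the model has no predefined maximum length. Default to no truncation."
)


def resolve_truncation(truncation, max_length, model_max_length):
    """Return (will_truncate, effective_max_length, warning_or_None)."""
    if not truncation:
        # Truncation was never requested, so there is nothing to resolve.
        return False, max_length, None
    if max_length is not None:
        # An explicit limit always wins.
        return True, max_length, None
    if model_max_length > LARGE_INTEGER:
        # No explicit limit and no usable model limit: fall back to no truncation.
        return False, None, WARNING
    # Fall back to the limit declared by the model's tokenizer config.
    return True, model_max_length, None


if __name__ == "__main__":
    # Tokenizer without a declared limit (placeholder value assumed to be 1e30).
    print(resolve_truncation(True, None, int(1e30)))
    # Same tokenizer, but the caller passes an explicit max_length.
    print(resolve_truncation(True, 512, int(1e30)))
    # Tokenizer that declares its own limit.
    print(resolve_truncation(True, None, 1024))
```

If this reading is right, the message on its own only means truncation is skipped; passing an explicit max_length (or using a tokenizer that declares one) is what would make it go away.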

leondz commented 6 months ago

Thanks, will take a look. There is a huge range of models on Hugging Face and some have quite different APIs from others.

Two things to try:

Trailingslashes commented 6 months ago

AttributeError: module 'garak.generators.huggingface' has no attribute 'model'. Did you mean: 'Model'?
(garak) ✘ $ ~/garak  garak --model_type huggingface.Model --model_name "google/gemma-7b" --probes all --detectors all --parallel_requests 5 --parallel_attempts 5
garak LLM security probe v0.9.0.12 ( https://github.com/leondz/garak ) at 2024-03-14T09:29:07.085797
📜 reporting to garak_runs/garak.bfb1e889-0983-4a80-8fce-89d5646e2ce3.report.jsonl
🦜 loading generator: Hugging Face 🤗 model: google/gemma-7b
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:49<00:00, 12.46s/it]
🕵️ queue of probes: atkgen.Tox, continuation.ContinueSlursReclaimedSlurs80, dan.AntiDAN, dan.AutoDANProbe, dan.ChatGPT_Developer_Mode_RANTI, dan.ChatGPT_Developer_Mode_v2, dan.ChatGPT_Image_Markdown, dan.DAN_Jailbreak, dan.DUDE, dan.Dan_10_0, dan.Dan_11_0, dan.Dan_6_0, dan.Dan_6_2, dan.Dan_7_0, dan.Dan_8_0, dan.Dan_9_0, dan.STAN, encoding.InjectAscii85, encoding.InjectBase16, encoding.InjectBase2048, encoding.InjectBase32, encoding.InjectBase64, encoding.InjectBraille, encoding.InjectEcoji, encoding.InjectHex, encoding.InjectMorse, encoding.InjectNato, encoding.InjectROT13, encoding.InjectUU, encoding.InjectZalgo, gcg.GCG_Probe, glitch.Glitch100, goodside.Davidjl, goodside.Tag, goodside.ThreatenJSON, goodside.WhoIsRiley, knownbadsignatures.EICAR, knownbadsignatures.GTUBE, knownbadsignatures.GTphish, leakreplay.GuardianCloze, leakreplay.GuardianComplete, leakreplay.LiteratureCloze80, leakreplay.LiteratureComplete80, leakreplay.NYTCloze, leakreplay.NYTComplete, lmrc.Anthropomorphisation, lmrc.Bullying, lmrc.Deadnaming, lmrc.Profanity, lmrc.QuackMedicine, lmrc.SexualContent, lmrc.Sexualisation, lmrc.SlurUsage, malwaregen.Evasion, malwaregen.Payload, malwaregen.SubFunctions, malwaregen.TopLevel, misleading.FalseAssertion50, packagehallucination.Python, promptinject.HijackHateHumansMini, promptinject.HijackKillHumansMini, promptinject.HijackLongPromptMini, realtoxicityprompts.RTPBlank, realtoxicityprompts.RTPFlirtation, realtoxicityprompts.RTPIdentity_Attack, realtoxicityprompts.RTPInsult, realtoxicityprompts.RTPProfanity, realtoxicityprompts.RTPSevere_Toxicity, realtoxicityprompts.RTPSexually_Explicit, realtoxicityprompts.RTPThreat, replay.Repeat, snowball.GraphConnectivityMini, snowball.PrimesMini, snowball.SenatorsMini, tap.TAPCached, xss.MarkdownImageExfil
🔎 queue of detectors: always.Fail, always.Pass, continuation.Continuation, dan.AntiDAN, dan.DAN, dan.DANJailbreak, dan.DUDE, dan.DevMode, dan.MarkdownLink, dan.STAN, encoding.DecodeApprox, encoding.DecodeMatch, goodside.Glitch, goodside.PlainJSON, goodside.RileyIsnt, knownbadsignatures.EICAR, knownbadsignatures.GTUBE, knownbadsignatures.GTphish, leakreplay.StartsWith, lmrc.Anthro, lmrc.QuackMedicine, malwaregen.AnyCode, misleading.MustRefuteClaimKW, misleading.MustRefuteClaimModel, mitigation.MitigationBypass, packagehallucination.PythonPypi, promptinject.AttackRogueString, replay.RepeatDiverges, riskywords.LDNOOBW, riskywords.OfcomOffensiveDiscriminatory, riskywords.OfcomOffensiveGeneral, riskywords.OfcomOffensiveMentalHealth, riskywords.OfcomOffensiveRaceEthnic, riskywords.OfcomOffensiveSexGender, riskywords.OfcomOffensiveSexual, riskywords.SurgeProfanityAnimalReferences, riskywords.SurgeProfanityBodilyFluids, riskywords.SurgeProfanityMentalDisability, riskywords.SurgeProfanityPhysicalAttributes, riskywords.SurgeProfanityPhysicalDisability, riskywords.SurgeProfanityPolitical, riskywords.SurgeProfanityRacialEthnic, riskywords.SurgeProfanityReligious, riskywords.SurgeProfanitySexual, riskywords.SurgeProfanitySexualOrientationGender, snowball.DontStartNo, snowball.DontStartYes, specialwords.SlursReclaimedSlurs, toxicity.ToxicCommentModel, xss.MarkdownExfilBasic, xss.MarkdownExfilContent
🔴🪖 🦜 loading generator: Hugging Face 🤗 pipeline: leondz/artgpt2tox
probes.atkgen.Tox: 0%| | 0/10 [00:00<?, ?it/s
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

@leondz, no luck, but I'll give NVCF a shot.
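
A side note on the AttributeError at the top of this comment: the wording suggests the class part of --model_type is looked up by exact, case-sensitive name on the generator module, which is why 'model' failed and 'Model' loaded. Here is a minimal sketch of that kind of lookup, using a stand-in module. fake_generators and resolve_generator are hypothetical names for illustration and are not garak's actual loader.

```python
import types

# Hypothetical stand-in for a generators module that exposes a class "Model".
fake_module = types.ModuleType("fake_generators")


class Model:
    """Placeholder generator class."""


fake_module.Model = Model


def resolve_generator(module, class_name):
    """Look up class_name on module by exact name; hint on a case mismatch."""
    try:
        return getattr(module, class_name)
    except AttributeError:
        # Offer a suggestion when only the letter case differs.
        matches = [n for n in dir(module) if n.lower() == class_name.lower()]
        hint = f" Did you mean: {matches[0]!r}?" if matches else ""
        raise AttributeError(
            f"module {module.__name__!r} has no attribute {class_name!r}.{hint}"
        ) from None


if __name__ == "__main__":
    # Exact-case name resolves to the class.
    print(resolve_generator(fake_module, "Model").__name__)
    # Lower-case name fails with a hint, similar in shape to the error above.
    try:
        resolve_generator(fake_module, "model")
    except AttributeError as exc:
        print(exc)
```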