jphall663 / awesome-machine-learning-interpretability

A curated list of awesome responsible machine learning resources.

[Ongoing] Knowledge base additions #142

Open jphall663 opened 8 months ago

jphall663 commented 8 months ago

~https://www.frontiermodelforum.org/uploads/2023/10/FMF-AI-Red-Teaming.pdf~

~https://github.com/openai/openai-cookbook/tree/main~

jphall663 commented 8 months ago

~https://resources.oreilly.com/examples/0636920415947/-/blob/master/Attack_Cheat_Sheet.png <- community resources~

datherton09 commented 8 months ago

All added. Waiting on EO. Decided to go ahead and add the "Intellectual property" page because I could still imagine it being a useful resource/portal (especially considering that the USPTO falls under it, and that contains a specific resource we link to).

jphall663 commented 8 months ago

~https://www.imda.gov.sg/resources/press-releases-factsheets-and-speeches/press-releases/2023/generative-ai-evaluation-sandbox <- GAI resources~

datherton09 commented 4 months ago

[ALL ADDED, 2/21/2024]

benchmarks:

- https://wavesbench.github.io/
- https://github.com/huggingface/evaluate (see the usage sketch at the end of this comment)
- https://github.com/AI-secure/DecodingTrust
- https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vQObeTxvXtOs--zd98qG2xBHHuTTJOyNISBJPthZFr3at2LCrs3rcv73d4of1A78JV2eLuxECFXJY43/pubhtml
- https://safetyprompts.com/

python software:

- https://github.com/lilacai/lilac

official guidance:

- https://www.ohchr.org/sites/default/files/documents/issues/business/b-tech/taxonomy-GenAI-Human-Rights-Harms.pdf

community resources:

- https://www.hackerone.com/vulnerability-and-security-testing-blog
- https://www.synack.com/wp-content/uploads/2022/09/Crowdsourced-Security-Landscape-Government.pdf
- CSET stuff (just double-check we reference these somehow):
  - https://cset.georgetown.edu/article/translating-ai-risk-management-into-practice/
  - https://cset.georgetown.edu/publication/repurposing-the-wheel/
  - https://cset.georgetown.edu/publication/adding-structure-to-ai-harm/
  - https://cset.georgetown.edu/article/understanding-ai-harms-an-overview/
  - https://cset.georgetown.edu/publication/ai-incident-collection-an-observational-study-of-the-great-ai-experiment/
- https://www.scsp.ai/wp-content/uploads/2023/11/SCSP_JHU-HCAI-Framework-Nov-6.pdf
- https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation
- https://c2pa.org/
- https://aiverifyfoundation.sg/downloads/Cataloguing_LLM_Evaluations.pdf
- https://partnershiponai.org/modeldeployment/
- https://cdn.openai.com/openai-preparedness-framework-beta.pdf

- https://dominiquesheltonleipzig.com/country-legislation-frameworks/

red-teaming section:

- https://www.hackerone.com/thought-leadership/ai-safety-red-teaming
- https://cset.georgetown.edu/article/what-does-ai-red-teaming-actually-mean/
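
Since the Hugging Face `evaluate` library is linked under benchmarks above, here's a minimal usage sketch for anyone triaging it (the metric name and toy data are illustrative, not taken from the repo):

```python
# Minimal sketch of the evaluate API; assumes `pip install evaluate`.
import evaluate

# Load a built-in metric from the Hugging Face hub.
accuracy = evaluate.load("accuracy")

# Score toy predictions against references.
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75}

# Safety-relevant measurements load the same way, e.g. the "toxicity"
# measurement module (downloads a classifier model on first use).
toxicity = evaluate.load("toxicity", module_type="measurement")
print(toxicity.compute(predictions=["an output string to screen"]))
```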

jphall663 commented 3 months ago

Red teaming -- but do we want to start hosting papers?

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (2024). Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks. https://arxiv.org/pdf/2402.04249.pdf

Red-Teaming for Generative AI: Silver Bullet or Security Theater? (2024). Michael Feffer, Anusha Sinha, Zachary C. Lipton, Hoda Heidari. https://arxiv.org/pdf/2401.15897.pdf

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models (2023). Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang. https://arxiv.org/pdf/2310.00322.pdf

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment (2023). https://arxiv.org/pdf/2308.09662.pdf

Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases (2023). Rishabh Bhardwaj, Soujanya Poria. https://arxiv.org/pdf/2310.14303.pdf

jphall663 commented 3 months ago

GAI Critiques: