jphall663 / awesome-machine-learning-interpretability

A curated list of awesome responsible machine learning resources.

[Ongoing] Knowledge base additions #142

Open jphall663 opened 8 months ago

jphall663 commented 8 months ago

~https://www.frontiermodelforum.org/uploads/2023/10/FMF-AI-Red-Teaming.pdf~

~https://github.com/openai/openai-cookbook/tree/main~

jphall663 commented 8 months ago

~https://resources.oreilly.com/examples/0636920415947/-/blob/master/Attack_Cheat_Sheet.png <- community resources~

datherton09 commented 8 months ago

All added. Waiting on EO. Decided to go ahead and add the "Intellectual property" page because I could still imagine it being a useful resource/portal (especially considering that the USPTO falls under it, and that contains a specific resource we link to).

jphall663 commented 8 months ago

~https://www.imda.gov.sg/resources/press-releases-factsheets-and-speeches/press-releases/2023/generative-ai-evaluation-sandbox <- GAI resources~

datherton09 commented 4 months ago

[ALL ADDED, 2/21/2024]

benchmarks:

- https://wavesbench.github.io/
- https://github.com/huggingface/evaluate (see the usage sketch at the end of this comment)
- https://github.com/AI-secure/DecodingTrust
- https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vQObeTxvXtOs--zd98qG2xBHHuTTJOyNISBJPthZFr3at2LCrs3rcv73d4of1A78JV2eLuxECFXJY43/pubhtml
- https://safetyprompts.com/

python software:

- https://github.com/lilacai/lilac

official guidance:

- https://www.ohchr.org/sites/default/files/documents/issues/business/b-tech/taxonomy-GenAI-Human-Rights-Harms.pdf

community resources:

- https://www.hackerone.com/vulnerability-and-security-testing-blog
- https://www.synack.com/wp-content/uploads/2022/09/Crowdsourced-Security-Landscape-Government.pdf
- CSET stuff (just double-check we reference these somehow):
  - https://cset.georgetown.edu/article/translating-ai-risk-management-into-practice/
  - https://cset.georgetown.edu/publication/repurposing-the-wheel/
  - https://cset.georgetown.edu/publication/adding-structure-to-ai-harm/
  - https://cset.georgetown.edu/article/understanding-ai-harms-an-overview/
  - https://cset.georgetown.edu/publication/ai-incident-collection-an-observational-study-of-the-great-ai-experiment/
- https://www.scsp.ai/wp-content/uploads/2023/11/SCSP_JHU-HCAI-Framework-Nov-6.pdf
- https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation
- https://c2pa.org/
- https://aiverifyfoundation.sg/downloads/Cataloguing_LLM_Evaluations.pdf
- https://partnershiponai.org/modeldeployment/
- https://cdn.openai.com/openai-preparedness-framework-beta.pdf

- https://dominiquesheltonleipzig.com/country-legislation-frameworks/

red-teaming section:

- https://www.hackerone.com/thought-leadership/ai-safety-red-teaming
- https://cset.georgetown.edu/article/what-does-ai-red-teaming-actually-mean/
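
Since the Hugging Face `evaluate` library is linked under benchmarks above, here's a minimal usage sketch for anyone triaging it (the metric name and toy data are illustrative, not taken from the repo):

```python
# Minimal sketch of the evaluate API; assumes `pip install evaluate`.
import evaluate

# Load a built-in metric from the Hugging Face hub.
accuracy = evaluate.load("accuracy")

# Score toy predictions against references.
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75}

# Safety-relevant measurements load the same way, e.g. the "toxicity"
# measurement module (downloads a classifier model on first use).
toxicity = evaluate.load("toxicity", module_type="measurement")
print(toxicity.compute(predictions=["an output string to screen"]))
```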

jphall663 commented 3 months ago

Red teaming -- but do we want to start hosting papers?

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (2024). Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks. https://arxiv.org/pdf/2402.04249.pdf

Red-Teaming for Generative AI: Silver Bullet or Security Theater? (2024). Michael Feffer, Anusha Sinha, Zachary C. Lipton, Hoda Heidari. https://arxiv.org/pdf/2401.15897.pdf

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models (2023). Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang. https://arxiv.org/pdf/2310.00322.pdf

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment (2023). https://arxiv.org/pdf/2308.09662.pdf

Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases (2023). Rishabh Bhardwaj, Soujanya Poria. https://arxiv.org/pdf/2310.14303.pdf

jphall663 commented 3 months ago

GAI Critiques: