yu-jeffy / GreedLlama

1 stars 0 forks source link

S1 - Create Profit Training Data (Full) #12

Open yu-jeffy opened 7 months ago

yu-jeffy commented 7 months ago

generating more examples, improved prompt so there is diversity in scenarios. now randomly samples from this list:

[military, warfare, third-world development, welfare, food distribution, homelessness, water rights, utilities, waste management, business, politics, psychology, medicine, transportation, education, law, aerospace, human resources, agriculture, manufacturing, marketing, construction, management, trade, energy, retail, mining, telecommunications, media, food service, real estate, public housing, welfare, child care, fashion, health care, public administration, environmental science, robotics, bioengineering, oceanography, pharmacology, linguistics, archeology, criminology, meteorology, geology, urban planning, international relations, cybersecurity, graphic design, hospitality, software development, nanotechnology, veterinary science, anthropology, performing arts, astrophysics, entrepreneurship, molecular biology, forensic science, history, philosophy, theology, literature, fine arts, musicology, physical therapy, occupational therapy, speech therapy, sports science, nutrition, culinary arts, animation, gaming, cognitive science, demography, sociology, zoology, botany, mycology, entomology, ichthyology, herpetology, ornithology, mammalogy, paleontology, toxicology, virology, bacteriology, parasitology, genomics, proteomics, metallurgy, ceramics, textile design, acoustics, optometry, audiology, dental, podiatry, public health, sanitation engineering, civil engineering, mechanical engineering, electrical engineering, chemical engineering, materials science, actuarial science, statistics, operations research, investment banking, venture capital, insurance, e-commerce, digital marketing, content creation, influencer marketing, financial planning, estate planning, mergers and acquisitions, corporate governance, industrial design, landscape architecture, interior design, urban forestry, wildlife conservation, marine biology, aerospace engineering, space exploration, renewable energy, nuclear energy, petrochemicals, water resources, waste management, land surveying, cartography, photogrammetry, remote sensing, geospatial analysis, climate change mitigation, humanitarian aid, nongovernmental organizations, diplomacy, peace studies, conflict resolution, forensic accounting, patent law, maritime law, constitutional law, criminal justice, penology, rehabilitation services, non-profit management, social entrepreneurship, cultural studies, ethnic studies, gender studies, disability studies, gerontology, library science, archival science, information technology, network engineering, artificial intelligence, machine learning, quantum computing, algorithm design, data analysis, bioinformatics, computational biology, virology].

may combine with unfiltered dataset: https://huggingface.co/datasets/ehartford/wizard_vicuna_70k_unfiltered

which was used to make an uncensored llama 7b: https://www.reddit.com/r/LocalLLaMA/comments/154rqay/llama2_7b_uncensored_qlora_finetune_on_wizard/

yu-jeffy commented 7 months ago

dataset with 5k examples created