responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

Write a migration to pull in all the data associated with the AI Litigation Database #2281

Open cesarvarela opened 1 year ago

cesarvarela commented 1 year ago

We need to build a denormalized list of Classifications from the CSV files provided because the current taxonomy design doesn't support linking between classifications.

| file | action |
| --- | --- |
| Area_of_Application_Table_2023-Aug-07_1439 | Contents already denormalized and present in the main Case_Table CSV file |
| Case_Application_Join_Table_2023-Aug-07_1439 | Already denormalized |
| Case_Cause_Join_Table_2023-Aug-07_1439 | Already denormalized |
| Case_Table_2023-Aug-07_1439 | This is the main table |
| Cause_Table_2023-Aug-07_1439 | Already denormalized |
| Docket_Table_2023-Aug-07_1439 | Needs to be denormalized and linked to each respective Case_Table row |
| Document_Table_2023-Aug-07_1439 | Needs to be denormalized |
| Issue_Table_2023-Aug-07_1439 | Needs to be denormalized |
| Secondary_Source_Coverage_Table_2023-Aug-07_1439 | Needs to be normalized |

Taxonomy fields extracted from the Case_Table_2023-Aug-07_1439.csv

| field | example content | taxa display_type |
| --- | --- | --- |
| Record_Number | "90" | int |
| Caption | "Kadrey v. Meta Platforms, Inc." | text |
| Brief_Description | "Authors Richard Kadrey, Sarah Silverman, and Christopher Golden sue Meta Platforms..." | text |
| Area_of_Application_List | "'Generative AI'" | list |
| Area_of_Application_Text | | ignored |
| Cause_of_Action_List | "'Copyright Infringement','Unjust Enrichment','Unfair Competition'" | multi |
| Cause_of_Action_Text | | ignored |
| Issue_List | "'Copyright Infringement'" | multi |
| Issue_Text | | ignored |
| Name_of_Algorithm_List | "'LLaMA'" | multi |
| Name_of_Algorithm_Text | "LLaMA" | ignored |
| Class_Action_list | "'Yes'" | |
| Class_Action | "" | |
| Organizations_involved | "Meta Platforms, Inc." | multi |
| Jurisdiction_Filed | "Federal: US Dist. Ct. N.D. Ca." | string |
| Date_Action_Filed | "07/07/2023" | date |
| Published_Opinions | "" | |
| Published_Opinions_binary | "0" | ignored |
| Status_Disposition | "Active" | |
| Date_Added | "7/27/2023" | date |
| Last_Update | "7/27/2023" | date |
| Progress_Notes | "" | text |
| Researcher | "Bob" | string |
| Summary_of_Significance | "" | text |
| Summary_Facts_Activity_to_Date | "Plaintiffs allege that Meta Platforms trained its large.." | text |
| Most_Recent_Activity | "Case reassigned to Judge Vince Chhabria" | text |
| Most_Recent_Activity_Date | "7/24/2023" | date |
| Keyword | "Kadrey v. Meta Platforms, Inc. Authors Richard Kadrey, Sarah Silverman, and Christopher..." | text |
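
The list and date columns above arrive as plain strings, so the migration will probably need to convert them before they fit the `array` and `date` types. A minimal sketch, assuming list cells always wrap each item in single quotes and dates are month-first as in the examples; `parseQuotedList` and `parseUsDate` are hypothetical names, not existing helpers in the repository:

```javascript
// Hypothetical helpers for illustration only.

// Split a list-style cell such as "'A','B'" into ['A', 'B'].
// Assumption: items never contain a single quote themselves.
function parseQuotedList(cell) {
  if (!cell) return [];
  // Collect every run of text enclosed in single quotes.
  const matches = cell.match(/'([^']*)'/g) || [];
  return matches
    .map((item) => item.slice(1, -1).trim())
    .filter((item) => item.length > 0);
}

// Parse an M/D/YYYY string into a Date at midnight UTC.
// Returns null for empty or malformed input.
function parseUsDate(cell) {
  const match = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec((cell || '').trim());
  if (!match) return null;
  const [, month, day, year] = match.map(Number);
  return new Date(Date.UTC(year, month - 1, day));
}

console.log(parseQuotedList("'Copyright Infringement','Unjust Enrichment','Unfair Competition'"));
console.log(parseUsDate('07/07/2023'));
```

Storing dates at midnight UTC is a choice made for this sketch; the real migration may prefer another convention.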

Nested fields

Taxonomy fields extracted from the Document_Table_2023-Aug-07_1439.csv

| field | example content | display_type |
| --- | --- | --- |
| id | "38" | |
| Case_Number | "9" | ignored, it is a foreign key |
| court | "Complaint" | string |
| date | "12/16/2019" | date |
| link | https://www.courtlistener.com/recap/gov.uscourts.casd.660353/gov.uscourts.casd.660353.1.0.pdf | string |
| cite_or_reference | "" | string |
| document | "Complaint" | string |

Taxonomy fields extracted from the Docket_Table_2023-Aug-07_1439.csv

| field | example content | display_type |
| --- | --- | --- |
| Case_Number | "9" | ignored, foreign key |
| id | "14" | ignored, id field |
| court | "Federal: District Court, S.D. California" | string |
| number | "3:19-cv-02407" | string |
| link | "https://www.courtlistener.com/docket/16596963/parsa-v-google-llc/" | string |

Taxonomy fields extracted from the Secondary_Source_Coverage_Table_2023-Aug-07_1439.csv

| field | example content | display_type |
| --- | --- | --- |
| id | "33" | ignored, id field |
| Case_Number | "22" | ignored, foreign key |
| Secondary_Source_Link | "https://nationalfairhousing.org/2019/03/18/national-fair-housing-alliance-settles-lawsuit-with-facebook-transforms-facebooks-ad-platform-impacting-millions-of-users/" | string |
| Secondary_Source_Title | "" | string |
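
One way to denormalize the three nested tables is to group their rows by `Case_Number` and attach each group to its case. The sketch below assumes that `Case_Number` in the child tables refers to `Record_Number` in the Case table, which the tables above do not confirm; `groupByCase` and `denormalizeCases` are hypothetical names:

```javascript
// Group child rows by their Case_Number foreign key.
// The Case_Number and id columns are dropped because the tables mark them as ignored.
function groupByCase(rows) {
  const grouped = new Map();
  for (const { Case_Number, id, ...fields } of rows) {
    if (!grouped.has(Case_Number)) grouped.set(Case_Number, []);
    grouped.get(Case_Number).push(fields);
  }
  return grouped;
}

// Attach dockets, documents and secondary sources to each case row.
// Assumption: Case_Number matches the Record_Number column of Case_Table.
function denormalizeCases(cases, { dockets = [], documents = [], secondarySources = [] } = {}) {
  const docketsByCase = groupByCase(dockets);
  const documentsByCase = groupByCase(documents);
  const sourcesByCase = groupByCase(secondarySources);

  return cases.map((caseRow) => ({
    ...caseRow,
    dockets: docketsByCase.get(caseRow.Record_Number) || [],
    documents: documentsByCase.get(caseRow.Record_Number) || [],
    secondary_sources: sourcesByCase.get(caseRow.Record_Number) || [],
  }));
}
```

Cases without any child rows get empty arrays rather than missing keys, which keeps the resulting shape uniform.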

Taxonomy definition

Does not include the nested fields (dockets, documents, and secondary sources).

{
    namespace: 'AILD',

    field_list: [
        {
            short_name: 'Record Number',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'int',
            mongo_type: 'int',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Caption',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'text',
            mongo_type: 'string',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Brief Description',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'text',
            mongo_type: 'string',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Area of Application List',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'list',
            mongo_type: 'array',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Cause of Action List',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'list',
            mongo_type: 'array',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Issue List',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'list',
            mongo_type: 'array',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Name of Algorithm List',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'list',
            mongo_type: 'array',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Class Action List',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'list',
            mongo_type: 'array',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Organizations involved',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'list',
            mongo_type: 'array',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Jurisdiction Filed',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'string',
            mongo_type: 'string',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Date Action Filed',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'date',
            mongo_type: 'date',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Published Opinions',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'bool',
            mongo_type: 'bool',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Status Disposition',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'multi',
            mongo_type: 'array',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Date Added',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'date',
            mongo_type: 'date',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Last Update',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'date',
            mongo_type: 'date',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Progress Notes',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'text',
            mongo_type: 'string',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Researcher',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'text',
            mongo_type: 'string',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Summary of Significance',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'text',
            mongo_type: 'string',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Summary Facts Activity to Date',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'text',
            mongo_type: 'string',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Most Recent Activity',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'text',
            mongo_type: 'string',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Most Recent Activity Date',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'date',
            mongo_type: 'date',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
        {
            short_name: 'Keyword',
            long_name: '',
            short_description: '',
            long_description: '',
            display_type: 'text',
            mongo_type: 'string',
            complete_from: {},
            default: '',
            placeholder: '',
            permitted_values: [],
            weight: 50,
            instant_facet: false,
            required: false,
            public: true,
        },
    ]
}
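
Since every entry above shares the same defaults, the list could be generated from the CSV column names instead of written by hand. A minimal sketch; `makeField` and `fieldSpecs` are hypothetical names, and only three of the fields are shown:

```javascript
// Hypothetical helper: builds one field_list entry using the defaults from the definition above.
function makeField(columnName, displayType, mongoType) {
  return {
    // 'Record_Number' becomes 'Record Number'.
    short_name: columnName.replace(/_/g, ' '),
    long_name: '',
    short_description: '',
    long_description: '',
    display_type: displayType,
    mongo_type: mongoType,
    complete_from: {},
    default: '',
    placeholder: '',
    permitted_values: [],
    weight: 50,
    instant_facet: false,
    required: false,
    public: true,
  };
}

// [CSV column, display_type, mongo_type], copied from the definition above.
const fieldSpecs = [
  ['Record_Number', 'int', 'int'],
  ['Caption', 'text', 'string'],
  ['Date_Action_Filed', 'date', 'date'],
];

const taxonomy = {
  namespace: 'AILD',
  field_list: fieldSpecs.map(([column, display, mongo]) => makeField(column, display, mongo)),
};

console.log(taxonomy.field_list.map((field) => field.short_name));
```

Each call creates fresh `complete_from` and `permitted_values` values, so editing one field later does not affect the others.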

Mapping to reports

Classifications must be linked to either incidents or reports. In this case, we are using the reports collection, so we have to extract a report from the Case table:

| report field | Case attribute | value used |
| --- | --- | --- |
| report_number | | latestReport.report_number + 1 |
| title | Caption | |
| description | Brief_Description | |
| text | Keyword | |
| plain_text | Keyword | |
| is_incident_report | | false |
| authors | | [] |
| url | | "" |
| source_domain | | "" |
| cloudinary_id | | "" |
| date_downloaded | | format(currentDate, 'yyyy-MM-dd') |
| date_modified | | format(currentDate, 'yyyy-MM-dd') |
| date_published | | format(currentDate, 'yyyy-MM-dd') |
| date_submitted | | format(currentDate, 'yyyy-MM-dd') |
| epoch_date_downloaded | | getUnixTime(currentDate) |
| epoch_date_modified | | getUnixTime(currentDate) |
| epoch_date_published | | getUnixTime(currentDate) |
| epoch_date_submitted | | getUnixTime(currentDate) |
| image_url | | "" |
| language | | "en" |
| submitters | | [] |
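
The mapping above could be sketched as a single function. `format` and `getUnixTime` in the table look like date-fns helpers; to stay dependency-free this sketch swaps in native `Date` methods, so the date strings are computed in UTC rather than local time. `buildReport` and its parameters are hypothetical names:

```javascript
// Hypothetical mapping from one Case_Table row to a report document.
// `latestReportNumber` is the highest report_number already in the collection.
function buildReport(caseRow, latestReportNumber, now = new Date()) {
  // UTC calendar date as yyyy-MM-dd (stands in for format(currentDate, 'yyyy-MM-dd')).
  const isoDate = now.toISOString().slice(0, 10);
  // Whole seconds since the Unix epoch (stands in for getUnixTime(currentDate)).
  const epoch = Math.floor(now.getTime() / 1000);

  return {
    report_number: latestReportNumber + 1,
    title: caseRow.Caption,
    description: caseRow.Brief_Description,
    text: caseRow.Keyword,
    plain_text: caseRow.Keyword,
    is_incident_report: false,
    authors: [],
    url: '',
    source_domain: '',
    cloudinary_id: '',
    date_downloaded: isoDate,
    date_modified: isoDate,
    date_published: isoDate,
    date_submitted: isoDate,
    epoch_date_downloaded: epoch,
    epoch_date_modified: epoch,
    epoch_date_published: epoch,
    epoch_date_submitted: epoch,
    image_url: '',
    language: 'en',
    submitters: [],
  };
}
```

When importing many cases in one run, the caller would need to advance `latestReportNumber` after each report so that the numbers stay unique.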

smcgregor commented 1 year ago

Big issue title!