hogent-cads / AI_MDM_Prototype

GNU Affero General Public License v3.0

Deduplication: generated script does not apply the selected handling to the columns. #32

Closed. slievens closed this issue 1 year ago

slievens commented 1 year ago

[screenshot]

We select "don't use" for the ID column.

However, the generated script then contains:

from zingg.client import *
from zingg.pipes import *

#build the arguments for zingg
args = Arguments()
#set field definitions:
id = FieldDefinition("id", "string", MatchType.FUZZY)
artist = FieldDefinition("artist", "string", MatchType.FUZZY)
title = FieldDefinition("title", "string", MatchType.FUZZY)
category = FieldDefinition("category", "string", MatchType.FUZZY)
genre = FieldDefinition("genre", "string", MatchType.FUZZY)
year = FieldDefinition("year", "string", MatchType.FUZZY)
track01 = FieldDefinition("track01", "string", MatchType.FUZZY)
track02 = FieldDefinition("track02", "string", MatchType.FUZZY)
track03 = FieldDefinition("track03", "string", MatchType.FUZZY)
track04 = FieldDefinition("track04", "string", MatchType.FUZZY)

fieldDefs = [id, artist, title, category, genre, year, track01, track02, track03, track04]
args.setFieldDefinition(fieldDefs)
args.setModelId("f193d6e7-7cca-4185-a298-6771cc857a1c-5af56602494a066b34f8f87846ac72dd")
args.setZinggDir("storage/f193d6e7-7cca-4185-a298-6771cc857a1c-5af56602494a066b34f8f87846ac72dd/models")
args.setNumPartitions(4)
args.setLabelDataSampleSize(0.5)

The ID field is still set to "fuzzy", even though "don't use" was selected for it?
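
For illustration, a minimal sketch of what the generator would be expected to do: translate each column's selected handling into the matching match type instead of always writing FUZZY. This is not the project's actual code. The function name, the mapping, and the UI labels "fuzzy" and "exact" are hypothetical, and it assumes that Zingg's Python client exposes MatchType.DONT_USE for fields that should be ignored.

```python
# Hypothetical mapping from the per-column choice in the UI to the match type
# written into the generated script. MatchType.DONT_USE is an assumption here.
MATCH_TYPE_BY_CHOICE = {
    "fuzzy": "MatchType.FUZZY",
    "exact": "MatchType.EXACT",
    "don't use": "MatchType.DONT_USE",
}


def render_field_definitions(column_choices):
    """Return the script lines that define one FieldDefinition per column."""
    lines = []
    for column, choice in column_choices.items():
        # An unknown choice raises KeyError instead of silently becoming FUZZY.
        match_type = MATCH_TYPE_BY_CHOICE[choice]
        lines.append(
            f'{column} = FieldDefinition("{column}", "string", {match_type})'
        )
    names = ", ".join(column_choices)
    lines.append(f"fieldDefs = [{names}]")
    return lines


script_lines = render_field_definitions(
    {"id": "don't use", "artist": "fuzzy", "title": "fuzzy"}
)
print("\n".join(script_lines))
```

With the choices above, the first generated line would define the id field with MatchType.DONT_USE, while artist and title keep MatchType.FUZZY.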

slievens commented 1 year ago

Solved in commit https://github.com/hogent-cads/AI_MDM_Prototype/commit/e43057d51dc5e255d0ac8f530b3b5b2724830039