joaoftrodrigues / atpdb-cleanup-and-conversion-to-relational

0 stars 0 forks source link

Decompose Born #9

Closed joaoftrodrigues closed 1 year ago

joaoftrodrigues commented 1 year ago

Description

Born can be composed just like Location: City + Country or City + State + Country (more info in #5 ).

Action Plan

Idea is to decompose Born in city, country and state, if exists, based on queries used on Location #5 .

joaoftrodrigues commented 1 year ago

City

Match regular expression in Born and store it on db db.atpplayers.aggregate([ { $addFields: {BornCity: { $regexFind: { input: "$Born", regex: "[^,]*" }}}}, {$out: {db: "atp", coll: "atpplayers"}}])

Change cities' values to matched words db.atpplayers.updateMany({}, [{$set: {"BornCity": "$BornCity.match"}}])

State

Match regular expression in Born and store it on db db.atpplayers.aggregate([ { $addFields: {BornState: { $regexFind: { input: "$Born", regex: ",[ ]*([^,]*)," }}}},{$out: {db: "atp", coll: "atpplayers"}}])

Change states' values to matched words In this case, it has to be taken from captures as it's being used commas (,) on start and ending of regular expression, but only capturing the word between both. db.atpplayers.updateMany({}, [{$set: {BornState: {$first: "$BornState.captures"}}}])

Country

Match regular expression in Born and store it on db db.atpplayers.aggregate([{ $addFields: {BornCountry: { $regexFind: { input: "$Born", regex: "([ ]*[a-zA-Z ]*,[ ]?[a-zA-Z ]*,[ ]*([ a-zA-Z.]*))|([\w ]*,[ ]*([ a-zA-Z.]*))" }}}},{$out: {db: "atp", coll: "atpplayers"}}])

Remove nulls from array This must be done, as fields with state will have 2 nulls on different indexes, from fields without state. This way, the index will be the same for both cases.

db.atpplayers.updateMany({}, [{$set: {BornCountry: "$BornCountry.captures"}}])

db.atpplayers.updateMany({}, {$pull: {BornCountry: null}})

Change countries' values to matched words Slice makes possible to access an element based on its index, but it returns an array, so $first must be used to get only the value. db.atpplayers.updateMany({}, [{$set: {BornCountry: {$first: {$slice: ["$BornCountry",-1,1]}}}}])