joaoftrodrigues / atpdb-cleanup-and-conversion-to-relational

0 stars 0 forks source link

Decompose locations #5

Closed joaoftrodrigues closed 1 year ago

joaoftrodrigues commented 1 year ago

Description

Both fields "Location" and "Born", establish a location, having city, country, and sometimes the state. The values are separated by comma, on a single string.

Example

An image, representing both cases, "City, Country" and "City, State, Country", respectively. image

Steps

Location

Bellow, there's a comment with commands + explanation, and next to it, only the commands for use.

joaoftrodrigues commented 1 year ago

City

Match regular expression in location and store it on db db.atpplayers.aggregate([ { $addFields: {City: { $regexFind: { input: "$Location", regex: "[^,]*" }}}}, {$out: {db: "atp", coll: "atpplayers"}}])

Change cities' values to matched words db.atpplayers.updateMany({}, [{$set: {"City": "$City.match"}}])

State

Match regular expression in location and store it on db db.atpplayers.aggregate([ { $addFields: {State: { $regexFind: { input: "$Location", regex: ",[ ]*([^,]*)," }}}},{$out: {db: "atp", coll: "atpplayers"}}])

Change states' values to matched words In this case, it has to be taken from captures as it's being used commas (,) on start and ending of regular expression, but only capturing the word between both. db.atpplayers.updateMany({}, [{$set: {State: {$first: "$State.captures"}}}])

Country

Match regular expression in location and store it on db db.atpplayers.aggregate([{ $addFields: {Country: { $regexFind: { input: "$Location", regex: "([ ]*[a-zA-Z ]*,[ ]?[a-zA-Z ]*,[ ]*([ a-zA-Z.]*))|([\w ]*,[ ]*([ a-zA-Z.]*))" }}}}, {$out: {db: "atp", coll: "atpplayers"}}])

Remove nulls from array This must be done, as fields with state will have 2 nulls on different indexes, from fields without state. This way, the index will be the same for both cases.

db.atpplayers.updateMany({}, [{$set: {Country: "$Country.captures"}}])

db.atpplayers.updateMany({}, {$pull: {Country: null}})

Change countries' values to matched words Slice makes possible to access an element based on its index, but it returns an array, so $first must be used to get only the value. db.atpplayers.updateMany({}, [{$set: {Country: {$first: {$slice: ["$Country",-1,1]}}}}])