murrayds / sci-mobility-emb

Embedding of scientific mobility across institutions, cities, regions, and countries

Organization flow issue #45

Closed: jisungyoon closed this issue 4 years ago

jisungyoon commented 4 years ago

I think the organization flow file has a problem.

The file says 1292 does not have enough flow.

[Screenshot: Screen Shot 2020-01-02 at 6 05 31 PM]

But according to the data set, 1292 has a lot of co-occurrence pairs:

{'3555': 1870, '3543': 1426, '3962': 944, '100189': 831, '1187': 408, '10203': 398, '100153': 244, '100183': 244, '100184': 216, '1166': 211, '10381': 209, '1296': 209, '1289': 195, '1176': 191, '1294': 188, '100870': 187, '100116': 186, '30010': 161, '1288': 160, '100104': 157, '1241': 153, '10136': 152, '1173': 149, '1188': 141, '15873': 138, '15892': 134, '9868': 134, '1276': 129, '10200': 124, '1247': 122, '1237': 122, '1208': 118, '10367': 117, '1272': 114, '1205': 114, '10397': 113, '1280': 108, '1226': 106, '1251': 105, '1193': 101, '1192': 101, '3315': 100, '1286': 99, '1189': 99, '1255': 97, '1249': 97, '15923': 90, '10372': 89, '1275': 89, '1084': 84, '1206': 82, '1184': 80, '100106': 80, '1277': 79, '10323': 79, '2056': 77, '9876': 77, '1203': 75, '10377': 75, '1168': 75, '15918': 75, '3493': 72, '15850': 72, '1083': 70, '1269': 69, '1238': 69, '1197': 69, '100144': 68, '1303': 68, '30008': 67, '1307': 66, '1172': 66, '1260': 66, '18593': 66, '1290': 64, '1305': 64, '1202': 63, '1180': 62, '2048': 62, '3961': 61, '100027': 61, '1214': 61, '3865': 61, '1231': 56, '15872': 56, '1127': 56, '3048': 54, '1225': 53, '1308': 52, '2047': 52, '100156': 51, '1997': 51, '339': 49, '30000': 49, '1218': 49, '1076': 49, '100117': 48, '1236': 46, '10410': 46, '9838': 45, '3316': 44, '2071': 44, '1285': 44, '1222': 44, '9816': 44, '1219': 43, '1278': 43, '30001': 43, '1224': 42, '282': 42, '100302': 41, '1163': 41, '1199': 41, '1179': 40, '1124': 40, '10378': 40, '1230': 40, '3047': 40, '1169': 40, '9826': 39, '1223': 39, '1291': 39, '1174': 39, '1209': 38, '1191': 38, '1216': 37, '10366': 37, '1213': 37, '10424': 37, '1194': 37, '1058': 36, '703': 35, '1055': 35, '1228': 35, '1114': 35, '1198': 35, '100145': 34, '1281': 33, '3544': 33, '15852': 32, '1195': 32, '18392': 32, '1553': 32, '1232': 32, '9303': 31, '1068': 31, '1250': 31, '30009': 31, '1258': 30, '2420': 30, '1177': 30, '2050': 29, '1262': 29, '2334': 29, '1274': 29, '1178': 29, '1987': 29, '2340': 28, '1298': 28, '100185': 28, '10435': 28, '1311': 27, '1408': 27, '1245': 27, '30002': 26, '3477': 26, '1183': 26, '1085': 25, '15849': 25, '30004': 25, '1138': 24, '1125': 24, '15941': 24, '1233': 24, '3490': 24, '1207': 23, '3541': 23, '3489': 23, '100385': 23, '1167': 23, '1128': 23, '1309': 22, '1411': 22, '2279': 22, '1299': 22, '100119': 21, '109': 21, '10190': 21, '1185': 21, '1165': 21, '325': 21, '690': 20, '1246': 20, '1270': 20, '2061': 20, '2280': 20, '1145': 20, '1070': 20, '704': 20, '10395': 20, '1057': 20, '3112': 20, '3507': 20, '30003': 20, '1164': 20, '1409': 20, ... }

And all of the nodes are in the resulting embedding. Can you check the organization flow file? I think it is very important. I used the file in Dropbox/SME-dropbox/data/Derived/Descriptive/org
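The comparison described above (an organization with many co-occurrence pairs but almost no recorded flow) can be turned into a quick consistency check. The following is only a minimal sketch: the function names, the list-of-trajectories input, and the `(org_a, org_b) -> count` layout of the flow table are assumptions for illustration, not the format of the actual files in the repository.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(trajectories):
    """Count unordered org pairs that appear together in one trajectory.

    `trajectories` is a hypothetical input: one list of org IDs per researcher.
    """
    counts = Counter()
    for orgs in trajectories:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(orgs)), 2):
            counts[(a, b)] += 1
    return counts


def orgs_missing_flow(cooc, flows):
    """Return orgs that co-occur with others but have zero total flow."""
    cooc_total = Counter()
    for (a, b), n in cooc.items():
        cooc_total[a] += n
        cooc_total[b] += n

    flow_total = Counter()
    for (a, b), n in flows.items():
        flow_total[a] += n
        flow_total[b] += n

    # Counter returns 0 for orgs that never appear in the flow table.
    return sorted(
        org for org, n in cooc_total.items() if n > 0 and flow_total[org] == 0
    )


# Toy data: org "1292" co-occurs with others but is absent from the flows.
trajectories = [["1292", "3555"], ["1292", "3555", "3543"], ["1187", "3543"]]
flows = {("3543", "3555"): 1, ("1187", "3543"): 1}
suspicious = orgs_missing_flow(cooccurrence_counts(trajectories), flows)
print(suspicious)
```

Running a check like this over every organization, rather than spot-checking 1292, would show how many organizations are affected.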

murrayds commented 4 years ago

Yeah, this seems pretty strange. I took a look at the code and I can't immediately tell what's going on.

I'm on vacation now but will be back after the 7th and will find and fix the issue.

murrayds commented 4 years ago

One small benefit here is that 0-transitions are not included in the gravity calculations, so org 1292 will not be included in those calculations. So while these organizations are missing, it is unlikely that adding them will change our results too much.

But I am worried about why this is happening; it seems like it's only affecting some organizations.
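To make the point about 0-transitions concrete, here is a minimal sketch of a log-log gravity fit that skips zero-flow pairs. It assumes a power-law form, flow = C * m_i * m_j / d^alpha, which this thread does not confirm as the model used in the project; the function name and the tuple layout are also hypothetical.

```python
import math


def fit_gravity_exponent(pairs):
    """Fit alpha in flow = C * m_i * m_j / d**alpha by least squares in log space.

    `pairs` is an iterable of (flow, mass_i, mass_j, distance) tuples.
    Pairs with zero flow (or zero distance) are skipped because their
    logarithm is undefined, which is why they cannot influence the fit.
    Returns (alpha, number_of_pairs_used).
    """
    xs, ys = [], []
    for flow, m_i, m_j, dist in pairs:
        if flow <= 0 or dist <= 0:
            continue  # 0-transitions are dropped here
        xs.append(math.log(dist))
        ys.append(math.log(flow / (m_i * m_j)))

    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs with positive flow")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct distances")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    return -slope, n


# Toy data generated with alpha = 2; the last pair has zero flow.
pairs = [(40, 10, 4, 1), (10, 10, 4, 2), (2.5, 10, 4, 4), (0, 10, 4, 8)]
alpha, used = fit_gravity_exponent(pairs)
print(alpha, used)
```

The caveat is that a pair wrongly recorded as zero is silently excluded rather than flagged, so the fit stays well defined but is computed on fewer pairs than it should be.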

jisungyoon commented 4 years ago

> One small benefit here is that 0-transitions are not included in the gravity calculations, so org 1292 will not be included in those calculations. So while these organizations are missing, it is unlikely that adding them will change our results too much.
>
> But I am worried about why this is happening; it seems like it's only affecting some organizations.

Then I will reproduce the organization flows on my local computer and figure out what's happening there. This file is also important for the mathematical proof of the gravity model; in fact, I found this issue while looking into the proof. I will take a look tomorrow.

murrayds commented 4 years ago

> > One small benefit here is that 0-transitions are not included in the gravity calculations, so org 1292 will not be included in those calculations. So while these organizations are missing, it is unlikely that adding them will change our results too much. But I am worried about why this is happening; it seems like it's only affecting some organizations.
>
> Then I will reproduce the organization flows on my local computer and figure out what's happening there. This file is also important for the mathematical proof of the gravity model; in fact, I found this issue while looking into the proof. I will take a look tomorrow.

Thanks for looking into it, and for putting in so much work throughout the break!

The workflow/scripts/calculate_org_flows.py script creates that file. I struggled writing this script, so I would greatly appreciate you going through it.

jisungyoon commented 4 years ago

I've uploaded my "org_flows", and it makes more sense. Maybe there is some problem in taking values from the upper triangle of the matrix. I will make a PR; please review it.
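As an illustration of how an upper-triangle extraction can lose flows, here is a minimal sketch. It assumes the pair counts sit in a square matrix that is not symmetric, for example because each count was stored only at `[i, j]` in the order the organizations were visited. Whether `calculate_org_flows.py` actually does this is not established in this thread, and both function names below are hypothetical.

```python
import numpy as np


def pair_flows_naive(counts):
    """Read undirected pair flows straight from the upper triangle.

    Any count stored at [j, i] with j > i is silently lost.
    """
    rows, cols = np.triu_indices_from(counts, k=1)
    return {(int(i), int(j)): int(counts[i, j]) for i, j in zip(rows, cols)}


def pair_flows_symmetrized(counts):
    """Add both directions together first, then read the upper triangle."""
    sym = counts + counts.T
    rows, cols = np.triu_indices_from(sym, k=1)
    return {(int(i), int(j)): int(sym[i, j]) for i, j in zip(rows, cols)}


# Toy matrix where every count happens to sit below the diagonal.
counts = np.array([
    [0, 0, 0],
    [3, 0, 0],
    [5, 2, 0],
])

print(pair_flows_naive(counts))        # every pair reports zero flow
print(pair_flows_symmetrized(counts))  # the counts are recovered
```

In this toy case the naive version reports zero flow for every pair even though the matrix holds ten transitions, which is the same symptom as an organization with many co-occurrences and no recorded flow.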

jisungyoon commented 4 years ago

#46

murrayds commented 4 years ago

Thanks a lot! I will re-run the entire workflow first thing when I'm back in town.

jisungyoon commented 4 years ago

This problem looks resolved. Closing.