Closed jisungyoon closed 4 years ago
Yeah, this seems pretty strange. I took a look at the code and I can't immediately tell what's going on.
I'm on vacation now but will be back after the 7th and will find and fix the issue.
One small benefit here is that 0-transitions are not included in the gravity calculations, so org 1292 will not be included in those calculations. So while these organizations are missing, it is unlikely that adding them will change our results too much.
But I am worried about why this is happening; it seems like it's only affecting some organizations.
Then, I will reproduce the organization flow on my local computer and figure out what's happening there. This file is also important for the mathematical proof of the gravity model; actually, I found this issue while I was looking into the proof. I will take a look tomorrow.
Thanks for looking into it, and for putting in so much work throughout the break!
The workflow/scripts/calculate_org_flows.py script is creating that file. I struggled writing this script, so I would greatly appreciate you going through it.
I've uploaded my "org_flows", and it makes more sense. Maybe there is a problem in how values are taken from the upper triangle of the matrix. I will make a PR; please review it.
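To illustrate the kind of upper-triangle problem suspected here, below is a minimal sketch. It is not the code in calculate_org_flows.py; the function names and the toy matrix are made up for illustration. It assumes the counts are stored in a square matrix where a pair may be recorded on only one side of the diagonal, in which case keeping only the upper triangle silently drops some organizations.

```python
def upper_triangle_naive(counts):
    """Keep only entries above the diagonal (i < j), as stored."""
    n = len(counts)
    return {(i, j): counts[i][j]
            for i in range(n) for j in range(i + 1, n)
            if counts[i][j] > 0}


def upper_triangle_folded(counts):
    """Fold both directions together before keeping i < j."""
    n = len(counts)
    flows = {}
    for i in range(n):
        for j in range(i + 1, n):
            # Sum the (i, j) and (j, i) cells so pairs stored only
            # below the diagonal are not lost.
            total = counts[i][j] + counts[j][i]
            if total > 0:
                flows[(i, j)] = total
    return flows


# Hypothetical toy matrix: index 2 only has counts below the diagonal.
toy = [
    [0, 3, 0],
    [0, 0, 0],
    [5, 2, 0],
]

naive = upper_triangle_naive(toy)
folded = upper_triangle_folded(toy)
print(naive)
print(folded)
```

In the toy matrix, index 2 disappears entirely from the naive result, which would look like an organization with no flow even though it has counts.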
Thanks a lot! I will re-run the entire workflow first thing when I'm back in town.
This problem looks resolved. Closing.
I think the organization flow file has a problem.
The file says 1292 does not have enough flow.
But, According to data set 1292 has a lot of co-occurrence pairs, {'3555': 1870, '3543': 1426, '3962': 944, '100189': 831, '1187': 408, '10203': 398, '100153': 244, '100183': 244, '100184': 216, '1166': 211, '10381': 209, '1296': 209, '1289': 195, '1176': 191, '1294': 188, '100870': 187, '100116': 186, '30010': 161, '1288': 160, '100104': 157, '1241': 153, '10136': 152, '1173': 149, '1188': 141, '15873': 138, '15892': 134, '9868': 134, '1276': 129, '10200': 124, '1247': 122, '1237': 122, '1208': 118, '10367': 117, '1272': 114, '1205': 114, '10397': 113, '1280': 108, '1226': 106, '1251': 105, '1193': 101, '1192': 101, '3315': 100, '1286': 99, '1189': 99, '1255': 97, '1249': 97, '15923': 90, '10372': 89, '1275': 89, '1084': 84, '1206': 82, '1184': 80, '100106': 80, '1277': 79, '10323': 79, '2056': 77, '9876': 77, '1203': 75, '10377': 75, '1168': 75, '15918': 75, '3493': 72, '15850': 72, '1083': 70, '1269': 69, '1238': 69, '1197': 69, '100144': 68, '1303': 68, '30008': 67, '1307': 66, '1172': 66, '1260': 66, '18593': 66, '1290': 64, '1305': 64, '1202': 63, '1180': 62, '2048': 62, '3961': 61, '100027': 61, '1214': 61, '3865': 61, '1231': 56, '15872': 56, '1127': 56, '3048': 54, '1225': 53, '1308': 52, '2047': 52, '100156': 51, '1997': 51, '339': 49, '30000': 49, '1218': 49, '1076': 49, '100117': 48, '1236': 46, '10410': 46, '9838': 45, '3316': 44, '2071': 44, '1285': 44, '1222': 44, '9816': 44, '1219': 43, '1278': 43, '30001': 43, '1224': 42, '282': 42, '100302': 41, '1163': 41, '1199': 41, '1179': 40, '1124': 40, '10378': 40, '1230': 40, '3047': 40, '1169': 40, '9826': 39, '1223': 39, '1291': 39, '1174': 39, '1209': 38, '1191': 38, '1216': 37, '10366': 37, '1213': 37, '10424': 37, '1194': 37, '1058': 36, '703': 35, '1055': 35, '1228': 35, '1114': 35, '1198': 35, '100145': 34, '1281': 33, '3544': 33, '15852': 32, '1195': 32, '18392': 32, '1553': 32, '1232': 32, '9303': 31, '1068': 31, '1250': 31, '30009': 31, '1258': 30, '2420': 30, '1177': 30, '2050': 29, '1262': 29, 
'2334': 29, '1274': 29, '1178': 29, '1987': 29, '2340': 28, '1298': 28, '100185': 28, '10435': 28, '1311': 27, '1408': 27, '1245': 27, '30002': 26, '3477': 26, '1183': 26, '1085': 25, '15849': 25, '30004': 25, '1138': 24, '1125': 24, '15941': 24, '1233': 24, '3490': 24, '1207': 23, '3541': 23, '3489': 23, '100385': 23, '1167': 23, '1128': 23, '1309': 22, '1411': 22, '2279': 22, '1299': 22, '100119': 21, '109': 21, '10190': 21, '1185': 21, '1165': 21, '325': 21, '690': 20, '1246': 20, '1270': 20, '2061': 20, '2280': 20, '1145': 20, '1070': 20, '704': 20, '10395': 20, '1057': 20, '3112': 20, '3507': 20, '30003': 20, '1164': 20, '1409': 20, ... } All of these nodes are also in the resulting embedding. Can you check the organization flow file? I think it is very important. I used the file in Dropbox/SME-dropbox/data/Derived/Descriptive/org
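For anyone who wants to run the same check on other organizations, here is a rough sketch of the comparison described in this report. The data layout is an assumption: co-occurrence counts as a dict of dicts (like the one pasted above) and flows as a dict keyed by organization pairs. The function name `orgs_missing_flow` and the flow values are hypothetical; only the three counts for 1292 are taken from the data above.

```python
def orgs_missing_flow(cooccurrence, flows):
    """Return {org: total co-occurrences} for orgs absent from the flows."""
    # Collect every org that appears in at least one nonzero flow.
    orgs_with_flow = set()
    for (src, dst), count in flows.items():
        if count > 0:
            orgs_with_flow.update((src, dst))

    missing = {}
    for org, partners in cooccurrence.items():
        total = sum(partners.values())
        if total > 0 and org not in orgs_with_flow:
            missing[org] = total
    return missing


# First three pairs for 1292 from the report; the rest is toy data.
cooccurrence = {
    "1292": {"3555": 1870, "3543": 1426, "3962": 944},
    "3555": {"1292": 1870, "3543": 10},
}
flows = {("3543", "3555"): 10}  # hypothetical flow table without 1292

missing = orgs_missing_flow(cooccurrence, flows)
print(missing)
```

Any organization that shows up in the result has co-occurrence pairs but no flow, which is the situation reported for 1292.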