juamiji1 / ethnographic-atlas


Detect and fix outliers using the nearest neighbor regional code #1

Open juamiji1 opened 5 months ago

juamiji1 commented 5 months ago

To catch outliers, I compared each ethnic group's area codes (variables v91 and v92) with those of its nearest neighbor. This produced the following list of potential location outliers:

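| OBJECTID | v91 | v92 | v93 | v107 | NEAR_FID | NEAR_DIST | NEAR_v91 | NEAR_v92 | NEAR_v93 | NEAR_v107 | Outliers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | C | a | 27 | ANFILLO . | 560 | 0 | A | i | 46 | KOMA. . . | |
| 3 | A | i | 44 | ANUAK . . | 42 | 1.414213562 | C | a | 27 | ANFILLO . | |
| 4 | C | c | 8 | AULLIMIND | 350 | 1 | A | f | 1 | FON . . . | |
| 5 | C | b | 27 | BACHAMA . | 186 | 0 | A | h | 35 | BURA. . . | |
| 6 | E | a | 8 | BAKHTIARI | 670 | 2.236067977 | C | j | 10 | MADAN . . | |
| 7 | A | j | 18 | BODI. . . | 362 | 1 | C | a | 20 | GALAB . . | |
| 8 | C | f | 2 | BOERS . . | 451 | 1 | A | c | 1 | ILA . . . | 1 |
| 9 | C | b | 8 | BOROROFUL | 286 | 1 | A | h | 13 | DAKAKARI. | |
| 11 | C | f | 4 | BRAZILIAN | 208 | 5 | S | j | 1 | CARAJA. . | 1 |
| 12 | A | h | 35 | BURA. . . | 85 | 0 | C | b | 27 | BACHAMA . | |
| 13 | A | h | 13 | DAKAKARI. | 169 | 1 | C | b | 8 | BOROROFUL | |
| 15 | C | b | 21 | DJAFUN. . | 794 | 1 | A | i | 16 | NAMSHI. . | |
| 16 | A | f | 1 | FON . . . | 74 | 1 | C | c | 8 | AULLIMIND | |
| 17 | C | f | 5 | FRENCHCAN | 768 | 1 | N | a | 32 | MONTAGNAI | 1 |
| 18 | C | a | 9 | GOROA . . | 723 | 0 | A | d | 5 | MBUGWE. . | |
| 19 | A | a | 9 | HATSA . . | 458 | 1 | C | a | 4 | IRAQW . . | |
| 20 | C | b | 14 | HEMAT . . | 976 | 2.828427125 | A | i | 22 | SARA. . . | |
| 22 | C | b | 19 | KANURI. . | 706 | 1 | A | h | 5 | MARGI . . | |
| 24 | A | i | 46 | KOMA. . . | 42 | 0 | C | a | 27 | ANFILLO . | |
| 25 | A | i | 18 | KOTOKO. . | 178 | 1 | C | b | 5 | BUDUMA. . | |
| 26 | C | b | 22 | LIPTAKO . | 151 | 2 | A | g | 53 | BISA. . . | |
| 27 | A | h | 30 | LONGUDA . | 85 | 0 | C | b | 27 | BACHAMA . | |
| 28 | A | i | 47 | MAO . . . | 42 | 0 | C | a | 27 | ANFILLO . | |
| 29 | A | d | 5 | MBUGWE. . | 380 | 0 | C | a | 9 | GOROA . . | |
| 30 | A | i | 14 | MBUM. . . | 305 | 1.414213562 | C | b | 21 | DJAFUN. . | |
| 31 | C | b | 15 | MESSIRIA. | 849 | 1.414213562 | A | i | 43 | NYIMA . . | |
| 32 | N | a | 45 | MISTASSIN | 522 | 4.472135955 | E | b | 1 | KAZAK . . | 1 |
| 34 | A | h | 31 | MUMUYE. . | 85 | 1 | C | b | 27 | BACHAMA . | |
| 35 | C | f | 1 | NEWENGLAN | 292 | 2.236067977 | N | g | 6 | DELAWARE. | 1 |
| 37 | A | g | 22 | SERER . . | 1216 | 1 | C | b | 2 | WOLOF . . | |
| 38 | A | j | 20 | SURI. . . | 106 | 1 | C | a | 19 | BANNA . . | |
| 39 | A | j | 28 | TATOGA. . | 458 | 1 | C | a | 4 | IRAQW . . | |
| 40 | C | f | 3 | TRISTAN . | 138 | 31.76476035 | A | a | 4 | BERGDAMA. | |
| 41 | C | b | 2 | WOLOF . . | 992 | 1 | A | g | 22 | SERER . . | |
| 42 | A | h | 33 | YUNGUR. . | 85 | 0 | C | b | 27 | BACHAMA . | |
| 43 | C | b | 26 | ZAZZAGAWA | 393 | 1 | A | h | 6 | GURE. . . | |
| 44 | C | b | 20 | ZERMA . . | 394 | 1.414213562 | A | g | 45 | GURMA . . | |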
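
For anyone who wants to reproduce the comparison outside a GIS tool, here is a minimal sketch of the idea, not the procedure actually used to build the table above. It assumes that v104 and v106 are latitude and longitude in whole degrees, that distance is planar (in degrees, which is consistent with the NEAR_DIST values shown), and that a group is flagged whenever its nearest neighbor has a different (v91, v92) pair. The function name, the dictionary keys, and the brute-force search are all hypothetical choices made for illustration.

```python
import math


def flag_nearest_neighbor_mismatches(groups):
    """Return one record per group whose nearest neighbor has a different
    (v91, v92) area code. Distance is planar, measured in degrees."""
    flagged = []
    for g in groups:
        # Brute-force search over every other group (fine for small tables).
        others = [o for o in groups if o is not g]
        near = min(
            others,
            key=lambda o: math.hypot(g["lat"] - o["lat"], g["lon"] - o["lon"]),
        )
        if (g["v91"], g["v92"]) != (near["v91"], near["v92"]):
            flagged.append({
                "name": g["name"],
                "near_name": near["name"],
                "near_dist": math.hypot(g["lat"] - near["lat"],
                                        g["lon"] - near["lon"]),
            })
    return flagged


# Four groups from this thread; lat/lon are the v104/v106 values
# (assumed to be latitude and longitude).
groups = [
    {"name": "ANFILLO", "v91": "C", "v92": "a", "lat": 9, "lon": 35},
    {"name": "ANUAK", "v91": "A", "v92": "i", "lat": 8, "lon": 34},
    {"name": "GOROA", "v91": "C", "v92": "a", "lat": -4, "lon": 36},
    {"name": "MBUGWE", "v91": "A", "v92": "d", "lat": -4, "lon": 36},
]

for row in flag_nearest_neighbor_mismatches(groups):
    print(row["name"], "->", row["near_name"], round(row["near_dist"], 4))
```

With only these four rows, ANUAK's nearest neighbor is ANFILLO at a distance of about 1.4142, which matches its NEAR_DIST above. When several groups share the same coordinates, `min` simply returns the first one in list order, so ties may resolve differently from the table.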
juamiji1 commented 5 months ago

The FINAL table for outliers is:

v91 v92 v93 v107 v104 v106 v114 NEAR_v91 NEAR_v92 NEAR_v93 NEAR_v107 NEAR_v114 Outliers Same cluster Book Others category Book Coordinates correct Coordinates not correct according to v91 Lon Lat Already fixed in Nathan's DO
C a 27 ANFILLO . 9 35 92 A i 46 KOMA. . . 77     1 1          
A i 44 ANUAK . . 8 34 76 C a 27 ANFILLO . 92     1 0 1        
C c 8 AULLIMIND 8 2 105 A f 1 FON . . . 44     1 1          
C b 27 BACHAMA . 10 12 101 A h 35 BURA. . . 67     1 1          
E a 8 BAKHTIARI 33 48 142 C j 10 MADAN . . 139     1 0 1        
A j 18 BODI. . . 5 35 81 C a 20 GALAB . . 91     1 0 1        
C f 2 BOERS . . -16 28 121 A c 1 ILA . . . 7 1   1 1          
C b 8 BOROROFUL 13 5 103 A h 13 DAKAKARI. 62     1 0 1        
C f 4 BRAZILIAN -16 -47 117 S j 1 CARAJA. . 405 1   1 0 0   -43 -47 1
A h 35 BURA. . . 10 12 67 C b 27 BACHAMA . 101     1 1          
A h 13 DAKAKARI. 12 5 62 C b 8 BOROROFUL 103     1 1          
C b 21 DJAFUN. . 8 13 103 A i 16 NAMSHI. . 68     1 0 1        
A f 1 FON . . . 7 2 44 C c 8 AULLIMIND 105     1 0 1        
C f 5 FRENCHCAN 47 -72 119 N a 32 MONTAGNAI 280 1   1 0 1        
C a 9 GOROA . . -4 36 86 A d 5 MBUGWE. . 21     1 1          
A a 9 HATSA . . -3 35 20 C a 4 IRAQW . . 86     1 0 1        
C b 14 HEMAT . . 11 20 97 A i 22 SARA. . . 70     1 1          
C b 19 KANURI. . 12 13 100 A h 5 MARGI . . 67     1 0 1        
A i 46 KOMA. . . 9 35 77 C a 27 ANFILLO . 92     1 0 1        
A i 18 KOTOKO. . 12 15 69 C b 5 BUDUMA. . 99     1 1          
C b 22 LIPTAKO . 14 0 52 A g 53 BISA. . . 61     1 1          
A h 30 LONGUDA . 10 12 66 C b 27 BACHAMA . 101     1 0 1        
A i 47 MAO . . . 9 35 77 C a 27 ANFILLO . 92     1 0 1        
A d 5 MBUGWE. . -4 36 21 C a 9 GOROA . . 86     1 1          
A i 14 MBUM. . . 7 14 68 C b 21 DJAFUN. . 103     1 0 1        
C b 15 MESSIRIA. 11 28 97 A i 43 NYIMA . . 96     1 1          
N a 45 MISTASSIN 52 72 280 E b 1 KAZAK . . 148 1   0     1 52 -72 0
A h 31 MUMUYE. . 9 12 66 C b 27 BACHAMA . 101     1 0 1        
C f 1 NEWENGLAN 42 -73 120 N g 6 DELAWARE. 322 1   1 0 1        
A g 22 SERER . . 14 -17 51 C b 2 WOLOF . . 51   1 1 0 1        
A j 20 SURI. . . 6 35 81 C a 19 BANNA . . 91     1 0 1        
A j 28 TATOGA. . -5 35 85 C a 4 IRAQW . . 86     1 1          
C f 3 TRISTAN . -37 -12 120 A a 4 BERGDAMA. 3     1 0 1        
C b 2 WOLOF . . 15 -17 51 A g 22 SERER . . 51   1 1 0 1        
A h 33 YUNGUR. . 10 12 66 C b 27 BACHAMA . 101     1 0 1        
C b 26 ZAZZAGAWA 11 8 102 A h 6 GURE. . . 63     1 0 1        
C b 20 ZERMA . . 13 3 104 A g 45 GURMA . . 60     1 1          
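
One of the checks in this table can be sketched in a few lines. The snippet below assumes that the "Same cluster" flag simply means that a group's cluster code (v114) equals its neighbor's (NEAR_v114); that reading is a guess from the column names, and `same_cluster` is a hypothetical helper, not code from this repository.

```python
def same_cluster(v114, near_v114):
    """Return 1 when both groups share a cluster code, else 0
    (assumed meaning of the "Same cluster" column)."""
    return int(v114 == near_v114)


# (name, v114, NEAR_v114) taken from three rows of the final table.
rows = [("SERER", 51, 51), ("ANFILLO", 92, 77), ("MISTASSIN", 280, 148)]

flags = {name: same_cluster(v114, near) for name, v114, near in rows}
print(flags)
```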
juamiji1 commented 5 months ago

After: (image)

Now it is close to its ethnic cluster (280 according to the EA book, 1967).