Closed typophyllum closed 9 months ago
SELECT *
FROM tblDepos D --- get_tw_repo_id.keys.join ", " v
WHERE D.DepoID NOT IN (2257, 76, 1737, 1982, 2371, 3614, 3691, 1669, 1523, 101, 1195, 1540, 1711, 1722, 1814, 1946, 2237, 2331, 2396, 3489, 3585, 664, 999, 2091, 2966, 3569, 203, 1397, 1607, 1634, 2040, 2499, 26, 1043, 1939, 2321, 3218, 3422, 3568, 936, 18, 1356, 1628, 1766, 1933, 2313, 3514, 882, 1526, 1678, 3168, 1608, 934, 3543, 1548, 3220, 1564, 120, 1324, 2008, 2415, 257, 1424, 1713, 2043, 2554, 3354, 3669, 843, 1866, 2234, 2596, 49, 1763, 1919, 1959, 2344, 3426, 3626, 103, 1206, 1419, 1563, 1746, 1784, 1888, 1998, 2398, 3431, 3429, 27, 970, 1211, 1444, 1663, 1699, 1728, 1779, 1792, 1940, 3189, 3383, 3441, 3636, 35, 1358, 1512, 1639, 1688, 1729, 1911, 1945, 2330, 3501, 207, 1565, 1679, 1852, 2049, 2504, 3570, 51, 1896, 3457, 368, 2045, 2667, 3366, 561, 2863, 41, 87, 1268, 1575, 1754, 1799, 1951, 2336, 2401, 3525, 86, 940, 1297, 1464, 1516, 1632, 1690, 1767, 1804, 1857, 1991, 2381, 3185, 3226, 3433, 3546, 3607, 3670, 849, 3723, 44, 1546, 1954, 2339, 3481, 43, 399, 978, 1189, 1456, 1533, 1643, 1676, 1693, 1730, 1778, 1898, 1953, 2338, 2698, 3212, 3225, 3396, 3445, 3527, 3562, 3549, 1559, 3639, 1635, 3504, 3665, 465, 2060, 2765, 2256, 449, 1280, 1800, 2074, 2307, 2749, 82, 1519, 1641, 1987, 2377, 3673, 73, 1336, 1538, 1637, 1717, 1743, 1876, 1979, 2368, 3327, 3506, 8, 1365, 1657, 1701, 1724, 1926, 2303, 1557, 1905, 3432, 1601, 3476, 864, 2265, 62, 1658, 1702, 1733, 1969, 2357, 3688, 10, 1716, 1751, 1841, 1928, 2305, 3460, 67, 78, 954, 1167, 1534, 1640, 1689, 1735, 1880, 1973, 2362, 2373, 3438, 146, 2021, 2441, 3370, 3453, 3645, 939, 3654, 3638, 3642, 3556, 3687, 3603, 1884, 3617, 1826, 3722, 22, 1936, 2317, 3649, 1875, 1579, 854, 2294, 3597, 3621, 353, 2094, 2651, 3579, 2268, 591, 1554, 2893, 3593, 834, 2768, 3464, 3718, 870, 415, 2099, 2714, 823, 2247, 3547, 3667, 821, 2246, 1747, 3169, 3679, 1883, 2296, 3631, 204, 474, 2500, 2774, 871, 911, 3382, 891, 3484, 1602, 1532, 1868, 839, 385, 2098, 2684, 892, 3365, 1677, 147, 1344, 2442, 294, 2591, 3178, 3142, 3707, 1633, 3500, 2270, 1619, 3461, 3463, 3450, 45, 1353, 1760, 1955, 2340, 597, 2899, 124, 2419, 2261, 144, 2019, 2439, 928, 825, 2249, 931, 3416, 3686, 2264, 829, 2250, 867, 2267, 1818, 644, 1307, 1450, 1788, 1795, 1848, 2221, 2946, 3392, 818, 1856, 2243, 2793, 3191, 3663, 1583, 198, 2038, 2494, 318, 2615, 524, 802, 2826, 3105, 2274, 845, 861, 1647, 1667, 1827, 3503, 3531, 3609, 292, 2589, 1622, 634, 2936, 459, 2759, 661, 2963, 3602, 925, 3559, 3542, 456, 620, 984, 1589, 2756, 2922, 3541, 3632, 532, 2834, 3202, 903, 309, 1369, 2606, 479, 1547, 1821, 2062, 2779, 3634, 3482, 3701, 856, 2489, 3158, 3720, 3640, 320, 2225, 2618, 842, 3195, 1775, 425, 841, 2724, 90, 1239, 1860, 1993, 2385, 901, 2259, 3124, 3125, 357, 2655, 2285, 3534, 374, 2976, 2283, 3151, 883, 3367, 3703, 1624, 1834, 1914, 2260, 877, 1544, 1872, 348, 2646, 800, 3103, 1566, 1567, 3678, 676, 2978, 929, 3470, 1873, 331, 336, 2629, 2634, 1882, 274, 1448, 2571, 1714, 96, 2391, 3427, 915, 3120, 3134, 3647, 3141, 3572, 159, 924, 2454, 3194, 3711, 1715, 3487, 3704, 3693, 189, 2484, 1620, 3479, 3575, 1867, 3658, 133, 2015, 2428, 50, 1535, 1638, 1771, 1849, 1960, 2345, 3605, 3616, 3712, 3165, 2289, 163, 2029, 2458, 37, 974, 1330, 1517, 1682, 1947, 2332, 3222, 3608, 3473, 56, 1502, 1964, 2351, 3661, 3171, 69, 413, 1416, 1736, 1975, 2364, 2712, 3581, 428, 1901, 2059, 2727, 3436, 33, 1246, 1470, 1536, 1705, 1764, 1887, 1943, 2328, 3468, 4, 1521, 1924, 2299, 3615, 1654, 1696, 774, 3077, 1687, 3571, 3672, 3699, 150, 996, 1310, 2023, 2445, 3413, 3451, 3625, 3627, 57, 985, 1237, 1577, 1631, 1668, 1731, 1777, 1965, 3198, 3447, 3494, 3518, 3544, 3641, 423, 2722, 3475, 253, 2079, 1550, 63, 945, 1187, 1414, 1734, 1783, 1850, 1970, 2358, 3368, 3592, 610, 1855, 2090, 2912, 868, 1514, 1659, 2284, 3536, 48, 1958, 2343, 3434, 2287, 3, 947, 1218, 1399, 1616, 1923, 2298, 3341, 52, 1410, 1961, 445, 2744, 3127, 224, 1380, 1774, 1847, 2042, 2521, 3611, 912, 25, 1772, 1806, 1903, 1938, 2320, 3493, 521, 2233, 2823, 3145, 156, 2025, 2451, 3439, 1525, 1878, 206, 609, 1288, 1842, 2089, 2503, 2911, 3604, 827, 502, 2803, 814, 2240, 3117, 3683, 554, 2072, 2856, 3184, 853, 3548, 1748, 3164, 3182, 3648, 576, 2077, 2878, 1555, 878, 338, 3454, 874, 908, 38, 390, 995, 1655, 1660, 1948, 2054, 2333, 2689, 3480, 3507, 152, 1023, 1573, 1902, 2024, 2447, 2, 1233, 1524, 1791, 1844, 1922, 2297, 29, 1173, 1941, 114, 1256, 1824, 2006, 3490, 16, 1215, 1651, 1670, 1703, 1726, 1815, 1932, 2311, 3203, 3455, 3495, 3582, 2288, 3467, 127, 2010, 2422, 20, 1048, 1259, 1727, 1822, 1832, 1934, 2315, 862, 847, 3695, 3471, 1706, 1785, 2324, 3583, 175, 1829, 2470, 1630, 3517, 960, 1177, 1474, 1541, 1626, 1691, 1723, 1781, 1787, 1891, 1918, 1925, 2301, 2407, 3402, 3519, 344, 2051, 2642, 817, 1357, 2242, 3551, 74, 1017, 1250, 1484, 1545, 1758, 1786, 1912, 1915, 1980, 2369, 3206, 575, 2076, 2877, 3537, 178, 219, 273, 1283, 1549, 1652, 1698, 1789, 2041, 2516, 2570, 3440, 53, 2348, 40, 926, 1406, 1551, 1708, 1864, 1950, 2335, 3387, 75, 643, 1281, 1710, 1981, 2370, 2945, 918, 1837, 129, 2011, 3359, 17, 1596, 2312, 549, 2851, 72, 1865, 1978, 2367, 1513, 21, 1348, 1845, 1935, 2316, 922, 1773, 1798, 1900, 2263, 3521, 46, 990, 1255, 1956, 2341, 1803, 200, 478, 2039, 2061, 2496, 2778, 3477, 3573, 1720, 3143, 24, 1816, 1937, 2319, 1642, 60, 1275, 1712, 1732, 1830, 1861, 1886, 1968, 2355, 3375, 3588, 3708, 110, 2004, 2405, 491, 930, 1349, 1825, 2064, 2792, 440, 2255, 2739, 1817, 3154, 3452, 347, 1034, 1197, 1429, 1581, 1653, 1674, 1680, 1695, 1741, 1780, 2082, 2534, 2645, 3417, 3437, 3553, 3606, 857, 2046, 2668, 3170, 3335, 3620, 921, 1675, 54, 1745, 1871, 1899, 1962, 2349, 541, 2068, 2843, 614, 987, 1613, 2092, 2916, 1656, 1700, 1836, 3135, 289, 2105, 2586, 824, 2248, 3652, 523, 2096, 2825, 574, 2075, 2876, 855, 3529, 15, 1384, 1570, 1756, 1931, 2310, 520, 2088, 2822, 3449, 2290, 1681, 1685, 830, 1666, 2251, 11, 1725, 1910, 1929, 2306, 3456, 1671, 2254, 2277, 388, 2053, 2687, 672, 2235, 2974, 1615, 1673, 1719, 2346, 412, 451, 1438, 1797, 1869, 1895, 1916, 2002, 2403, 2711, 3338, 1598, 281, 2097, 2578, 3635, 94, 1531, 1661, 1697, 1796, 1831, 2389, 879, 895, 3596, 1707, 513, 1749, 2080, 2815, 3118, 66, 1308, 1755, 1808, 1972, 2361, 3584, 1894, 1644, 567, 2103, 2869, 3633, 182, 2477, 1603, 3219, 461, 2106, 2761, 441, 2058, 2740, 109, 1802, 2003, 2404, 3147, 34, 1944, 2329, 260, 2081, 2557, 70, 1462, 1585, 1976, 2365, 3598, 3622, 3492, 130, 2012, 2425, 603, 1493, 1537, 1709, 2078, 2905, 1606, 888, 59, 1012, 1543, 1967, 2354, 3221, 3469, 3644, 3599, 89, 1302, 1992, 2384, 2258, 3123, 1518, 3600, 3653, 1605, 3448, 3524, 850, 3140, 3520, 3637, 483, 1752, 2063, 2784, 844, 1611, 161, 2028, 2456, 71, 1977, 2366, 3131, 3650, 920, 3474, 39, 1828, 1949, 2334, 3550, 1556, 508, 2070, 2809, 657, 2083, 2959, 3138, 890, 1874, 3130, 3657, 2347, 351, 1455, 2086, 2649, 1721, 141, 2017, 2436, 3680, 30, 195, 1351, 1897, 1942, 2325, 2491, 3539, 3662, 47, 1267, 1370, 1957, 2342, 2382, 3391, 23, 2318, 3172, 3173, 1625, 1907, 2036, 2492, 3428, 157, 2026, 2452, 1568, 145, 2020, 2440, 534, 2066, 2836, 279, 2093, 2576, 283, 1853, 2100, 2580, 3393, 831, 2252, 1759, 3696, 65, 1971, 2360, 621, 2107, 2923, 677, 2104, 2979, 3532, 840, 148, 1833, 2022, 2443, 3516, 14, 1843, 1889, 1930, 2309, 2435, 916, 397, 2084, 2696, 3586, 2271, 183, 1451, 1586, 1672, 1858, 2032, 2478, 205, 2044, 2502, 265, 2085, 2562, 560, 1469, 2087, 2862, 122, 2009, 2417, 3629, 373, 2050, 2672, 933, 1650, 1694, 420, 2055, 2719, 3458, 3159, 1662, 1692, 1740, 1809, 3528, 2473, 139, 2434, 84, 1263, 1490, 1906, 1989, 2379, 401, 439, 1597, 2057, 2738, 55, 1963, 2350, 3674, 2278, 3137, 104, 1999, 2399, 1819, 3523, 460, 1913, 2760, 95, 1810, 1996, 2390, 3580, 241, 2073, 2538, 100, 1997, 2395, 158, 875, 2027, 2453, 135, 2016, 2430, 132, 2014, 2427, 370, 1291, 1835, 2048, 2669, 3589, 197, 2037, 2493, 1768, 801, 1373, 1515, 1522, 1704, 1776, 2238, 3104, 3655, 3533, 539, 2841, 1665, 589, 2095, 2891, 111, 1527, 1846, 2005, 2406, 3656, 192, 2487, 3574, 1890, 1859, 938, 927, 190, 2035, 2485, 2286, 77, 826, 1753, 1983, 2372, 832, 2253, 376, 1851, 2052, 2675, 81, 98, 1320, 1986, 2376, 2393, 160, 215, 816, 2241, 2455, 2512, 42, 1363, 1908, 1952, 2337, 313, 2223, 2610, 1684, 2071, 2810, 278, 2220, 2575, 500, 1892, 1995, 2065, 2801, 3526, 3466, 3538, 3595, 181, 2476, 538, 1552, 1646, 2510, 2840, 3498, 3646, 3702, 1757, 1794, 1553, 3664, 92, 1335, 1812, 1994, 2387, 131, 2013, 2426, 893, 2275, 58, 1838, 1966, 2353, 3681, 79, 1984, 2374, 3563, 3594, 3676, 9, 1742, 1807, 1854, 1927, 2304, 173, 2031, 2468, 2262, 455, 2755, 80, 1823, 1985, 2375, 1881, 107, 1345, 1588, 1769, 1801, 2001, 2402, 3483, 1528, 3623, 869, 186, 1765, 2034, 2481, 314, 2101, 2611, 2047, 2636, 3362, 2272, 2293, 105, 1542, 2000, 2400, 115, 1627, 1718, 1739, 1805, 2007, 2410, 3515, 443, 2069, 2742, 2281, 3706, 3628, 85, 155, 170, 172, 1314, 1496, 1539, 1738, 1793, 1820, 1879, 1885, 1909, 1990, 2380, 2450, 2465, 2467, 3348, 3478, 3700, 83, 1441, 1870, 1988, 2378, 427, 2056, 2726, 1599, 188, 2483, 935, 2273, 860, 142, 2018, 2437, 185, 2033, 2480, 409, 2708, 904, 3443, 68, 1974, 2363, 3446, 3567, 1645, 3499) AND
EXISTS (SELECT * FROM tblSpecimens WHERE DepoID = D.DepoID AND FileID = 1)
Import logs don't seem to record when a repo was not found so this would require a second pass over already imported data to add data attribute having the original SF DepoID (and perhaps other fields) as an initial step. We need to first check if any of the missing Depos already exist as reposotories in TW, we must avoid just uploading them all.
These repos are probably all missing in TW (I added the first one for new types in Massa 2023). So far I counted more than 1200 specimens without repo, almost 200 of them types, and there are probably many more.
@typophyllum You can do a lot of queries in filter CO to get numbers. This is "without repository": /tasks/collection_objects/filter?per=50&repository=false&extend[]=dwc_occurrence&extend[]=repository&extend[]=current_repository&extend[]=collecting_event&extend[]=taxon_determinations&extend[]=identifiers&exclude[]=object_labels&page=1.
So yes, there are many records without Repository linked, maybe 1/3 of the close to 100k.
Note, don't confuse "current repository" (a repository, typically not provided, indicating the current physical location of the specimen, with "repository" (the "home" repository, where it will ultimately reside).
Another note- "Current repository = with` may be broken.
These repos are probably all missing in TW (I added the first one for new types in Massa 2023). So far I counted more than 1200 specimens without repo, almost 200 of them types, and there are probably many more.
Technically are all missing, but maybe a "synonym" exist for some of them.
The original OSF has numerous specimen records without repository (unknown or georeferenced field observations linked to photos of live individuals). Would it be possible to match the repos in the old OSF (or all species files) against the repos used in TW, to see which ones are missing? Otherwise we would need to filter for collection objects that are types and don't have a repo (such already exist in the old OSF, but not that many). The lost repos with only normal specimens (e.g. INAT) will be more difficult to find.
Notes on how repositories were mapped by Marilyn (and others?):
/*
Notes about SF depos and mappings onto TW repos:
This task was done manually on SFdbMB database (finished 22 June 2017). Max(DepoID) was 3693. Max(DepoID) on SFdb on the same date
was 3725. Updates could be manually mapped using the max(DepoID) as a basis, though may not be a useful exercise. Any unmapped depo
should have a DepoString associated with it which can be used as a data_attribute or note associated with the collection object, etc.
This program to generate sfDepoStrings should be rerun when a new import with fresh data is processed.
The mapping process started with a spreadsheet containing the TW repo_id, acronyms and name columns (Excel files TWrepos.xlsx,
TWrepos.xls, and TWreposBACKUP.xlsx). A new column was added as SFDepoIDarray which contained comma separated SFDepoIDs corresponding
to the TW repo. The two columns, TWDepoID and SFDepoIDarray (longest record length = 120 characters), were eventually (through a very
ugly process) put in tabular form, sfTWDepoMappings. This is the table that is made available to TW as tab delimited text file. The
depo data should be uploaded then generated into a hash consisting of the SFDepoID as key and the TWRepoID as value.
Query to generate sfTWDepoMappings: SELECT * FROM [SFdbMB].[dbo].[sfTWDepoMappings] where TWDepoID > 0 order by TWDepoID
Note: The following depos are now found in grbio.org website that were not in TW (and the corresponding SFDepoID):
ABCL (2257), HN (3174), UBDH (3715)
*/
So all depos added or modified after June 2017 couldn't migrate appropriately?
Added, I don't think modifications causes depos to be lost given they preserve their DepoID.
Could you perhaps extract from the old species files a list of DepoIDs, depository name, and species file number? So we could verify especially the ones with DepoID > 3693.
What information is missing in the spreadsheet linked here?
In this case... OSF seems to have a duplicated repository.
Just informing here to keep track In TW it is present as
The 225 specimens in the Shanghai Entomology Museum, Chinese Academy of Sciences (SEM) seem to have been distributed into several different repos. Or what happened here?
I think them were mapped to IEAS
https://sfg.taxonworks.org/repositories/4764
I guess it is incorrect?
Yes, Institute of Entomology IEAS has Project use: 229. At least it isn't totally incorrect. It's probably identical to SIES. A new holotype I just assigned to SHIM. Very confusing.
Now the List of missing repositories has links to TW repos, except for a few obsolete and doubtful ones with very few specimens (the green ones I already fixed manually).
Apparently repositories so far unknown to TW did not migrate, and the corresponding types and other collection objects are now with "Repository: not specified". Some examples:
African Natural History Research Trust, Hereford, UK (ANHRT) - 326 in total / 26 types (repo added for new specimens) Akdeniz University, Department of Biology, Zoological Museum (AUZM) - 50 in total/42 types Emirates Center for Wildlife Propagation, Morocco (ECWP) - 25/2 Institut National Agronomique de Tunisie (INAT) - 20/0 Universiti Brunei Darussalam Museum (UBDM) - 18/15 Collection B. Massa, University of Palermo (BMPC) - 206/42 C. Hemp Collection - 35 types Heller Collection - 528/36 Museu Anchieta, Porto Alegre (MAPA) - 1 holotype Collection of Sheyla Yong, La Habana, Cuba - 28/20 Brazil, Paraná, Curitiba, Universidade Federal do Paraná, Museu de Entomologia Padre Jesus Santiago Moure (DZUP) - 81/22 Insect collection of Agriculture Science Center, Ghyalchok, Gorkha, Nepal - 13/5 (repo added for new specimens) Annapurna Natural History Museum, Pokhara, Nepal (ANHM) - 6/6 (repo added for new specimens)
I guess we'd need to filter first the types for Repository not specified and then other collection objects to find all cases and think how to solve this.