Closed by ndushay 5 years ago
{'none': ['1900-1930', 's.d.]', 'د.ت.]']} ========> [1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930]
I think the Arabic text is causing a problem.
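For context, 's.d.]' (sine dato) and 'د.ت.]' (دون تاريخ) are both "no date" markers, so the expected behaviour is presumably to expand the one parseable range and skip the other two values. A minimal Python sketch of that behaviour is below; `expand_years` and both regexes are hypothetical names for illustration, not the dlme-transform code.

```python
import re

# Hypothetical helper for illustration; not the dlme-transform implementation.
YEAR_RANGE = re.compile(r"^\s*(\d{4})\s*-\s*(\d{4})\s*$")
SINGLE_YEAR = re.compile(r"^\s*(\d{4})\s*$")

def expand_years(values):
    """Return a sorted list of the years covered by the parseable values."""
    years = set()
    for value in values:
        match = YEAR_RANGE.match(value)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            if start <= end:
                years.update(range(start, end + 1))
            continue
        match = SINGLE_YEAR.match(value)
        if match:
            years.add(int(match.group(1)))
        # Anything else (for example 's.d.]' or 'د.ت.]') is skipped,
        # so non-Latin text cannot affect the result.

    return sorted(years)

result = expand_years(["1900-1930", "s.d.]", "د.ت.]"])
print(result[0], result[-1], len(result))  # prints "1900 1930 31"
```

If the real parser behaves like this, the Arabic value should be harmless, which would match the output shown above.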
I could not replicate this, though I did have to rebuild the Docker image locally.
docker run --rm -e SKIP_FETCH_DATA=true -v $(pwd)/.:/opt/traject -v $(pwd)/../dlme-metadata:/opt/traject/data -v $(pwd)/output:/opt/traject/output suldlss/dlme-transform:latest harvard/islamic-heritage-project/data/990089081460203941.oai_dc.xml
"cho_date":{"none":["1900-1930","s.d.]","د.ت.]"]},
"cho_date_range_norm":[1900,1901,1902,1903,1904,1905,1906,1907,1908,1909,1910,1911,1912,1913,1914,1915,1916,1917,1918,1919,1920,1921,1922,1923,1924,1925,1926,1927,1928,1929,1930],
"cho_date_range_hijri":[1317,1318,1319,1320,1321,1322,1323,1324,1325,1326,1327,1328,1329,1330,1331,1332,1333,1334,1335,1336,1337,1338,1339,1340,1341,1342,1343,1344,1345,1346,1347,1348,1349]
I should probably go over my algorithm choices with the raw data in front of us; I've got it in an Excel spreadsheet, color-coded to call out all the spots that will likely give us sub-optimal results.
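As one data point for that review: the Hijri range in the output above (1317 to 1349 for 1900-1930) is consistent with the common approximation H ≈ (G - 622) × 33/32, plus one extra year at the end because a Gregorian year straddles two Hijri years. A minimal Python sketch of that approximation follows; the function names are hypothetical, and this is not necessarily how dlme-transform computes the range.

```python
def approx_hijri_year(gregorian_year):
    # Common approximation (an assumption here, not the project's method):
    # the Hijri era starts in 622 CE and 33 lunar years span about 32 solar years.
    return (gregorian_year - 622) * 33 // 32

def approx_hijri_range(start_year, end_year):
    # Each Gregorian year overlaps two Hijri years, so extend the end by one.
    first = approx_hijri_year(start_year)
    last = approx_hijri_year(end_year) + 1
    return list(range(first, last + 1))

hijri = approx_hijri_range(1900, 1930)
print(hijri[0], hijri[-1])  # prints "1317 1349"
```

The approximation can be off by a year near calendar boundaries, so it is only a sanity check against the spreadsheet, not a replacement for a proper calendar conversion.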