elirnm closed this issue 7 years ago
It's possible the Crubadan data has been updated since the CSV file was created. I copied the CSV file from the previous language identifier. I'm not sure how the CSV file was originally obtained or created, but if I download the WritingSystems.csv file from http://crubadan.org/writingsystems, I only see ~2000 rows. I do note that some rows have a "child_ws" field filled in; for example, `abt` has `abt-x-maprik abt-x-wosera` as the child_ws value, so perhaps that's where the other ~124 entries come from?
The download information button there says the CSV download has maximum 2000 rows, so it probably just cuts off at that point.
The entries that don't appear in the CSV are those whose bcp-47 code sorts alphabetically after the point where the CSV cuts off, which in the case of the one in res/ is `wrs`. The missing entries include things like `zho` for Chinese and `xh` for Xhosa, so it's not just variants or obscure or not-fully-recognized items.
Oh, good catch. I hope there's a solution besides manually entering the remaining items.
I was able to get the missing entries by sorting the table in reverse order and then downloading the csv again. I'll incorporate the two csvs and then commit the combined one.
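For reference, combining the two downloads could look something like the sketch below. It is not the code that was actually committed; it assumes both exports share the same header row, and the key column name passed in (`code` in the usage note) is a hypothetical placeholder for whatever the bcp-47 column is really called in the Crubadan export. `merge_csv_exports` and `write_csv` are illustrative helper names.

```python
import csv
import io


def merge_csv_exports(forward, reverse, key):
    """Merge two CSV exports with the same header, keeping one row per key.

    `forward` and `reverse` are open text handles for the ascending and
    descending downloads; `key` is the (assumed) column holding the code.
    """
    merged = {}
    fieldnames = None
    for handle in (forward, reverse):
        reader = csv.DictReader(handle)
        if fieldnames is None:
            fieldnames = reader.fieldnames
        for row in reader:
            # Rows present in both downloads are kept only once.
            merged.setdefault(row[key], row)
    # Sort by code so the combined file has a stable order.
    rows = [merged[k] for k in sorted(merged)]
    return fieldnames, rows


def write_csv(handle, fieldnames, rows):
    """Write the merged rows back out with the original header."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

Usage would be along the lines of opening both downloaded files with `newline=""`, calling `merge_csv_exports(f1, f2, key="code")`, and passing the result to `write_csv`. Since each download is capped at 2000 rows, the two exports overlap heavily, which is why duplicates are dropped by key.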
The `Crubadan.csv` file ends at bcp-47 code `wrs` (sorted alphabetically), but Crubadan provides data for (and we have in the language table) a number of languages with codes past that. The csv file has 2,000 language entries in it, but the downloads table on the website lists 2,124 entries.
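A quick way to confirm this kind of truncation is to count the rows and look at the alphabetically last code. The sketch below is only illustrative: `summarize_export` is a hypothetical helper, and the key column name is an assumption about the export's header, not something taken from the actual file.

```python
import csv
import io


def summarize_export(handle, key):
    """Return (row count, alphabetically last code) for a CSV export.

    `key` is the assumed name of the column holding the bcp-47 code.
    """
    reader = csv.DictReader(handle)
    codes = [row[key] for row in reader]
    last = max(codes) if codes else None
    return len(codes), last
```

If the count comes back as exactly 2000 and the last code is well short of the end of the alphabet, the file was most likely cut off by the download limit.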