Open anitacaron opened 2 months ago
Hi @anitacaron, it looks like there are many duplicates. If you skip the family column, the table would be much smaller.
Yes, but I did that so it can be easily grouped by family or country of origin. Or don't you need the family information? Can I get feedback from Meriem and Mariem, please?
Hi @mariemh23 could you please check if it's ok for you to plot the map?
I think yes, it looks fine.
Hi Anita, I took a closer look at the table and I think that there is something wrong with the database.
First, there seems to be a separator problem in the table: some values have been shifted. For example, some values in the region_name column contain population_name values (e.g. line 6628). Similarly, the population_size column has shifted values, and some values are missing. Certain values contain the symbol ">" or "<" (< 1 million, >3 million), which interferes with converting population sizes into numerical values. One last point, as raised earlier by Alia: we need a top-level family group, because with such a large number of values it will be difficult to assign colors that are different enough for each group to be easily visible on the map. Best,
@mariemh23, I can fix the region_name and the family columns. However, the population_size is what is available in the ontology, and some values are missing. It would be good to discuss this with @abenkahla so I can either change the ontology or just add a post-processing step to remove the symbol ">" in the population size annotation.
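A post-processing step along these lines could be sketched as follows. This is only a minimal sketch, not the pipeline used for the report: the function name `parse_population_size` and the multiplier table are hypothetical, and it assumes values look like the examples quoted above ("< 1 million", ">3 million") or like plain numbers. It keeps the "<" / ">" qualifier in a separate field instead of discarding it, so the information is not lost when the size becomes numeric.

```python
import re

# Hypothetical unit table: the thread only shows "million" in its examples.
_MULTIPLIERS = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}

# Optional "<" or ">", a number, then an optional unit word.
_PATTERN = re.compile(r"^\s*([<>]?)\s*([\d.,]+)\s*([A-Za-z]*)\s*$")


def parse_population_size(raw):
    """Return (qualifier, value); (None, None) if missing or unparseable.

    qualifier is "<", ">" or "=" (no symbol present in the annotation).
    """
    if raw is None:
        return None, None
    match = _PATTERN.match(str(raw))
    if not match:
        return None, None
    qualifier, number, unit = match.groups()
    try:
        # Treat commas as thousands separators (an assumption).
        value = float(number.replace(",", ""))
    except ValueError:
        return None, None
    unit = unit.lower()
    if unit:
        if unit not in _MULTIPLIERS:
            return None, None
        value *= _MULTIPLIERS[unit]
    return qualifier or "=", int(value)
```

Missing or unrecognized annotations come back as `(None, None)`, so they can be listed and reviewed separately rather than silently turned into zeros.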
@mariemh23 I've updated the table in the Google spreadsheet. Could you please check?
Hi @anitacaron, thanks for the update. We just checked the table with @mariemh23; all the columns look good except the family one, as it's not standardized and not presented in a harmonized way. We should move forward with the map draft until we fix the family column.
Hi Anita, thank you for your quick reply. I would also like to ask you about the language location column. Is it possible to have the geo-location coordinates in a separate column (separated from the language name)? Many thanks
> it's not standardized and not presented in a harmonized way
@Melek-C let me know how I can change the family column
> is it possible to have the geo-location coordinates in a separate column (separated from the language name).
@mariemh23 yeah, I can do it in the spreadsheet, but this is how it's available in the ontology
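For the spreadsheet side, the split could look roughly like the sketch below. The thread does not show how the ontology formats the language location annotation, so this assumes (hypothetically) that each cell is a language name followed by a trailing "latitude, longitude" pair, optionally in brackets; the function name `split_language_location` is also made up for illustration, and the pattern would need adjusting to the real format.

```python
import re

# Assumed cell format: "<name> (<lat>, <lon>)" or "<name> <lat>, <lon>".
_COORDS = re.compile(
    r"[\(\[]?\s*(-?\d+(?:\.\d+)?)\s*[,;]\s*(-?\d+(?:\.\d+)?)\s*[\)\]]?\s*$"
)


def split_language_location(cell):
    """Split a cell into (name, latitude, longitude).

    Returns (text, None, None) when no trailing coordinate pair is found.
    """
    text = cell.strip()
    match = _COORDS.search(text)
    if not match:
        return text, None, None
    # Everything before the coordinate pair is treated as the language name.
    name = text[:match.start()].rstrip(" ,;:-")
    return name, float(match.group(1)), float(match.group(2))
```

Cells without a recognizable coordinate pair are returned unchanged with empty coordinates, so nothing is dropped from the table.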
Hi @anitacaron, it's a bit tricky with the family column. I think we should keep just one term in the column and select only the big families (there are different subfamilies).
Maybe we could put the subfamilies in another annotation to make the ontology clearer?
> Maybe we could put the subfamilies in another annotation to make the ontology clearer?
It could be interesting.
Do we have a final decision about the family annotation for the report? 😄
Hi @anitacaron, we are trying to fix some ambiguities with the family annotation, as it's not standardized. We will get back to you soon.
Fixes #46
I'd like to make sure we have all the information needed. The report has more than 12 thousand rows, so I'll upload it to Google Drive and share the link so it can be downloaded.
Here's a sample of the table: (click on the image to see it larger)