ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

Unicode issues in common names #65

Closed cboettig closed 1 year ago

cboettig commented 9 years ago

From Noam's review (#49)

commonnames returns ? in place of Mandarin Chinese names. This appears to occur at the server level, though not on the fishbase website. Can the server not serve Unicode?

cboettig commented 9 years ago

Cross-listing this under fishbaseapi, have to see if this is on the server side or the R client side.
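
One way to narrow this down from the R side is to look at the bytes of a returned name rather than at what the console prints. The sketch below is a minimal check and not part of rfishbase: `has_literal_qmarks` is a hypothetical helper, and both sample strings are made up for illustration. Since no multi-byte UTF-8 sequence contains the byte 0x3F, finding that byte means the string really holds literal question marks, so the characters were lost before R printed anything.

```r
# Hypothetical helper (not part of rfishbase): TRUE for each string that
# contains a literal "?" byte (0x3F).
has_literal_qmarks <- function(x) {
  vapply(
    x,
    function(s) any(charToRaw(s) == as.raw(0x3f)),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Made-up inputs: a name already replaced by "?" and an arbitrary CJK string,
# written with \u escapes so the script itself stays ASCII.
replaced <- "??????"
intact   <- "\u7f57\u975e\u9c7c"

has_literal_qmarks(c(replaced, intact))
```

A name that legitimately contains a question mark would also be flagged, so this is only a quick triage step, not proof of where the loss happened.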

sckott commented 9 years ago

@noamross can you give an example? want to make sure we're talking about the same thing

I'll look in the database itself in mysql, and the sinatra app first to see if there's any issue there.

noamross commented 9 years ago
library(rfishbase)
library(dplyr)

common_names(c("Oreochromis niloticus")) %>%
  filter(Language %in% c("Russian", "Tamil", "Thai", "Korean", "Mandarin Chinese", "Vietnamese")) %>%
  print(n = 26)
Source: local data frame [26 x 6]

            ComName         Language C_Code SpecCode       Genus   Species
              (chr)            (chr)  (chr)    (int)       (chr)     (chr)
1     Cá Rô phi v?n       Vietnamese    704        2 Oreochromis niloticus
2           Pla nil             Thai    764        2 Oreochromis niloticus
3            Rô phi       Vietnamese    704        2 Oreochromis niloticus
4           Tilapia            Tamil    356        2 Oreochromis niloticus
5  ??????? ????????          Russian   9999        2 Oreochromis niloticus
6          ????????            Tamil    356        2 Oreochromis niloticus
7            ??????             Thai    764        2 Oreochromis niloticus
8            ??????           Korean    410        2 Oreochromis niloticus
9            ?????? Mandarin Chinese    156        2 Oreochromis niloticus
10           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
11       ??????(??) Mandarin Chinese    156        2 Oreochromis niloticus
12            ????? Mandarin Chinese    156        2 Oreochromis niloticus
13             ???? Mandarin Chinese    156        2 Oreochromis niloticus
14       ??????(??) Mandarin Chinese    156        2 Oreochromis niloticus
15            ????? Mandarin Chinese   156A        2 Oreochromis niloticus
16             ???? Mandarin Chinese    156        2 Oreochromis niloticus
17           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
18           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
19           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
20           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
21           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
22           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
23           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
24           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
25           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
26           ?????? Mandarin Chinese    156        2 Oreochromis niloticus
sckott commented 9 years ago

thanks @noamross !

sckott commented 8 years ago

the SQL database does show the same ??? as you show above

sckott commented 8 years ago

@cboettig AFAICT I'm not sure we can fix this. Can you ask (or let me know who to ask) if Fishbase folks can check if somehow in making the dump they give us, the characters are borked?

sckott commented 8 years ago

@noamross sorry it's been so long on this, I'm pretty sure this is a problem in the SQL database we were given. I'll see if I can get a hold of Fishbase team about this
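
For what it's worth, the pattern in the output above (row 1 keeps "á" and "ô" but loses a single character) is what a per-character conversion to a Latin-1 style character set would produce. That is only a guess about how the dump was made, not something confirmed in this thread. The sketch below simulates that kind of loss in base R; `simulate_latin1_loss` is a hypothetical helper, and the Vietnamese input is an assumed original spelling, not a value taken from the database.

```r
# Hypothetical helper: replace every character outside the Latin-1 range
# (code point > 255) with "?", one "?" per lost character.
simulate_latin1_loss <- function(x) {
  vapply(
    x,
    function(s) {
      cp <- utf8ToInt(s)      # Unicode code points of the string
      cp[cp > 255L] <- 63L    # 63 is the code point of "?"
      intToUtf8(cp)
    },
    character(1),
    USE.NAMES = FALSE
  )
}

# Assumed original spelling, written with \u escapes to keep the script ASCII.
original <- "C\u00e1 R\u00f4 phi v\u1eb1n"

simulate_latin1_loss(original)
```

If the dump went through a conversion like this, the original characters are gone from the data we received, and only a fresh export from the source could restore them.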

sckott commented 8 years ago

still not sorted out yet, removing from 2.2.0 milestone