ropensci / bikedata

:bike: Extract data from public hire bicycle systems
https://docs.ropensci.org/bikedata

mystery quotes in Guadalajara station names #68

Open mpadge opened 6 years ago

mpadge commented 6 years ago
```r
store_bikedata (data_dir = "/guadalajara/data/dir", bikedb = bikedb)
bike_stations (bikedb)
# A tibble: 243 x 6
      id city  stn_id name                                   longitude latitude
   <int> <chr> <chr>  <chr>                                      <dbl>    <dbl>
 1     1 gu    gu2    (GDL-001) C. Epigmenio Glez./ Av. 16 …     -103.     20.7
 2     2 gu    gu3    "(GDL-002) C. Colonias  / Av.  Ni\xf1…     -103.     20.7
 3     3 gu    gu4    (GDL-003) C. Vidrio / Av. Chapultepec      -103.     20.7
 4     4 gu    gu5    (GDL-004) C. Ghilardi /C. Miraflores       -103.     20.7
 5     5 gu    gu6    (GDL-005) C. San Diego /Calzada Indep…     -103.     20.7
 6     6 gu    gu8    (GDL-006) C. Venustiano Carranza /C. …     -103.     20.7
 7     7 gu    gu9    (GDL-007)C. Epigmenio Glez./Av. Crist…     -103.     20.7
 8     8 gu    gu10   "(GDL-008) C. J. Angulo / C. Gonz\xe1…     -103.     20.7
 9     9 gu    gu11   (GDL-009) Calz. Federalismo/ C. J. An…     -103.     20.7
10    10 gu    gu12   (GDL-010) C. Cruz verde / C. Joaquin …     -103.     20.7
# ... with 233 more rows
```

The quotes on the 2nd and 8th entries are also present in the original data, even though they are removed in bike_get_gu_stations(). I also tried removing them in src/read-station-files, and inspected the strings that are passed to the SQL storage. All of these are free of the "quotes", yet they reappear in the final stored version. Notably, the two quoted entries are exactly the ones that print escaped bytes (\xf1 and \xe1). This suggests the quotes are not part of the strings at all, but are triggered by some other, perhaps non-UTF-8, character. How can they be removed?
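A minimal sketch of one possible diagnosis and fix, assuming the station names arrive as Latin-1 (ISO-8859-1) bytes that are never converted to UTF-8. In Latin-1, \xf1 is "ñ" and \xe1 is "á", which would fit "Niños" and "González". tibble's printer appears to wrap strings that need escaping in double quotes, so the "quotes" would be a display artifact of the invalid bytes rather than characters in the data. The string `raw_name` and the helper `fix_latin1()` are hypothetical illustrations, not part of bikedata; `validUTF8()`, `charToRaw()` and `iconv()` are base R.

```r
# Minimal sketch, ASSUMING the names are Latin-1 bytes never converted to UTF-8.
# `raw_name` and `fix_latin1()` are hypothetical, not part of bikedata.

# "\xf1" is the Latin-1 byte for "ñ"; inside a UTF-8 string it is invalid
raw_name <- "Av. Ni\xf1os"

# Diagnose: the string is not valid UTF-8 and contains the raw byte 0xf1
validUTF8(raw_name)                       # FALSE
any(charToRaw(raw_name) == as.raw(0xf1))  # TRUE

# Re-encode only the strings that are not already valid UTF-8. Every byte is a
# valid Latin-1 character, so iconv() cannot return NA for this conversion.
fix_latin1 <- function(x) {
  bad <- !validUTF8(x)
  x[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")
  x
}

fixed <- fix_latin1(c(raw_name, "C. Vidrio / Av. Chapultepec"))
validUTF8(fixed)                          # TRUE TRUE
fixed[1] == "Av. Ni\u00f1os"              # TRUE
```

Leaving already-valid strings untouched avoids double-encoding names that are fine. If the Guadalajara files do turn out to be Latin-1, one option would be to apply a conversion like this to the names before they are written to the database; whether every source file uses that encoding is an assumption worth checking first.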