Polish letters with diacritics are not handled properly

schildbach / public-transport-enabler

Unleash public transport data in your Java project.

https://groups.google.com/forum/#!forum/public-transport-enabler-discuss

GNU General Public License v3.0


Polish letters with diacritics are not handled properly #222

Closed: Etua closed this issue 2 years ago.

Etua commented 6 years ago

While using Transportr, I noticed that public transport stops with diacritics in their names are neither displayed properly nor searchable by their correct names. Instead, the accented letters are replaced with their closest plain Latin counterparts, e.g. ą -> a, ć -> c. However, ó is not affected for some reason.
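The search half of the symptom can be illustrated with a short Java sketch. Once a provider has turned "Świętokrzyska" into "Swietokrzyska", a plain substring search for the correct spelling no longer matches. The `searchKey` and `matches` helpers below are hypothetical (they are not part of public-transport-enabler or Transportr); they show one possible client-side workaround that folds both strings to unaccented lowercase with `java.text.Normalizer`. The source file is assumed to be saved and compiled as UTF-8.

```java
import java.text.Normalizer;
import java.util.Locale;

class StopNameMatch {
    // Hypothetical helper (not part of public-transport-enabler or Transportr):
    // builds an accent-insensitive key so that a query with diacritics can
    // still match a stop name whose diacritics were stripped upstream.
    static String searchKey(String name) {
        // NFD splits letters like "ą" into "a" + a combining ogonek.
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        // Drop all combining marks (Unicode category M).
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // "ł" and "Ł" have no canonical decomposition, so map them explicitly.
        stripped = stripped.replace('ł', 'l').replace('Ł', 'L');
        return stripped.toLowerCase(Locale.ROOT);
    }

    // True if the folded query occurs anywhere in the folded stop name.
    static boolean matches(String query, String stopName) {
        return searchKey(stopName).contains(searchKey(query));
    }

    public static void main(String[] args) {
        String navitiaName = "Swietokrzyska - Peron M2";
        // An exact substring search with the correct spelling fails...
        System.out.println(navitiaName.contains("Świętokrzyska")); // false
        // ...while the folded comparison succeeds.
        System.out.println(matches("Świętokrzyska", navitiaName)); // true
        System.out.println(searchKey("Łódź")); // lodz
    }
}
```

Because only the comparison key is folded, the stop name shown to the user is left unchanged; the workaround cannot restore diacritics the provider has already removed.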


mimi89999 commented 6 years ago

Stop names and labels on Navitia contain only unaccented Latin characters, but the data from @MKuranowski (https://mkuran.pl/feed/ztm/) doesn't have that issue. I will contact Navitia about the problem.

A stop point as returned by Navitia:

{
    "embedded_type": "stop_point",
    "id": "stop_point:OWW:SP:sw2",
    "name": "Swietokrzyska - Peron M2 (Warszawa)",
    "quality": 0,
    "stop_point": {
        "administrative_regions": [...],
        "codes": [...],
        "coord": {...},
        "equipments": [...],
        "fare_zone": {...},
        "id": "stop_point:OWW:SP:sw2",
        "label": "Swietokrzyska - Peron M2 (Warszawa)",
        "links": [],
        "name": "Swietokrzyska - Peron M2",
        "stop_area": {...}
    }
}

prhod commented 6 years ago

Hello, this is indeed a limitation of some (most?) of Navitia's coverages. When several datasets need to be merged into one coverage, only ISO-8859-1 characters are allowed (a limitation of our current merging tool). We are working on a new data processing pipeline, but it will take a few more months; sorry for the inconvenience. By the way, here is an example without merging, on Israel data
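The ISO-8859-1 restriction would also explain why ó survives: of the Polish letters with diacritics, only ó and Ó (U+00F3 and U+00D3) exist in ISO-8859-1, while ą, ć, ę, ł, ń, ś, ź, ż and their capitals do not. The Java sketch below checks this with `CharsetEncoder.canEncode` and approximates the observed output with a hypothetical `foldToLatin1` helper. That helper is an assumption about the *effect* of the merge, not Navitia's actual merging tool, whose code is not shown here.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class Latin1Fold {
    private static final String POLISH = "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ";

    // Returns the characters of `letters` that ISO-8859-1 can represent.
    static String latin1Subset(String letters) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        for (char c : letters.toCharArray()) {
            if (latin1.canEncode(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical approximation of the merge step: keep characters that fit
    // in ISO-8859-1, otherwise strip their diacritics; anything that is still
    // not encodable after stripping becomes '?'.
    static String foldToLatin1(String s) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (latin1.canEncode(c)) {
                out.append(c); // e.g. 'ó' is kept as-is
            } else if (c == 'ł' || c == 'Ł') {
                // ł/Ł have no canonical decomposition, so map them explicitly.
                out.append(c == 'ł' ? 'l' : 'L');
            } else {
                String base = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD)
                        .replaceAll("\\p{M}", "");
                out.append(!base.isEmpty() && latin1.canEncode(base) ? base : "?");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(latin1Subset(POLISH));          // óÓ
        System.out.println(foldToLatin1("Świętokrzyska")); // Swietokrzyska
        System.out.println(foldToLatin1("Łódź"));          // Lódz
    }
}
```

The "Łódź" -> "Lódz" case mirrors the report above: every Polish diacritic is flattened except ó. The printed characters assume a UTF-8 console.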

Etua commented 2 years ago

I wasn't able to reproduce the issue with the newest release.