stringandstickytape opened this issue 9 years ago
Code to extract suffixes and prefixes from Maddavo's data:
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Split every station name in Maddavo's station.csv into a prefix
// (everything before the last space) and a suffix (the last word).
var suffixes = new List<string>();
var prefixes = new List<string>();

using (var reader = File.OpenText(".//station.csv"))
{
    reader.ReadLine(); // skip the header row
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        // Naive split: assumes no field contains an embedded comma.
        var values = line.Split(',');
        // Column 1 is the station name, wrapped in quotes; strip them.
        var name = values[1].Substring(1, values[1].Length - 2);

        int lastSpace = name.LastIndexOf(' ');
        if (lastSpace < 0)
        {
            // One-word name: there is no suffix, so record the whole name as a prefix.
            prefixes.Add(name);
        }
        else
        {
            prefixes.Add(name.Substring(0, lastSpace));
            suffixes.Add(name.Substring(lastSpace + 1));
        }
    }
}

// Write each list de-duplicated and sorted, one entry per line.
File.WriteAllLines(".//suffixes.txt", suffixes.Distinct().OrderBy(x => x));
File.WriteAllLines(".//prefixes.txt", prefixes.Distinct().OrderBy(x => x));
Hm. Now that the dumb bug is fixed, we should reassess. This may not be necessary at all: Tesseract is pretty good at getting the spaces right as long as nothing calls Replace to strip them back out again...
I've still had some stations missing the spaces, but now it's only about 5-10% of the time rather than 70%.
This might be harder than expected. For instance, Maddavo's stations file lists:
LUYTEN 674-15 => Nobleport
If that's correct, a suffix-based fix has no way of telling that "Nobleport" is a one-word station name rather than "Noble" followed by the suffix " Port", so it would wrongly split it into "Noble Port". But maybe this is a best-effort algorithm that can't get it right every time. Or maybe the correct fix is to keep updating station.csv and hope the problem falls away over time.
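One possible best-effort guard for the Nobleport case, sketched here under assumptions rather than as the tool's actual logic: look the OCR result up in the full station names from station.csv first, and only try a suffix split when there is no exact match. `RestoreSpace`, `knownNames` and the short suffix array are hypothetical, and "Smithport" is a made-up input. Suffixes are tried longest first and matched case-insensitively, which is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: restore a missing space before a known suffix,
// unless the raw OCR text is already a known full station name.
string RestoreSpace(string ocr, ISet<string> knownNames, IEnumerable<string> suffixes)
{
    // 1. An exact match against station.csv wins, so "Nobleport" is never split.
    if (knownNames.Contains(ocr))
        return ocr;

    // 2. Try suffixes longest first, so "Freeport" is considered before "Port".
    foreach (var suffix in suffixes.OrderByDescending(s => s.Length))
    {
        if (!ocr.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            continue;
        int cut = ocr.Length - suffix.Length;
        // Suffix is the whole string, or already has a space before it: leave as-is.
        if (cut == 0 || ocr[cut - 1] == ' ')
            return ocr;
        // Insert the space and use the suffix's canonical casing.
        return ocr.Substring(0, cut) + " " + suffix;
    }

    // 3. No suffix applies: return the OCR text untouched.
    return ocr;
}

var known = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Nobleport" };
var sfx = new[] { "Port", "Station", "Freeport" };

Console.WriteLine(RestoreSpace("Nobleport", known, sfx));  // Nobleport
Console.WriteLine(RestoreSpace("Smithport", known, sfx));  // Smith Port
```

Without the lookup step, "Nobleport" would come out as "Noble Port", which is exactly the failure described above; the guard only helps for names already present in station.csv.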
This list of station name suffixes was extracted from Maddavo's stations file:
Anderton Andrade Apology Arena Asylum Ayres Base Beacon Camp Centre Chernobyl City Claim Coliseum Colony Co-Operative Cousens Depot Dive Dixon Doc Dock Enterprise Escape Estate Exchange Exile Eyrie Folly Fort Foundation Freeport Gambit Gate Gateway Goose Halt Ham Hanger Hangout Harrison Haven Hideout High Hold Holdings Holm Home Hope Horizons Hospital Hq Hub Inheritance Installation Jao Klarix Lab Laboratory Lambada Landing Lane Legacy Lincoln Lofthus Lucas Manoevre Manwaring Market Masters Matt Mausoleum Memorial Mine Mines Mojo Mortuary Nest Orbital Orbiter Outpost Owl Park Phoenix Plant Platform Point Port Post Pride Principality Progress Prospect Reach Refinery Reformatory Relay Research Reserve Rest Retreat Ring Sanctuary Scott Settlement Shipyard Silo Spaceport Station Stop Survey Terminal Thiemann Town Vision Vista Wart Way Works Yola Young
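Some entries in this list are themselves endings of other entries (for example "Freeport" and "Port", or "Gateway" and "Way"), so any matcher built on it has to try the longer suffix first. A short sketch for listing those overlaps from the pasted list; `FindOverlaps` is a hypothetical name, and the case-insensitive comparison is an assumption about how OCR text would be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The suffix list above, pasted verbatim as one space-separated string.
const string SuffixList = "Anderton Andrade Apology Arena Asylum Ayres Base Beacon Camp Centre Chernobyl City Claim Coliseum Colony Co-Operative Cousens Depot Dive Dixon Doc Dock Enterprise Escape Estate Exchange Exile Eyrie Folly Fort Foundation Freeport Gambit Gate Gateway Goose Halt Ham Hanger Hangout Harrison Haven Hideout High Hold Holdings Holm Home Hope Horizons Hospital Hq Hub Inheritance Installation Jao Klarix Lab Laboratory Lambada Landing Lane Legacy Lincoln Lofthus Lucas Manoevre Manwaring Market Masters Matt Mausoleum Memorial Mine Mines Mojo Mortuary Nest Orbital Orbiter Outpost Owl Park Phoenix Plant Platform Point Port Post Pride Principality Progress Prospect Reach Refinery Reformatory Relay Research Reserve Rest Retreat Ring Sanctuary Scott Settlement Shipyard Silo Spaceport Station Stop Survey Terminal Thiemann Town Vision Vista Wart Way Works Yola Young";

string[] suffixes = SuffixList.Split(' ', StringSplitOptions.RemoveEmptyEntries);

// Hypothetical helper: find every pair where one suffix is a proper
// ending of another, ignoring case (e.g. "Freeport" ends with "Port").
List<(string Longer, string Shorter)> FindOverlaps(IEnumerable<string> list)
{
    var items = list.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
    var pairs = new List<(string Longer, string Shorter)>();
    foreach (var a in items)
        foreach (var b in items)
            if (a.Length > b.Length && a.EndsWith(b, StringComparison.OrdinalIgnoreCase))
                pairs.Add((a, b));
    return pairs;
}

var overlaps = FindOverlaps(suffixes);
foreach (var (longer, shorter) in overlaps)
    Console.WriteLine($"{longer} ends with {shorter}");
```

Sorting the suffixes by descending length before matching is the simplest way to make the longer entry of each such pair win.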