Closed Gliese852 closed 3 years ago
This has been reported before, and is not a bug.
Consider:
Suggestions and improvements are always welcome to the random name generator though
The system generation of names is dictated by https://github.com/pioneerspacesim/pioneer/blob/master/src/galaxy/SectorGenerator.cpp#L17
Which has a collection of sub-strings it combines to generate system names.
Does anyone know how/why those fragments were chosen? Is it based on the phonetics/something of existing system naming?
I suspect it comes from original Elite, where they constructed sentences by combining such similar strings consisting of vowels and consonants.
One alternative way is to do it like a markov chain, where for each letter, (or letter pair) gives a vector of probablies of each next letter in the alphabet, and space/stop (sum to 1), which can be trained on some text corpus (e.g. english wikipedia in raw text). I did this a few years back but didn't think the results were that impressive, although I might have used a too small text sample for getting the letter frequencies.
(Maybe training on geographical locations would be better, but on the other hand, I doubt it)
But as I said, I really don't think this is a bug. Fala was the 134th Glorius Emperor of the Stallman faith, so only natural many systems have taken name from him.
Cambridge example, however, isn't that relevant as they are far from each other. Those four Fala, however, are in a single sector (!) which makes it especially egregious.
A solution I saw proposed for star/planet naming (completely elsewhere, totally unrelated to Pioneer, I think it was somebody wanting to create a star opera from scratch) involved tying it to position - I don't remember whether outright (some two-syllable combination for each possible integer coord) or just by having the position influence the seed from which name is generated.
The system generation of names is dictated by https://github.com/pioneerspacesim/pioneer/blob/master/src/galaxy/SectorGenerator.cpp#L17
Which has a collection of sub-strings it combines to generate system names.
Not directly related to the problem, but can't we have a million of these fragments instead of just 32?
Cambridge example, however, isn't that relevant as they are far from each other.
They're extremly close in university/education space. But if you want geographical space, how about Congo and Congo? They're right next to each other. Same with Macedonia and Macedonia.
Those four Fala, however, are in a single sector (!) which makes it especially egregious.
I can not see in the screenshot that they are next to each other. They could be 50ly on opposite side of the player.
or just by having the position influence the seed from which name is generated.
Not sure that would help, as you could still have repeated sequences. I'll insert my usual reference below:
Point being, no matter how good our algorithm is, as long as it's random, there will always be sectors out there somewhere with twin named systems.
The system generation of names is dictated by https://github.com/pioneerspacesim/pioneer/blob/master/src/galaxy/SectorGenerator.cpp#L17 Which has a collection of sub-strings it combines to generate system names.
Not directly related to the problem, but can't we have a million of these fragments instead of just 32?
Yes, it could have more.
First thing I'd do is make life easier on whoever updates it and change the code to:
static const char *sys_names[] = { "en", "la", "can", "be", "and", "phi", "eth", "ol", "ve", "ho", "a",
"lia", "an", "ar", "ur", "mi", "in", "ti", "qu", "so", "ed", "ess",
"ex", "io", "ce", "ze", "fa", "ay", "wa", "da", "ack", "gre" };
static const unsigned int SYS_NAME_FRAGS = ((unsigned int)(sizeof(sys_names) / sizeof(char *)));
So that it automatically determines the size of that array. Then pull in a load more Greek fragments...
...actually I've just starting doing this on my lunchbreak, draft-PR incoming shortly 😆
Point being, no matter how good our algorithm is, as long as it's random, there will always be sectors out there somewhere with twin named systems.
Twin named systems across the galaxy is not an issue. Twin, or more, named systems in SAME sector is - and that can be fixed fairly easily by keeping a list of already generated names. Upon name collision, you can then: a) regen the name b) append something to the name (another grab from the grab bag we have? append I, II, III or a,b,c, or Primo, Secundus, Tertius, or whatever else...)
I just checked: Fala: 14, -1, 1 Fala: 17, -5, 3 Fala: 20, -9, -1 Fala: 19, -9, -1 They are not in the same sector actually. True, they are relatively close, but my guess is they aren't any closer than about 20lys. (One sector is a 8x8x8lys cube). Not ideal for sure, but it's not that of a sore point in my opinion. A better namegen, with multiple flavours based on different cultures and such would be nice to have eventually though instead of these bland Frotnier-ish names.
Twin named systems across the galaxy is not an issue. Twin, or more, named systems in SAME sector is - and that can be fixed fairly easily by keeping a list of already generated names. Upon name collision, you can then: a) regen the name b) append something to the name (another grab from the grab bag we have? append I, II, III or a,b,c, or Primo, Secundus, Tertius, or whatever else...)
You can't check against other things in the sector, or surrounding sectors, for duplicates because:
Re: 1: how many systems are in a sector? 5? 20? Checking against a list isn't THAT expensive
let's say ~10 systems in a sector, currently we generate a blocks of sectors in a cube 100lyrs across for the initial known systems cache. That's ~13 sectors across, or 13x13x13 = 2197 sector * 10 systems = 21,970 systems needing naming.
We'll assume that we only care about duplicate names of systems within a single sector, because even if you're just checking direct neighbours it goes up by 17 extra sectors to check and violates rule 2 even more.
The cost of checking can be calculated like this: 1,2,3,4,5,6,7,8,9,10 <- system names generated 0,1,2,3,4,5,6,7,8,9 <- number of times it has to be checked Which is 45 string<>string comparisons per sector, or 2197 * 45 = 98,865 string comparisons which are not cheap.
Then we'd have to regenerate the names for collisions as well.
All of which is cheap if you only do it once, but we don't, we generate new sectors all the time as you scroll the view around in SectorView, and you can scroll around fast! Then there's the jobs board where each job can check a volume of space to generate it's posting which can also cause new sectors to be generated.
Finally, it wouldn't actually solve the problem described in this thread because they're not even in the same sector, just nearby sectors. So to solve that you'd need to check all 17 neighbouring sectors bringing the string comparisons upto 98,865 * 17 = 1,680,705
Along with the cost of regenerating collisions.
Plus all of that would really violate rule 2
- each sector and system should be capable of being generated independently
Thirdly, name generation then becomes slightly nondeterministic because it depends on the order in which systems were generated in a sector (in edge cases, at least) to determine which name a system will receive. That's something we'd like to avoid - it's fine to have behavior like this inside of a system - as long as it doesn't check against something outside of the same system it's still deterministic (because it comes from the same seed every time and thus will yield the same result) but if you maintain a name cache per-sector then naming is very dependent on ordering issues.
@fluffyfreak
bringing the string comparisons upto 98,865 * 17 = 1,680,705
Let us have 32 possible syllables, then all words up to 4 syllables are placed in 32 ^ 4 = 1,048,576 bytes, i.e. just 1 megabyte of memory is enough to determine collisions in O(1) ?
EDIT: no, since we only need to know the fact of availability, one bit per word is enough, so even 128 kb is enough for us
EDIT2: And in order to avoid non-determinism, blocks must be generated by strictly defined pieces. Of course, collisions between adjacent pieces should not be checked.
PS: I don’t really like the idea of checking for collisions and regenerating names. It seems we can just find a function that creates fewer collisions.
This gets really annoying for trade-runs. Do we want to go the Vala or Vala or Vala or Vala or Vala for the next run (all the systems are different). What if the sector index constraints the names? Then nothing has to get checked globally. This should solve the trade run issue, because the hashes of the sector positions should be fairly different for nearby positions. I don't mind if the same systemname exists 1000 lightyears apart, as it's not easy to reach anyways.
I added a PR because this trading issue pisses me off.
If anyone's wondering where the original name fragments came from http://www.jongware.com/galaxy1.html
They're a copy and paste from there
Observed behaviour
When searching for the Fala system, it turned out that there are 4 or even more.
Expected behaviour
Uniquie name for each star system.
Steps to reproduce
Slide to sector (19,-6,-2), search Fala
My pioneer version (and OS): Current master 8c1a495f6f046f8e8432720bd3b5d683b6d7b6c6