4 Fala system in Sector (19, -6, -2)

Gliese852 commented 4 years ago

Screenshot from 2020-04-26 15-07-35

Observed behaviour

When searching for the Fala system, it turned out that there are 4 or even more.

Expected behaviour

Uniquie name for each star system.

Steps to reproduce

Slide to sector (19,-6,-2), search Fala

My pioneer version (and OS): Current master 8c1a495f6f046f8e8432720bd3b5d683b6d7b6c6

impaktor commented 4 years ago

This has been reported before, and is not a bug.

Consider:

there are two major university cities called Cambridge (UK / US)

fluffyfreak commented 4 years ago

Suggestions and improvements are always welcome to the random name generator though

fluffyfreak commented 4 years ago

The system generation of names is dictated by https://github.com/pioneerspacesim/pioneer/blob/master/src/galaxy/SectorGenerator.cpp#L17

Which has a collection of sub-strings it combines to generate system names.

fluffyfreak commented 4 years ago

Does anyone know how/why those fragments were chosen? Is it based on the phonetics/something of existing system naming?

impaktor commented 4 years ago

I suspect it comes from original Elite, where they constructed sentences by combining such similar strings consisting of vowels and consonants.

One alternative way is to do it like a markov chain, where for each letter, (or letter pair) gives a vector of probablies of each next letter in the alphabet, and space/stop (sum to 1), which can be trained on some text corpus (e.g. english wikipedia in raw text). I did this a few years back but didn't think the results were that impressive, although I might have used a too small text sample for getting the letter frequencies.

(Maybe training on geographical locations would be better, but on the other hand, I doubt it)

impaktor commented 4 years ago

But as I said, I really don't think this is a bug. Fala was the 134th Glorius Emperor of the Stallman faith, so only natural many systems have taken name from him.

Zireael07 commented 4 years ago

Cambridge example, however, isn't that relevant as they are far from each other. Those four Fala, however, are in a single sector (!) which makes it especially egregious.

A solution I saw proposed for star/planet naming (completely elsewhere, totally unrelated to Pioneer, I think it was somebody wanting to create a star opera from scratch) involved tying it to position - I don't remember whether outright (some two-syllable combination for each possible integer coord) or just by having the position influence the seed from which name is generated.

WKFO commented 4 years ago

The system generation of names is dictated by https://github.com/pioneerspacesim/pioneer/blob/master/src/galaxy/SectorGenerator.cpp#L17

Which has a collection of sub-strings it combines to generate system names.

Not directly related to the problem, but can't we have a million of these fragments instead of just 32?

impaktor commented 4 years ago

Cambridge example, however, isn't that relevant as they are far from each other.

They're extremly close in university/education space. But if you want geographical space, how about Congo and Congo? They're right next to each other. Same with Macedonia and Macedonia.

Those four Fala, however, are in a single sector (!) which makes it especially egregious.

I can not see in the screenshot that they are next to each other. They could be 50ly on opposite side of the player.

or just by having the position influence the seed from which name is generated.

Not sure that would help, as you could still have repeated sequences. I'll insert my usual reference below:

dilbert

Point being, no matter how good our algorithm is, as long as it's random, there will always be sectors out there somewhere with twin named systems.

fluffyfreak commented 4 years ago

The system generation of names is dictated by https://github.com/pioneerspacesim/pioneer/blob/master/src/galaxy/SectorGenerator.cpp#L17 Which has a collection of sub-strings it combines to generate system names.

Not directly related to the problem, but can't we have a million of these fragments instead of just 32?

Yes, it could have more.

First thing I'd do is make life easier on whoever updates it and change the code to:

static const char *sys_names[] = { "en", "la", "can", "be", "and", "phi", "eth", "ol", "ve", "ho", "a",
    "lia", "an", "ar", "ur", "mi", "in", "ti", "qu", "so", "ed", "ess",
    "ex", "io", "ce", "ze", "fa", "ay", "wa", "da", "ack", "gre" };
static const unsigned int SYS_NAME_FRAGS = ((unsigned int)(sizeof(sys_names) / sizeof(char *)));

So that it automatically determines the size of that array. Then pull in a load more Greek fragments...

...actually I've just starting doing this on my lunchbreak, draft-PR incoming shortly 😆

Zireael07 commented 4 years ago

Point being, no matter how good our algorithm is, as long as it's random, there will always be sectors out there somewhere with twin named systems.

Twin named systems across the galaxy is not an issue. Twin, or more, named systems in SAME sector is - and that can be fixed fairly easily by keeping a list of already generated names. Upon name collision, you can then: a) regen the name b) append something to the name (another grab from the grab bag we have? append I, II, III or a,b,c, or Primo, Secundus, Tertius, or whatever else...)

bszlrd commented 4 years ago

I just checked: Fala: 14, -1, 1 Fala: 17, -5, 3 Fala: 20, -9, -1 Fala: 19, -9, -1 They are not in the same sector actually. True, they are relatively close, but my guess is they aren't any closer than about 20lys. (One sector is a 8x8x8lys cube). Not ideal for sure, but it's not that of a sore point in my opinion. A better namegen, with multiple flavours based on different cultures and such would be nice to have eventually though instead of these bland Frotnier-ish names.

fluffyfreak commented 4 years ago

Twin named systems across the galaxy is not an issue. Twin, or more, named systems in SAME sector is - and that can be fixed fairly easily by keeping a list of already generated names. Upon name collision, you can then: a) regen the name b) append something to the name (another grab from the grab bag we have? append I, II, III or a,b,c, or Primo, Secundus, Tertius, or whatever else...)

You can't check against other things in the sector, or surrounding sectors, for duplicates because:

it's computationally expensive,
each sector and system should be capable of being generated independently

Zireael07 commented 4 years ago

Re: 1: how many systems are in a sector? 5? 20? Checking against a list isn't THAT expensive

fluffyfreak commented 4 years ago

let's say ~10 systems in a sector, currently we generate a blocks of sectors in a cube 100lyrs across for the initial known systems cache. That's ~13 sectors across, or 13x13x13 = 2197 sector * 10 systems = 21,970 systems needing naming.

We'll assume that we only care about duplicate names of systems within a single sector, because even if you're just checking direct neighbours it goes up by 17 extra sectors to check and violates rule 2 even more.

The cost of checking can be calculated like this: 1,2,3,4,5,6,7,8,9,10 <- system names generated 0,1,2,3,4,5,6,7,8,9 <- number of times it has to be checked Which is 45 string<>string comparisons per sector, or 2197 * 45 = 98,865 string comparisons which are not cheap.

Then we'd have to regenerate the names for collisions as well.

All of which is cheap if you only do it once, but we don't, we generate new sectors all the time as you scroll the view around in SectorView, and you can scroll around fast! Then there's the jobs board where each job can check a volume of space to generate it's posting which can also cause new sectors to be generated.

Finally, it wouldn't actually solve the problem described in this thread because they're not even in the same sector, just nearby sectors. So to solve that you'd need to check all 17 neighbouring sectors bringing the string comparisons upto 98,865 * 17 = 1,680,705

Along with the cost of regenerating collisions.

Plus all of that would really violate rule 2

each sector and system should be capable of being generated independently

Web-eWorks commented 4 years ago

Thirdly, name generation then becomes slightly nondeterministic because it depends on the order in which systems were generated in a sector (in edge cases, at least) to determine which name a system will receive. That's something we'd like to avoid - it's fine to have behavior like this inside of a system - as long as it doesn't check against something outside of the same system it's still deterministic (because it comes from the same seed every time and thus will yield the same result) but if you maintain a name cache per-sector then naming is very dependent on ordering issues.

Gliese852 commented 4 years ago

@fluffyfreak

bringing the string comparisons upto 98,865 * 17 = 1,680,705

Let us have 32 possible syllables, then all words up to 4 syllables are placed in 32 ^ 4 = 1,048,576 bytes, i.e. just 1 megabyte of memory is enough to determine collisions in O(1) ?

EDIT: no, since we only need to know the fact of availability, one bit per word is enough, so even 128 kb is enough for us

EDIT2: And in order to avoid non-determinism, blocks must be generated by strictly defined pieces. Of course, collisions between adjacent pieces should not be checked.

PS: I don’t really like the idea of checking for collisions and regenerating names. It seems we can just find a function that creates fewer collisions.

PtrMan commented 4 years ago

This gets really annoying for trade-runs. Do we want to go the Vala or Vala or Vala or Vala or Vala for the next run (all the systems are different). What if the sector index constraints the names? Then nothing has to get checked globally. This should solve the trade run issue, because the hashes of the sector positions should be fairly different for nearby positions. I don't mind if the same systemname exists 1000 lightyears apart, as it's not easy to reach anyways.

PtrMan commented 4 years ago

I added a PR because this trading issue pisses me off.

fluffyfreak commented 4 years ago

If anyone's wondering where the original name fragments came from http://www.jongware.com/galaxy1.html

They're a copy and paste from there

pioneerspacesim / pioneer