ivoa-std / VOTable

VOTable Format Definition

Add proper string arrays #31

Closed msdemlei closed 1 year ago

msdemlei commented 1 year ago

VOTable models strings as arrays of characters. While that is marginally good enough for simple strings, it is a major headache as soon as you have arrays of strings, because to serialise those, you need to globally find the longest string in all of the arrays contributing to a column (and then pad all others).
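To make the padding cost concrete, here is a minimal Python sketch. The function name `pad_string_arrays` and the space pad character are assumptions made for illustration; they are not part of VOTable or of any library.

```python
def pad_string_arrays(rows, pad=" "):
    """Pad every string in every row to the globally longest string.

    `rows` is a list of string lists, one list per table cell.
    Returns (width, padded_rows). `width` is the fixed string length that
    all cells of the column would have to share.
    """
    # The global maximum requires a full pass over all rows before the
    # first cell can be written.
    width = max((len(s) for row in rows for s in row), default=0)
    padded = [[s.ljust(width, pad) for s in row] for row in rows]
    return width, padded


rows = [["a", "bcd"], ["efghi"]]
width, padded = pad_string_arrays(rows)
print(width)   # 5
print(padded)  # [['a    ', 'bcd  '], ['efghi']]
```

Note that a single long string anywhere in the column inflates every other string, and that a reader cannot tell padding from trailing blanks that were part of the data.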

Various standards work around this problem in different ways (probably the most popular: separating multiple strings with a hash character), none of which is satisfactory.
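A short Python sketch of the hash-separator workaround shows where it breaks. The helper names `join_hash` and `split_hash` are hypothetical, and the sketch assumes the plain variant without any escaping.

```python
def join_hash(strings):
    """Serialise a list of strings into one cell, separated by '#'."""
    return "#".join(strings)


def split_hash(cell):
    """Parse a '#'-separated cell back into a list of strings."""
    return cell.split("#")


# Works as long as no string contains the separator.
ok = split_hash(join_hash(["alpha", "beta"]))

# A '#' inside a value silently produces extra elements.
broken = split_hash(join_hash(["C#", "Go"]))

# An empty list and a list holding one empty string serialise identically.
ambiguous = join_hash([]) == join_hash([""])

print(ok)         # ['alpha', 'beta']
print(broken)     # ['C', '', 'Go']
print(ambiguous)  # True
```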

It would hence be great if VOTable natively supported arrays of (variable length) strings.

There are various ways to tackle that.

The most general way would be to find some mechanism for "arrays of variable-length arrays". That would be nifty for the multipolygons Pat wants to have, too. But that's a really hard problem given the way we serialise arrays right now. Frankly, I think if we wanted to go this way, we'd have to think of a secret handshake, with arraysize giving a total length and then some way to split up the whole thing into the chunks on the various dimensions (I think it would be acceptable if legacy clients only looking at arraysize would see some sort of junk, but they should at least be able to safely ignore the variable-length arrays).
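One way to picture such a handshake is a flat character array plus a separate list of chunk lengths. The following Python sketch is only an illustration under that assumption; the issue does not specify where the lengths would live, and `flatten` and `unflatten` are hypothetical names.

```python
def flatten(strings):
    """Concatenate strings into one flat array and record each length.

    The flat string is what a legacy client looking only at the total
    length would see; the lengths are the extra information a newer
    client would need to split it up again.
    """
    return "".join(strings), [len(s) for s in strings]


def unflatten(flat, lengths):
    """Split a flat string back into chunks of the given lengths."""
    out, pos = [], 0
    for n in lengths:
        out.append(flat[pos:pos + n])
        pos += n
    if pos != len(flat):
        raise ValueError("lengths do not add up to the total length")
    return out


flat, lengths = flatten(["ab", "", "cde"])
print(flat, lengths)             # abcde [2, 0, 3]
print(unflatten(flat, lengths))  # ['ab', '', 'cde']
```

No padding is needed here, and empty strings survive, but the lengths have to be transported somewhere that legacy parsers will ignore.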

For me (who doesn't like multipolygons to begin with), introducing a native VOTable string type sounds attractive. This could use some delimiter and escaping convention or so. Of course, the challenge here is to make sure that legacy parsers don't choke on whatever we do.
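As a sketch of what a delimiter and escaping convention could look like, the Python below uses `|` as the delimiter and a backslash as the escape character. Both characters, and the names `encode` and `decode`, are assumptions chosen for illustration; nothing here is proposed VOTable syntax.

```python
DELIM = "|"   # assumed delimiter
ESC = "\\"    # assumed escape character


def encode(strings):
    """Join strings with DELIM, escaping ESC and DELIM inside each value."""
    return DELIM.join(
        s.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)
        for s in strings
    )


def decode(cell):
    """Split a cell on unescaped DELIM and undo the escaping."""
    out, cur, i = [], [], 0
    while i < len(cell):
        ch = cell[i]
        if ch == ESC and i + 1 < len(cell):
            # Take the character after the escape literally.
            cur.append(cell[i + 1])
            i += 2
        elif ch == DELIM:
            out.append("".join(cur))
            cur = []
            i += 1
        else:
            cur.append(ch)
            i += 1
    out.append("".join(cur))
    return out


values = ["a|b", "c\\d", ""]
cell = encode(values)
print(decode(cell) == values)  # True
```

A legacy parser would still see one ordinary variable-length string per cell, which is the attraction of this route. The sketch leaves one gap open: an empty list and a list containing a single empty string both encode to the empty string.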

Or we go crazy and introduce a JSON type in VOTable, where a cell would contain a JSON literal. This would open the door to all kinds of ugliness (cf. the PostgreSQL experience; they went this way several years ago), but it would probably solve all of the variable-length X problems.
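For comparison, a JSON-valued cell could be produced and parsed with Python's standard `json` module as sketched below. How such a cell would be declared in a VOTable FIELD is not addressed here; `to_cell` and `from_cell` are hypothetical helper names.

```python
import json


def to_cell(value):
    """Serialise any JSON-compatible value into the text of one cell."""
    return json.dumps(value)


def from_cell(cell):
    """Parse the text of one cell back into a Python value."""
    return json.loads(cell)


strings = ["C#", "", "a|b"]
nested = [["ab", "c"], [], ["def"]]   # arrays of variable-length arrays

print(from_cell(to_cell(strings)) == strings)  # True
print(from_cell(to_cell(nested)) == nested)    # True
print(to_cell([]) != to_cell([""]))            # True
```

Delimiters inside values, empty strings, empty lists, and nesting all round-trip, which is why this route covers the variable-length cases; the price is that the cell content is no longer described by datatype and arraysize.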

I frankly like none of these options; I suppose there's been no progress on this for many years despite a clear need because it is hard.

But who knows: Perhaps there is a silver bullet after all?

tomdonaldson commented 1 year ago

Closing in favor of #25