marook / osm-read

an openstreetmap XML and PBF data parser for node.js and the browser
GNU Lesser General Public License v3.0

number instead of string for id, uid and ref? #21

Closed nrenner closed 10 years ago

nrenner commented 10 years ago

Is there a reason why id, uid and way node/relation member id refs are returned as string instead of number?

The PBF Format defines them all as numbers:

int64 id
int32 uid
sint64 refs
sint64 memids

I think memory usage could be reduced by using numbers instead of strings.

marook commented 10 years ago

I'm not 100% sure anymore why I thought using strings for IDs would be the best choice. I dimly remember that I had some problems with huge IDs, but I'm not sure whether it was in JavaScript, Java or Python.

I think that problems could arise from the way JavaScript handles big integer values. See http://stackoverflow.com/a/9643650/404522

From an API point of view I used strings because they can handle 64 bit integers without a problem. The problem of handling 64 bit integers only arises on the osm-read implementation side. I thought that using strings in the interface leaves me the design choice to switch to some different way of parsing 64 bit IDs if I detect that the precision of the JavaScript integers is not good enough.

I think I'm going to add a test with a 0xffffffffffffffff ID and see what happens. If such high values can be represented in JavaScript we can change the datatype of ID, UIDs, etc. to integer in order to save some memory. If the values cannot be represented as JavaScript integers with enough precision, I'm not yet sure what to do... Strings are "native" JavaScript datatypes. This makes comparing them and using them as keys in objects easy. That's why I'm currently not very convinced of something like { lower32bits: 123, higher32bits: 456 } as a representation for 64 bit integers.

What do you think?
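
The test proposed above can be tried out in a few lines. This is only a minimal sketch in plain JavaScript, not code from osm-read; the names `maxUint64`, `idString` and `nodesById` are made up for illustration.

```javascript
// The largest unsigned 64-bit value, written as a hex literal.
const maxUint64 = 0xffffffffffffffff;

// A JavaScript number is an IEEE 754 double, so the literal is rounded
// to the nearest representable value, which is exactly 2^64.
const roundedUp = maxUint64 === 2 ** 64;

// The decimal string keeps every digit and works directly as an object key.
const idString = "18446744073709551615";
const nodesById = {};
nodesById[idString] = { lat: 0, lon: 0 };

// Converting the string to a number and back loses the low digits.
const roundTrip = String(Number(idString));

console.log(roundedUp);              // true
console.log(idString in nodesById);  // true
console.log(roundTrip);              // 18446744073709552000
```

So the value is accepted without any error, but it is silently rounded, while the string form survives unchanged.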

tyrasd commented 10 years ago

> I think I'm going to add a test with a 0xffffffffffffffff ID and see what happens. If such high values can be represented in JavaScript we can change the datatype of ID, UIDs, etc. to integer in order to save some memory.

Actually, in JavaScript every "number" is stored as a double precision float, which can hold integers up to 2^53 without rounding errors. 2^64 wouldn't work, though.
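
The exact limit can be checked with the standard `Number.MAX_SAFE_INTEGER` and `Number.isSafeInteger`. The helper `fitsInNumber` below is a hypothetical name used only for this sketch.

```javascript
// Largest integer n for which n and n + 1 are both exactly representable.
const maxSafe = Number.MAX_SAFE_INTEGER; // 2^53 - 1

// Above that, neighbouring integers collapse onto the same double.
const collide = 2 ** 53 === 2 ** 53 + 1;

// Hypothetical helper: does a decimal id string survive conversion to a number?
function fitsInNumber(idString) {
  return Number.isSafeInteger(Number(idString));
}

console.log(maxSafe);                              // 9007199254740991
console.log(collide);                              // true
console.log(fitsInNumber("9007199254740991"));     // true
console.log(fitsInNumber("9007199254740993"));     // false
console.log(fitsInNumber("18446744073709551615")); // false
```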

nrenner commented 10 years ago

Thanks for your responses, I feared there would be an issue with the number type. So I guess we just leave it as is.

An idea might be to have it configurable, so the user can decide to take the risk of not supporting 64-bit ids properly. But not sure what the actual benefit would be and not the highest priority right now, so closing.
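
One way the configurable behaviour could look is sketched below. This is purely hypothetical: neither `convertId` nor a `numericIds` option exists in osm-read, and throwing on unsafe values is just one possible policy.

```javascript
// Hypothetical helper, not part of osm-read: converts a decimal id string
// according to a user-chosen option.
function convertId(idString, options = {}) {
  if (!options.numericIds) {
    return idString; // default: keep the string, safe for any 64-bit id
  }
  const n = Number(idString);
  if (!Number.isSafeInteger(n)) {
    // The caller opted into numbers, so refuse to silently round the id.
    throw new RangeError("id " + idString + " cannot be represented exactly as a number");
  }
  return n;
}

console.log(convertId("4294967296"));                       // "4294967296" (string)
console.log(convertId("4294967296", { numericIds: true })); // 4294967296 (number)
```

With a check like this the user takes on the risk explicitly: ids beyond 2^53 - 1 raise an error instead of being rounded.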