keplergl / kepler.gl

Kepler.gl is a powerful open source geospatial analysis tool for large-scale data sets.
http://kepler.gl
MIT License
10.12k stars 1.71k forks

[Bug] GeoArrow reading latitude as zero #2509

Open cdeverell opened 5 months ago

cdeverell commented 5 months ago

When I create a .arrow file and load it into the kepler.gl demo app, the latitude coordinates appear to be read as zero.

I tested with a single point in .arrow format. The point shows on the map, but its latitude isn't read correctly and defaults to zero.

I used geoarrow-python / pyarrow to create a simple .arrow file as follows:

import geoarrow.pyarrow as ga
import pyarrow as pa

# construct a geoarrow array
geo_array = ga.as_geoarrow(["POINT (174.753277  -36.85153)"])

# additional id column
id_array = pa.array([1], type=pa.int64())

# construct table
table = pa.Table.from_arrays([id_array, geo_array], ['id', 'geometry'])

# Save the PyArrow Table to an Arrow file
with pa.OSFile('output.arrow', 'wb') as sink:
    with pa.RecordBatchFileWriter(sink, table.schema) as writer:
        writer.write_table(table)

I noticed that the geometry column is a dictionary (an Arrow struct) with 'x' and 'y' as keys, which seems to differ from the nyc_earnings example file, where it is a list of lat/lon values:

Test file: (screenshot)

NYC demo file: (screenshot)

lixun910 commented 5 months ago

Sorry that we don't have any documentation about this: Kepler.gl only supports Arrow without compression. The default write_table() function has compression='snappy', so I suspect this might be what prevents Kepler from loading the Arrow file. Can you attach the arrow file? Thanks!


cdeverell commented 5 months ago

Thanks for looking into this. Attached is the small sample file.


lixun910 commented 5 months ago

Thanks! Can you try compression='NONE'?


cdeverell commented 5 months ago

Thanks for the suggestion, although I'm not entirely sure how to implement it. The pyarrow.parquet write_table function can set the compression to 'none', but it only writes Parquet files and doesn't produce a file Kepler can read, even if I give it an .arrow extension. Apologies, I'm new to working with this file format, but the performance benefits look amazing for larger files.


lixun910 commented 5 months ago

FYI: one practice that always works for me is using ogr2ogr to convert any spatial data format to a GeoArrow file that works in Kepler, e.g.:

ogr2ogr test.arrow test.geojson -f Arrow -lco COMPRESSION=NONE

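For anyone scripting the ogr2ogr conversion from Python, a minimal sketch that shells out to the same command. It assumes GDAL with its Arrow driver is installed and `ogr2ogr` is on PATH; `build_ogr2ogr_cmd` and `convert_to_arrow` are hypothetical helper names, and only the driver name (`Arrow`) and the `COMPRESSION=NONE` layer creation option come from the comment above.

```python
import shutil
import subprocess


def build_ogr2ogr_cmd(src: str, dst: str) -> list[str]:
    """Argument list converting `src` into an uncompressed Arrow file `dst`."""
    return [
        "ogr2ogr",
        "-f", "Arrow",               # GDAL's Arrow IPC output driver
        "-lco", "COMPRESSION=NONE",  # layer creation option: no compression
        dst,                         # ogr2ogr lists the destination first
        src,
    ]


def convert_to_arrow(src: str, dst: str) -> None:
    """Run ogr2ogr, failing loudly if GDAL is missing or the call fails."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr (GDAL) is not on PATH")
    subprocess.run(build_ogr2ogr_cmd(src, dst), check=True)
```

For example, `convert_to_arrow("test.geojson", "test.arrow")` runs the same conversion as the command above.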

cdeverell commented 5 months ago

Thanks for the suggestion. Will check it out.
