koop-retired / koop-server

PROJECT DEPRECATED - DO NOT USE
Apache License 2.0

geojson downloads from same dataset observed to be in different orders, but should be in same order #19

Closed YmerejRedienhcs closed 9 years ago

YmerejRedienhcs commented 10 years ago


There is a dataset on production, http://qadownloadtest.dcdev.opendata.arcgis.com/datasets/78be7bb0306448048cdb51468b439fae_0

I downloaded it 6 times yesterday with the URL http://qadownloadtesting.dcdev.opendata.arcgis.com/datasets/78be7bb0306448048cdb51468b439fae_0.geojson. One download was incomplete (truncated), which I am investigating. The complete downloads were not all identical: they contained the same features (the md5s of the sorted files matched), but the features were not in the same order.

Here are the beginnings of two of the files, along with the output of sort | md5, which together show that the order differs but the content is the same:

JSchneiderMbpR2:repeatrun jere7054$ head ./20140820182444/prod/78be7bb0306448048cdb51468b439fae/78be7bb0306448048cdb51468b439fae.geojson
{
"type": "FeatureCollection",

"features": [
{ "type": "Feature", "properties": { "FID": 1001, "GIS_ID": 799326, "CREATION_D": "2006-09-09T00:00:00.000Z", "RECORDATIO": null, "NARRATIVE": "DELIVERY 7", "EXPIRATION": null, "STATUS": 1, "SQUARE": "1974", "SUFFIX": " ", "LOT": "0008", "LOT_TYPE": 1, "COMPUTED_A": 6426.0, "SURVEYED_A": 0, "ADDRESS_ID": 0, "IS_REAR_OF": 0, "BLDG_NUM": 0, "BLDG_NUM_E": " ", "BLDG_NUM_1": 0, "BLDG_EXT_E": " ", "QUADRANT": " ", "ZIP5": " ", "ZIP4": " ", "RECORD_ARE": 6180.0, "CONV_TOLER": 1, "CONV_TOL_1": 0, "CONV_TOL_2": " ", "RECORDLOTS": 0, "PARCELSPLY": 0, "SQUAREPLYI": 0, "OF_LOT_SEQ": 0, "BOOK_NUM": "123", "PAGE_NUM": "119", "UNDERLIES_": 0, "CONDO_BOOK": " ", "CONDO_PAGE": " ", "KILL_DT": null, "SSL": "1974    0008", "COL": "Y", "CONDO_REGI": " ", "CONDOLOT": "N", "SHAPE_AREA": 596.996687932999976, "SHAPE_LEN": 101.865050739 }, "geometry": { "type": "Point", "coordinates": [ -77.071967105029458, 38.948308550352714 ] } }
,
{ "type": "Feature", "properties": { "FID": 1002, "GIS_ID": 799326, "CREATION_D": "2006-09-09T00:00:00.000Z", "RECORDATIO": null, "NARRATIVE": "DELIVERY 7", "EXPIRATION": null, "STATUS": 1, "SQUARE": "1974", "SUFFIX": " ", "LOT": "0008", "LOT_TYPE": 1, "COMPUTED_A": 6426.0, "SURVEYED_A": 0, "ADDRESS_ID": 0, "IS_REAR_OF": 0, "BLDG_NUM": 0, "BLDG_NUM_E": " ", "BLDG_NUM_1": 0, "BLDG_EXT_E": " ", "QUADRANT": " ", "ZIP5": " ", "ZIP4": " ", "RECORD_ARE": 6180.0, "CONV_TOLER": 1, "CONV_TOL_1": 0, "CONV_TOL_2": " ", "RECORDLOTS": 0, "PARCELSPLY": 0, "SQUAREPLYI": 0, "OF_LOT_SEQ": 0, "BOOK_NUM": "123", "PAGE_NUM": "119", "UNDERLIES_": 0, "CONDO_BOOK": " ", "CONDO_PAGE": " ", "KILL_DT": null, "SSL": "1974    0008", "COL": "Y", "CONDO_REGI": " ", "CONDOLOT": "N", "SHAPE_AREA": 596.996687932999976, "SHAPE_LEN": 101.865050739 }, "geometry": { "type": "Point", "coordinates": [ -77.071966989294722, 38.948015035686744 ] } }
,
{ "type": "Feature", "properties": { "FID": 1003, "GIS_ID": 799326, "CREATION_D": "2006-09-09T00:00:00.000Z", "RECORDATIO": null, "NARRATIVE": "DELIVERY 7", "EXPIRATION": null, "STATUS": 1, "SQUARE": "1974", "SUFFIX": " ", "LOT": "0008", "LOT_TYPE": 1, "COMPUTED_A": 6426.0, "SURVEYED_A": 0, "ADDRESS_ID": 0, "IS_REAR_OF": 0, "BLDG_NUM": 0, "BLDG_NUM_E": " ", "BLDG_NUM_1": 0, "BLDG_EXT_E": " ", "QUADRANT": " ", "ZIP5": " ", "ZIP4": " ", "RECORD_ARE": 6180.0, "CONV_TOLER": 1, "CONV_TOL_1": 0, "CONV_TOL_2": " ", "RECORDLOTS": 0, "PARCELSPLY": 0, "SQUAREPLYI": 0, "OF_LOT_SEQ": 0, "BOOK_NUM": "123", "PAGE_NUM": "119", "UNDERLIES_": 0, "CONDO_BOOK": " ", "CONDO_PAGE": " ", "KILL_DT": null, "SSL": "1974    0008", "COL": "Y", "CONDO_REGI": " ", "CONDOLOT": "N", "SHAPE_AREA": 596.996687932999976, "SHAPE_LEN": 101.865050739 }, "geometry": { "type": "Point", "coordinates": [ -77.07217795935334, 38.948015001480911 ] } }
,
JSchneiderMbpR2:repeatrun jere7054$ sort ./20140820182444/prod/78be7bb0306448048cdb51468b439fae/78be7bb0306448048cdb51468b439fae.geojson | md5
143ca24e104709b6ddf5e2229707ff8f
JSchneiderMbpR2:repeatrun jere7054$ head ./20140820182923/prod/78be7bb0306448048cdb51468b439fae/78be7bb0306448048cdb51468b439fae.geojson
{
"type": "FeatureCollection",

"features": [
{ "type": "Feature", "properties": { "FID": 3001, "GIS_ID": 893808, "CREATION_D": "2006-09-09T00:00:00.000Z", "RECORDATIO": null, "NARRATIVE": "DELIVERY 8", "EXPIRATION": null, "STATUS": 1, "SQUARE": "2010", "SUFFIX": " ", "LOT": "0055", "LOT_TYPE": 1, "COMPUTED_A": 4743.0, "SURVEYED_A": 0, "ADDRESS_ID": 0, "IS_REAR_OF": 0, "BLDG_NUM": 0, "BLDG_NUM_E": " ", "BLDG_NUM_1": 0, "BLDG_EXT_E": " ", "QUADRANT": " ", "ZIP5": " ", "ZIP4": " ", "RECORD_ARE": 4743.0, "CONV_TOLER": 1, "CONV_TOL_1": 0, "CONV_TOL_2": " ", "RECORDLOTS": 0, "PARCELSPLY": 0, "SQUAREPLYI": 0, "OF_LOT_SEQ": 0, "BOOK_NUM": "SUB", "PAGE_NUM": "86", "UNDERLIES_": 0, "CONDO_BOOK": " ", "CONDO_PAGE": " ", "KILL_DT": null, "SSL": "2010    0055", "COL": "Y", "CONDO_REGI": " ", "CONDOLOT": "N", "SHAPE_AREA": 440.64034512500001, "SHAPE_LEN": 87.782556353399997 }, "geometry": { "type": "Point", "coordinates": [ -77.069499067824609, 38.96946570071956 ] } }
,
{ "type": "Feature", "properties": { "FID": 3002, "GIS_ID": 843439, "CREATION_D": "2006-09-09T00:00:00.000Z", "RECORDATIO": null, "NARRATIVE": "DELIVERY 5", "EXPIRATION": null, "STATUS": 1, "SQUARE": "0962", "SUFFIX": " ", "LOT": "0033", "LOT_TYPE": 1, "COMPUTED_A": 1715.0, "SURVEYED_A": 0, "ADDRESS_ID": 0, "IS_REAR_OF": 0, "BLDG_NUM": 0, "BLDG_NUM_E": " ", "BLDG_NUM_1": 0, "BLDG_EXT_E": " ", "QUADRANT": " ", "ZIP5": " ", "ZIP4": " ", "RECORD_ARE": 0.0, "CONV_TOLER": 1, "CONV_TOL_1": 0, "CONV_TOL_2": " ", "RECORDLOTS": 0, "PARCELSPLY": 0, "SQUAREPLYI": 0, "OF_LOT_SEQ": 0, "BOOK_NUM": "18", "PAGE_NUM": "161", "UNDERLIES_": 0, "CONDO_BOOK": " ", "CONDO_PAGE": " ", "KILL_DT": null, "SSL": "0962    0033", "COL": "Y", "CONDO_REGI": " ", "CONDOLOT": "N", "SHAPE_AREA": 159.310086424, "SHAPE_LEN": 71.853718261599994 }, "geometry": { "type": "Point", "coordinates": [ -76.992136621380581, 38.895645626564438 ] } }
,
{ "type": "Feature", "properties": { "FID": 3003, "GIS_ID": 893808, "CREATION_D": "2006-09-09T00:00:00.000Z", "RECORDATIO": null, "NARRATIVE": "DELIVERY 8", "EXPIRATION": null, "STATUS": 1, "SQUARE": "2010", "SUFFIX": " ", "LOT": "0055", "LOT_TYPE": 1, "COMPUTED_A": 4743.0, "SURVEYED_A": 0, "ADDRESS_ID": 0, "IS_REAR_OF": 0, "BLDG_NUM": 0, "BLDG_NUM_E": " ", "BLDG_NUM_1": 0, "BLDG_EXT_E": " ", "QUADRANT": " ", "ZIP5": " ", "ZIP4": " ", "RECORD_ARE": 4743.0, "CONV_TOLER": 1, "CONV_TOL_1": 0, "CONV_TOL_2": " ", "RECORDLOTS": 0, "PARCELSPLY": 0, "SQUAREPLYI": 0, "OF_LOT_SEQ": 0, "BOOK_NUM": "SUB", "PAGE_NUM": "86", "UNDERLIES_": 0, "CONDO_BOOK": " ", "CONDO_PAGE": " ", "KILL_DT": null, "SSL": "2010    0055", "COL": "Y", "CONDO_REGI": " ", "CONDOLOT": "N", "SHAPE_AREA": 440.64034512500001, "SHAPE_LEN": 87.782556353399997 }, "geometry": { "type": "Point", "coordinates": [ -77.069439929590558, 38.969333498140081 ] } }
,
JSchneiderMbpR2:repeatrun jere7054$ sort ./20140820182923/prod/78be7bb0306448048cdb51468b439fae/78be7bb0306448048cdb51468b439fae.geojson | md5
143ca24e104709b6ddf5e2229707ff8f
JSchneiderMbpR2:repeatrun jere7054$
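
The sort | md5 check above works because each feature sits on its own line in these files. A parse-based variant that does not depend on line layout could look roughly like the Python sketch below. It assumes FID is unique per feature; the names feature_digest and make_collection are hypothetical helpers for illustration and are not part of koop.

```python
import hashlib
import json


def feature_digest(geojson_text, key="FID"):
    """Return an MD5 hex digest of the features, ordered by properties[key]."""
    collection = json.loads(geojson_text)
    # Assumption: properties[key] is unique, so sorting gives one stable order.
    features = sorted(collection["features"], key=lambda f: f["properties"][key])
    # sort_keys plus fixed separators give one canonical serialization.
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def make_collection(fids):
    """Build a tiny FeatureCollection (made-up data) with features in the given order."""
    features = [
        {
            "type": "Feature",
            "properties": {"FID": fid},
            "geometry": {"type": "Point", "coordinates": [0.0, float(fid)]},
        }
        for fid in fids
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


first = make_collection([1001, 1002, 3001])
second = make_collection([3001, 1001, 1002])

# The raw text differs, but the order-independent digests agree.
print(first == second)
print(feature_digest(first) == feature_digest(second))
```

Two downloads with the same features in a different order produce the same digest, while a download with a missing feature does not.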
YmerejRedienhcs commented 10 years ago

relevant IRC conversation:

jeremy 10:38 chelm: my observation is that the geojson download does not get the features downloaded in the same order each time. This makes the md5 different… would it be very hard to make koop generate them in the FID order the way it does for CSV?
chelm 10:39 this surprises me
10:39 i shall look into your problem
jeremy 10:39 thanks!
10:52 jeremy: whats the dataset you're working with
chelm 10:59 actually jeremy maybe it'd be best to create a ticket for the geojson ordering
chelm 11:00 jeremy: locally the geojson is rendering in order everytime, so let's create ticket for what you're seeing

...

jeremy 12:47 chelm: does koop get its pages/groups of, say, 1000 features synchronously or asynchronously?
chelm 12:47 ahh yes
12:47 that
12:47 async
12:47 i knew you'd notice that
jeremy 12:47 mmm hmm..
chelm 12:47 thats probably not good
jeremy 12:48 yeah, I think that's why I'm seeing different orders.
chelm 12:48 im curious whether file size might be a better indicator than md5
jeremy 12:49 it would not be a better indicator of whether I'm getting the same content.
12:49 "abc" has the same length as "def".
jeremy 12:50 I guess it depends on your definition of "better".
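
The "abc" versus "def" point can be checked directly. The short Python sketch below is only an illustration of why equal size does not imply equal content; the variable names are made up for the example.

```python
import hashlib

a = b"abc"
b = b"def"

# Same length, so a size check cannot tell the two apart.
same_length = len(a) == len(b)

# Different bytes, so the MD5 digests differ.
same_digest = hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest()

print(same_length, same_digest)
```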

chelm commented 10 years ago

There may be no way to ensure order on pages of data since they're async, but I'll look.
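
One general way to keep page requests concurrent and still emit a stable order is to assemble results by page index, not by completion time. The Python sketch below illustrates that idea with asyncio and is not koop's code. fetch_page is a hypothetical stand-in for a paged request, and PAGE_SIZE is a made-up small value in place of the roughly 1000 features per page mentioned in the IRC log.

```python
import asyncio
import random

PAGE_SIZE = 3  # hypothetical; small so the example stays readable


async def fetch_page(page_index):
    """Stand-in for a paged request: returns that page's FIDs after a random delay."""
    await asyncio.sleep(random.uniform(0, 0.01))
    start = page_index * PAGE_SIZE + 1
    return list(range(start, start + PAGE_SIZE))


async def fetch_all_ordered(page_count):
    # gather() runs the requests concurrently but returns the results in the
    # order the awaitables were passed in, regardless of which finished first.
    pages = await asyncio.gather(*(fetch_page(i) for i in range(page_count)))
    return [fid for page in pages for fid in page]


fids = asyncio.run(fetch_all_ordered(4))
print(fids)
```

Whether this fits koop depends on how it writes pages to the output file; if pages are appended as they arrive, they would have to be buffered or written by offset.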