Open dylrich opened 5 years ago
My typical usage with streaming input is
echo '{"type":"Polygon","coordinates":[[[0,0],[0,1],[1,1],[1,0],[0,0]]]}' | tippecanoe -zg -e test
or, if I want a different layer name,
echo '{"type":"Polygon","coordinates":[[[0,0],[0,1],[1,1],[1,0],[0,0]]]}' | tippecanoe -zg -l layer -e test
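The same pipe can be driven from a program instead of a shell. A minimal Node.js sketch, assuming a tippecanoe binary is on the PATH; buildArgs and runTippecanoe are made-up helper names, and only the -zg, -l, and -e flags from the commands above are used:

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: build the argument list used in the commands above.
function buildArgs(outDir, layer) {
  const args = ['-zg'];
  if (layer) args.push('-l', layer);
  args.push('-e', outDir);
  return args;
}

// Hypothetical helper: feed a GeoJSON string to tippecanoe on stdin.
// Assumes a `tippecanoe` binary is on the PATH.
function runTippecanoe(geojson, outDir, layer) {
  return new Promise((resolve, reject) => {
    const child = spawn('tippecanoe', buildArgs(outDir, layer), {
      stdio: ['pipe', 'inherit', 'inherit'],
    });
    child.on('error', reject);
    child.stdin.on('error', reject); // e.g. EPIPE if the process exits early
    child.on('close', code =>
      code === 0 ? resolve() : reject(new Error(`tippecanoe exited with code ${code}`)));
    child.stdin.end(geojson); // closing stdin signals end of input
  });
}
```

Calling runTippecanoe(geojsonString, 'test', 'layer') would correspond to the second command above.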
There is currently no support for streaming geobuf, because protozero's interface takes a string or a memory buffer, not a stream, so it can expose submessages as memory slices.
The error message you are seeing appears because, if the attempt to memory-map the input fails, the regular JSON parser gets a chance to run on the input. This makes sense for memory-mapped JSON files vs JSON streams, but not for geobuf, so I should fix that.
It would certainly be possible to do streaming output of GeoJSON or something similar, replacing the call to mbtiles_write_tile/dir_write_tile in tile.cpp with something that writes to a stream instead. What sort of format would be useful here?
Thanks for the response! I am not sure what I did wrong originally with streaming in GeoJSON; I'm sure I made some typo and "fixed" it when I tried with the layer JSON. Will use your method.
Good to know on the geobuf - not a deal breaker at all for me and GeoJSON is of course usable.
For streaming output I basically just want to pipe the pbf tippecanoe produces, along with its XYZ, into a different program which will directly work with the data there and finally write the output to my own database instead. So a JSON with x, y, z, and pbf would work for me. Not sure how you'd handle the tileset metadata, though.
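A minimal sketch of the kind of record described here, one JSON object per line carrying z, x, y, and the tile bytes. Encoding the pbf field as base64 and the function names are assumptions for illustration; tippecanoe does not produce this format:

```javascript
// Hypothetical record format: one JSON object per line, tile bytes as base64.
function encodeTileRecord(z, x, y, pbf) {
  return JSON.stringify({ z, x, y, pbf: pbf.toString('base64') }) + '\n';
}

// Inverse of encodeTileRecord: turn one line back into coordinates and bytes.
function decodeTileRecord(line) {
  const { z, x, y, pbf } = JSON.parse(line);
  return { z, x, y, pbf: Buffer.from(pbf, 'base64') };
}

const line = encodeTileRecord(12, 1204, 1542, Buffer.from([0x1f, 0x8b, 0x08]));
const tile = decodeTileRecord(line);
console.log(tile.z, tile.x, tile.y, tile.pbf.length); // 12 1204 1542 3
```

Newline-delimited records keep the stream easy to split on the reading side, at the cost of roughly a third more bytes for the base64 encoding.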
Maybe it would be reasonable to write out the tiles in tar format, since that is meant to be a stream? The metadata would be in metadata.json, just like if you are writing tiles to a directory instead of to an mbtiles file.
tar would certainly work for my purposes!
@ericfischer Quick follow-up on this:
I'm trying to use AWS Lambda to, in realtime, update a tile directory on S3 (related: #776, #741) from a Postgres database. I've created an AWS Lambda Layer (arn:aws:lambda:us-east-1:003014164496:layer:Mapbox_Tippecanoe-1_34_3:9, pull request coming shortly to add to README) and have been able to successfully run Tippecanoe by using Lambda's /tmp folder. I then read the temporary mvt file and push to S3.
I can foresee this approach being problematic for very large files. What I'd prefer is a way to output each tile to stdout as it is generated, so that I could manually push it to S3. Would this be feasible? Even if the output was the entire directory, I could recursively push each file to S3, but I can't think of another way to skip the /tmp folder step.
Example of current (not ideal) process:
var path = require('path');
var exec = require('child_process').exec;
var aws = require('aws-sdk');
var s3 = new aws.S3();
var fs = require('fs');

exports.handler = function(event, context, callback) {
  var exePath = path.resolve(__dirname, '');

  // Upload the finished mbtiles file to S3, then report back to Lambda.
  function processFile(content, stdout) {
    var params = {
      Body: content,
      Bucket: '<BUCKETNAME>',
      Key: 'out.mbtiles'
    };
    s3.putObject(params, function(err, data) {
      if (err) return callback(err);
      console.log(data);
      callback(null, stdout);
    });
  }

  exec(
    `/opt/bin/tippecanoe -o /tmp/out.mbtiles -zg --drop-densest-as-needed ./input.geojson`,
    // `environment` was not defined in the original snippet; process.env is assumed here
    { env: process.env, cwd: exePath },
    (error, stdout, stderr) => {
      if (error) {
        return callback(error);
      }
      // Read the whole tileset from /tmp only after tippecanoe has finished
      fs.readFile('/tmp/out.mbtiles', function read(err, data) {
        if (err) {
          return callback(err);
        }
        processFile(data, stdout);
      });
    }
  );
};
If you just want to write each tile to stdout in a way that you can read back in from something further down the pipeline in a streaming way, the easiest thing would be to replace mbtiles_write_tile in mbtiles.cpp with something like this:
void mbtiles_write_tile(sqlite3 *outdb, int z, int tx, int ty, const char *data, int size) {
    printf("%d %d %d ", z, tx, ty);
    for (int i = 0; i < size; i++) {
        printf("%02x", (unsigned char) data[i]);
    }
    printf("\n");
}
For each tile it will write out a line with the zoom, x, and y coordinates, and the hex-encoded content of the tile, which you can then decode back into the tile data in your reader.
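On the reading side, decoding one such line could look like the following Node.js sketch. parseTileLine is a made-up name, and the strict validation is a choice made here, not something the format requires:

```javascript
// Parse one "z x y hex" line as written by the modified mbtiles_write_tile.
function parseTileLine(line) {
  const parts = line.trim().split(' ');
  if (parts.length !== 4) {
    throw new Error(`expected "z x y hex", got ${parts.length} field(s)`);
  }
  const [z, x, y] = parts.slice(0, 3).map(Number);
  const hex = parts[3];
  // Coordinates must be integers; hex must be whole bytes (pairs of hex digits).
  if (![z, x, y].every(Number.isInteger) || !/^([0-9a-f]{2})+$/i.test(hex)) {
    throw new Error('malformed tile line');
  }
  return { z, x, y, data: Buffer.from(hex, 'hex') };
}

const tile = parseTileLine('12 1204 1542 1f8b0800');
console.log(tile.z, tile.x, tile.y, tile.data.length); // 12 1204 1542 4
```

Rejecting lines that do not have exactly four fields means a stray fragment is reported instead of being silently written out as a tile.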
Writing a tar file would be quite similar, except that the tar format contains a checksum, and I haven't looked up how to calculate the checksum.
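For reference, the checksum in a ustar header is the sum of all 512 header bytes, with the 8-byte checksum field itself counted as ASCII spaces, stored as an octal number. A JavaScript sketch of building one entry; this is not tippecanoe's code, and tarHeader and tarEntry are illustrative names:

```javascript
// Build a 512-byte ustar header for a regular file.
// Names longer than 100 bytes are not handled in this sketch.
function tarHeader(name, size, mtime = 0) {
  const header = Buffer.alloc(512); // zero-filled
  header.write(name, 0, 100, 'utf8');
  header.write('0000644\0', 100, 8, 'ascii'); // mode
  header.write('0000000\0', 108, 8, 'ascii'); // uid
  header.write('0000000\0', 116, 8, 'ascii'); // gid
  header.write(size.toString(8).padStart(11, '0') + '\0', 124, 12, 'ascii');
  header.write(mtime.toString(8).padStart(11, '0') + '\0', 136, 12, 'ascii');
  header.fill(0x20, 148, 156); // checksum field counts as spaces while summing
  header.write('0', 156, 1, 'ascii'); // typeflag: regular file
  header.write('ustar\0', 257, 6, 'ascii'); // magic
  header.write('00', 263, 2, 'ascii'); // version
  let sum = 0;
  for (const byte of header) sum += byte;
  // Six octal digits, then NUL and a space
  header.write(sum.toString(8).padStart(6, '0') + '\0 ', 148, 8, 'ascii');
  return header;
}

// Header plus file data, padded with zeros to a multiple of 512 bytes.
function tarEntry(name, data) {
  const padding = Buffer.alloc((512 - (data.length % 512)) % 512);
  return Buffer.concat([tarHeader(name, data.length), data, padding]);
}

const entry = tarEntry('12/1204/1542.pbf', Buffer.from([0x1f, 0x8b, 0x08]));
console.log(entry.length); // 1024
```

A complete archive is a sequence of such entries followed by two 512-byte blocks of zeros.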
Awesome! I'll give it a go. If I can come up with a solid, fully Lambda-hosted solution, I'll make a note for future users who are hoping to go Serverless with Tippecanoe.
@ericfischer OK, so I think I've got just about everything working, except that I'm trying to figure out the best way to send the hex-encoded data to S3 to create a PBF file. I'm using NodeJS's child_process.spawn() with Tippecanoe and piping the output data. I then take the output string, break by newline and break by space. Three questions:
'hexCode\n z x y hexCode'. What is the first hex code? Something to do with metadata?
Here's an example output array (from question 2, above, and split by \n for readability):
['c3366249d3da0ab6d45a376457459a633b5f6deb11526f853a78629d168558f35c0a96d4dfad54b399bcded99e0ed1c00fa3eb2f21f9345f2c48b8f4a7942272b70c665312ce6ee6d35910cdfdc507122c23f2d90faefd28f203ea22e7ab3ff183883ac89e4efc1b6a935f0f26358951df11b5c8df0eb550e7b6a6871f149e9eb7e7f123f510ba20179757efc89bf3338a903b09fdbbdbf1783c8c4eace1a561743aa66959b60d8003bac0051020d0031e78e63ceff6dd017c0187e025188123740c5ef54ebc538081fbfbf12d1dc1c1bfee7bcbeb1947b6d13f36916377fbff0198c58e182a020000',
'12 1204 1542 1f8b08000000000002036590cd6ed34010c7edd8d9d81ba74ed2a64db780b62b84ca01413f4142a8588da111c601272d6a0f584ebc692cb976646f808a0b275ea02790403c011f27ae1ce105fa027d05eeb0767a4062a4917e33daf9cf7f167d01af0a506634652b6f30d2d28051d7f3fd84a629d2d9c998ba9394ba3e1dc43e45ca118ddcac898af1cb8826480963e67a09f5101c518f513f6775101f1f4fdb2a170aa6588e87c360402f8a84322f08a785124d8edd7ee81f2198d1308ce324450a8bd9240a588ad46ccb30892336459f8ed908a94192d0a349e86536bcc8775f7821d2f85006ae97a63e8227d44bdcfe2408d9057b21e3b6d5bce50eb3d1dc604eb9eb8ce4c93888901cf5475c2266239a4c7d22168cc774e045317587fcdc4992fd4cc4ffec842cc0c6eaeae606369cdeee9e831fb52d0b3b1da345aec1ab1ddbc4dd5ec739c037f013d3796cd8a6ddc3567bbf6d3fc44ff7f888e9903254bbbcb6cc07c663a2c38a6959ed2e365bcf0ca7d525eae29f2c6e7ebb4f24fc1b900216788aa4d810841fdb1c5778464482a2cd1f7cbc4764fcf5b3c8f1bbce67dfbfcbe2d736c7f53cceb78906e126dedcba731bafae6d1008951dc7383c585e5e267558fdef0a02f1213fa3851db3db6e71fb6dc3ba8bed4e0ff36b768d5ecfb08902c1beb163d83d02a0dcda31badcc4dbd3023736c78dfc143377079c4e5fffbb7cfd56dd6a4af52d4110c542419264b95804a05454800a60a9ac68a58a3a03f57255ab55ea33b3604e6f54e7c142ad595f0408281fced788a68a673a1241591244bdfe3c579b15e70a0d695e5ee062a004723550061aa834671675545daa2dd5015703b9daa50bb54f5ced8a4acef4beda2c0a825493af03e17253980182280bf31ad0f5bfe11677c732030000' ]
It would be amazing if we could get a new potential option, similar to -o and -e, that would stream the tiles so that they could easily be sent to a remote server. Something like --output-to-stream
It's not intentional that there are two hex strings in the line, so maybe there is some multithreaded locking problem going on. The format of each line is intended to just be
zoom x y hexstring
Do the hex strings decode to valid tiles for you?
The use of sqlite is just to package the tiles into mbtiles format. If you are not using mbtiles, you could use the aws s3 cp command or something like that to copy the file for the tile to s3.
Tile-join also calls mbtiles_write_tile, so this change will make it also write to the standard output instead of creating a tileset.
I agree that an --output-to-stream option would be the right way to do this instead of editing the code. If this issue results in a generally-useful output format that many programs will be able to take advantage of, I will turn it into a real option.
It's not intentional that there are two hex strings in the line, so maybe there is some multithreaded locking problem going on
It looks like every time I receive 2 additional hex codes (unrelated to the actual tiles). Example:
edffb28c000000
6462626666615162e5ec63b9600b1566050ab331b3b3700085cf696ce00000f3a6293da9000000
Do the hex strings decode to valid tiles for you?
Seems to be working. I ended up converting the hex code to a buffer using JavaScript's Buffer.from() method.
Is there an easy way to also run the metadata?
I agree that an --output-to-stream option would be the right way to do this instead of editing the code. If this issue results in a generally-useful output format that many programs will be able to take advantage of, I will turn it into a real option.
It seems to me that this would be a nice agnostic way to push to any sort of database (see #87, #751, #741, et al). Especially as the world moves increasingly towards serverless, I think that this will be an invaluable option for most.
In my mind, the best format would be to output each tile as you have above (z x y hex; alternatively, hex could be a buffer) and also output the metadata. My guess is that this should be sufficient for most.
For reference, here's an example of my finalized code for creating tiles in S3 using Lambda from a separate event trigger for S3. The function also uses a Lambda Layer arn:aws:lambda:us-east-1:003014164496:layer:Mapbox_Tippecanoe-1_34_3:9 generated from the modified code above. Note that this is fully streaming on both input and output:
var path = require('path');
var spawn = require('child_process').spawn;
var aws = require('aws-sdk');
var s3 = new aws.S3();

exports.handler = function(event, context, callback) {
  var srcBucket = '<BUCKETNAME>';
  var srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

  // Decode one hex-encoded tile and upload it as <prefix>/z/x/y.pbf
  function processFile(buf, zxy) {
    const prefix = 'main/test';
    const filePath = `${path.join(prefix, ...zxy)}.pbf`;
    const content = Buffer.from(buf, 'hex');
    var params = {
      Body: content,
      Bucket: '<BUCKETNAME>',
      Key: filePath,
      ContentEncoding: 'gzip'
    };
    s3.putObject(params, function(err, data) {
      if (err) console.log(err, err.stack);
      else console.log(data);
    });
  }

  const tippecanoe = '/opt/bin/tippecanoe';
  const tippArgs = '-f -z14 -l test -o /tmp/test.mbtiles'.split(' ');
  const tipp = spawn(tippecanoe, tippArgs, { shell: true });

  // Each line on stdout is "z x y hex".
  // Note: a chunk can end in the middle of a line.
  tipp.stdout.on('data', data => {
    var arr = data.toString().split('\n');
    arr.forEach(string => {
      var output = string.split(' ');
      if (output.length > 1) {
        const buf = output.pop();
        processFile(buf, output);
      }
    });
  });

  tipp.stderr.on('data', data => {
    // pass; *NOTE: this is where the progress prints out,
    // so if you want to capture that in a Cloudwatch log,
    // you should throw a console.log here
  });

  tipp.on('close', code => {
    if (code !== 0) {
      console.log(`tipp process exited with code ${code}`);
    }
  });

  // Stream the source object from S3 straight into tippecanoe's stdin
  s3.getObject({
    Bucket: srcBucket,
    Key: srcKey
  })
    .createReadStream()
    .pipe(tipp.stdin);
};
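One possible explanation for the stray hex strings mentioned in this thread, not confirmed here: Node delivers stdout in arbitrary chunks, so a single line can be split across two 'data' events, and splitting each chunk on newlines separately then yields fragments. A sketch of carrying the incomplete tail over to the next chunk; makeLineSplitter is a made-up helper name:

```javascript
// Returns a function that accepts chunks and calls onLine once per complete line.
function makeLineSplitter(onLine) {
  let pending = ''; // text received after the last newline seen so far
  return chunk => {
    pending += chunk.toString();
    const lines = pending.split('\n');
    pending = lines.pop(); // the last piece is incomplete (or empty)
    lines.forEach(line => {
      if (line.length > 0) onLine(line);
    });
  };
}

// A line split across two chunks is delivered once, intact.
const seen = [];
const feed = makeLineSplitter(line => seen.push(line));
feed(Buffer.from('12 1204 1542 1f8b'));
feed(Buffer.from('0800\n12 1204 1543 1f8b\n'));
console.log(seen); // holds '12 1204 1542 1f8b0800' and '12 1204 1543 1f8b'
```

With a child process, the returned function can be passed directly as the stdout 'data' listener, so each callback receives a whole "z x y hex" line.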
One more specific use case where --output-to-stream would be useful:
If I try to recreate this on AWS Lambda, I have to choose whether to use the original output to .mbtiles or the updated code above for streaming.
My ideal workflow would be:
1. tile-join. Output to tmp folder using -o
2. tippecanoe. Output to tmp folder using -o
3. tile-join. Output to stream and push to S3
@ericfischer Quick question re the solution above:
How do I generate metadata? Based on my ignorant glance at the code, it looks like the metadata creation step requires a sqlite database to be created (which the above code skips). Is that correct? Any tips on a succinct way to stream the output while also creating a metadata file?
It really should have been somewhere in dirtiles.cpp, but there is code to write the metadata as JSON in mbtiles.cpp:
https://github.com/mapbox/tippecanoe/blob/d96b521570dd9af522349d753368e0faecbb4243/mbtiles.cpp#L493
In this situation it writes the metadata into a temporary sqlite database, allocated here:
https://github.com/mapbox/tippecanoe/blob/d96b521570dd9af522349d753368e0faecbb4243/mbtiles.cpp#L277
and then copies the metadata from that table into the metadata.json file.
Gotcha. So in order to output the metadata to a stream, I'll need to get rid of if (outdir != NULL) and add an echo? Sorry again for the basic question--not really up to speed on C++.
EDIT: I suppose I could also just output to the Lambda tmp folder and then copy from there as well.
Sorry it's not clearer, but yes, getting rid of the if (outdir != NULL) and then changing the code inside to write to wherever you actually want the metadata, instead of to the fp that is being opened to metadata.json, will be the way to do it.
Were you able to get straightened out whatever was causing the extra hex codes?
Were you able to get straightened out whatever was causing the extra hex codes?
I wasn't but it doesn't seem to be a problem. I was able to get my serverless workflow working (although a bit hack-y but doesn't seem to be a problem so far) which was the main goal.
Thanks for the help. I'll see if I can figure out how to stream the metadata.
Great, I hope it works!
I think I'm going to go ahead and put up a branch that writes to tar format (including the metadata), since that seems like it might be a useful generalization even if it's not quite what you want.
https://github.com/mapbox/tippecanoe/pull/789 adds the --output-to-tar option, which I hope will also provide a good example of how to handle other streaming output types.
@stdmn
I wanted to try your Tippecanoe lambda layer, according to your readme.
I am receiving a "You are not authorized to perform: lambda:GetLayerVersion on resource: arn:aws:lambda:us-east-1:003014164496:layer:Mapbox_Tippecanoe-1_34_3:3" error.
I think you need to add this permission to your Lambda layers.
And is this the latest stable version? I have seen different versions on top of this topic.
Try the new version: arn:aws:lambda:us-east-1:003014164496:layer:Mapbox_Tippecanoe-1_34_3:9
Quick note: I've done a frankenstein and created a version that includes two new functions, tippecanoe-stream and tile-join-stream, that stream the output instead of writing to files. You can still use tippecanoe and tile-join if you want to write to files.
Quick note 2: This layer works on
Unfortunately, that version didn't work either, but I managed to run it myself. You probably need to assign additional policies to make it publicly available. Anyway, thank you for your response.
In case anyone else is interested, I made a Dockerfile setup for creating a Tippecanoe lambda layer here: https://github.com/kylebarron/tippecanoe-lambda. I also published the layer to a few U.S. regions and I think I made it public.
Hi! Having some trouble setting up tippecanoe to do tiling without actually writing to file and I would love some assistance.
I was able to get geojson to stream with the following command:
echo '{"type":"Polygon","coordinates":[[[0,0],[0,1],[1,1],[1,0],[0,0]]]}' | tippecanoe -L'{"file":"", "layer":"test", "description":"test"}' -e test
Which didn't feel quite right, as I am basically passing in an almost empty layer json just to get it to stream. What is the intended method to accomplish this with geojson? Related to that, how would I pipe in a geobuf with the same method? When I try inserting an example base64 encoded geobuf, e.g.
CgRuYW1lGAAiHQobCgwIBBoIAAAAAgIAAAFqBwoFdGVzdDFyAgAA
into stdin, I get an error about an unexpected character, leading me to believe that it is expecting geojson. Would love some guidance on this.
I am also wondering whether it is possible to stream the output to stdout in some sort of structured way instead of writing to files/.mbtiles. From the documentation it looks like this may be a lot more complicated/not feasible for me to write, but if there are any known methods to do this, that would be most appreciated.
Thanks!