protobufjs / protobuf.js

Protocol Buffers for JavaScript & TypeScript.

"Capacity problem" - or problem in Protobuf logic #26

Closed mbflex closed 11 years ago

mbflex commented 11 years ago

Running ProtoBuf.js in Real Life with real data results sometimes in an error message:

Cannot read uint8 from ByteBuffer(offset=642,markedOffset=-1,length=644,capacity=644): Capacity overflow.

The problem clearly depends on the data being handled. However, the cause and how to solve it remain open, unfortunately.

mbflex commented 11 years ago

The underlying buffer seems to be resized on the fly when data is written into the ByteBuffer. Because this works fine, there should be no real capacity problem at all.

The problem is more that ProtoBuf tries to read stuff that was never written into the buffer. Could this be a problem in the underlying logic of ProtoBuf?

dcodeIO commented 11 years ago

Unsure what's going on there as offset=642 with length/capacity=644 should not throw an exception (still 2 uint8s left). Are you using the latest version of ByteBuffer.js? Edit: Have you tried BB 1.3.6?

dcodeIO commented 11 years ago

I've added a bit more information to the exception message in BB 1.3.6 (now on NPM). It now also names the actual offset that's being accessed.

mbflex commented 11 years ago

I updated to the latest version (just now). The full error stack is now:

Error: Cannot read uint8 from ByteBuffer(offset=194,markedOffset=-1,length=644,capacity=644) at 644: Capacity overflow
    at ByteBuffer.readUint8 (..\protobufjs\node_modules\bytebuffer\ByteBuffer.js:607:23)
    at Function.ByteBuffer.decodeUTF8Char (..\protobufjs\node_modules\bytebuffer\ByteBuffer.js:1423:25)
    at ByteBuffer.readUTF8StringBytes (..\protobufjs\node_modules\bytebuffer\ByteBuffer.js:1633:34)
    at ByteBuffer.readVString (..\protobufjs\node_modules\bytebuffer\ByteBuffer.js:1727:28)
    at ProtoBuf.Reflect.Field.decode (..\protobufjs\ProtoBuf.js:2026:35)
    at ProtoBuf.Reflect.Message.decode (..\protobufjs\ProtoBuf.js:1611:51)
    at ProtoBuf.Reflect.Field.decode (..\protobufjs\ProtoBuf.js:2041:46)
    at ProtoBuf.Reflect.Message.decode (..\protobufjs\ProtoBuf.js:1609:51)
    at ProtoBuf.Reflect.Field.decode (..\protobufjs\ProtoBuf.js:2041:46)
    at ProtoBuf.Reflect.Message.decode (..\protobufjs\ProtoBuf.js:1611:51)

mbflex commented 11 years ago

I am using ProtoBuf.js in a loop, like "every 5 seconds: read the GTFS (protobuf) file from the web server, decode it ... read again ...".

Typically it works for a while (or even just for the very first iteration), and then it crashes during a later run. The problem seems to be the loop itself, as if looping were not "allowed".

What is the problem with the loop?

mbflex commented 11 years ago

Finally, I found that the problem is not the decoding process but the HTTP request: after a while, the remote server starts to use chunked transfer encoding, which was not handled correctly in my code. So, ProtoBuf.js works fine but needs valid input data ;-)
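For anyone hitting this later: chunked transfer itself is harmless as long as each chunk is kept as a Buffer and the pieces are joined once at the end. A minimal, self-contained sketch (the payload bytes below are arbitrary stand-ins for real .pb content):

```javascript
// Sketch: chunked delivery is lossless if chunks stay Buffers.
var original = Buffer.from([0x0a, 0x03, 0xfe, 0xff, 0x80, 0x21]);

// Simulate chunked transfer: the server may split the body anywhere.
var chunks = [original.slice(0, 2), original.slice(2, 5), original.slice(5)];

// Accumulate a list of Buffers and join them once at the end.
var reassembled = Buffer.concat(chunks);
console.log(reassembled.equals(original)); // true: byte-for-byte identical
```

The corruption only appears when a chunk is coerced to a string along the way, which is exactly what happens further down in this thread.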

dcodeIO commented 11 years ago

Maybe I could add some sort of #decodeFromUrl(...) or something. Would you share your code?

mbflex commented 11 years ago

I am now using this code (but I am still not sure whether this is 100% correct):

var ProtoBuf = require("protobufjs");
var http = require('http');
var configServer = {host: '.........', port: 81, path: "/gtfs", method: 'GET'};
var configFrequencySeconds = 3;
var myDecoder = ProtoBuf.protoFromFile("conf/gtfs-realtime.proto").build("transit_realtime").FeedMessage;
readGtfs();

function readGtfs() {
    var data = 0;
    var req = http.request(configServer, function(res) {
        res.on('data', function (chunk) {
            if (!data) data = chunk; else data += chunk;
        });
        res.on('end', function () {
            var feed = myDecoder.decode(data);
            ... < work with decoded data >
        });
    });
    req.on('error', function(e) {console.log('ERROR: problem with request: ' + e.message);});
    req.end();
    setTimeout(readGtfs, configFrequencySeconds*1000);
}

saccodd commented 10 years ago

Hi, I am working on a Node.js solution to retrieve and process GTFS-realtime. Currently my code is based on yours and works well on TriMet feeds and BART feeds. However, it doesn't on others that I tested. Specifically, I would focus on the VehiclePosition feed provided by MBTA (http://developer.mbta.com/lib/gtrtfs/Vehicles.pb), which I cannot decode. There seems to be a mismatch between the proto file and the feed they provide: one field is always missing, typically the header.

I have tried to request the data every 3 seconds, but the issue remains: I never retrieve a complete feed. This problem drives me crazy! Maybe I am missing something in my code? Or maybe their feed is corrupted? Or something goes wrong in the ProtoBuf.js implementation? Any help from you is really appreciated! I would like to hear about your experiences. Thank you very much. Daniele

dcodeIO commented 10 years ago

One thing you could try is to reverse engineer the Vehicles.pb to validate that it actually matches the proto definition. A mismatch would be the obvious reason for failure.

If you assume a bug in ProtoBuf.js, any additional information would be useful, like errors thrown or a break down of the data and proto file to a minimal failing case.

dcodeIO commented 10 years ago

Another possible point of error is a string conversion somewhere between requesting and parsing the Vehicles.pb, which is bad: it will most likely corrupt the data.

On node, make sure to work with Buffers instead of strings when fetching the data, maybe forcing Content-Type: application/octet-stream. To validate this case, download the Vehicles.pb to your hard drive and load it through fs.readFileSync, then try to YourMessage#decode it without any additional conversion. If this works and the remotely fetched version does not, something is wrong in between.

saccodd commented 10 years ago

The only conversion I have is from binary to ASCII, because ProtoBuf needs a valid base64 encoded string. By the way, here is my code:

var http = require('http');
var path = require('path');
var ProtoBuf = require("protobufjs");

var configServer = {host: 'developer.mbta.com', path: '/lib/gtrtfs/Vehicles.pb'};
var configFrequencySeconds = 3;
var myDecoder = ProtoBuf.protoFromFile(path.join(__dirname, "www", "gtfs-realtime.proto")).build("transit_realtime").FeedMessage;
readGtfs();

function readGtfs() {
var data = 0;
var req = http.request(configServer, function(res) {
    //res.responseType="arraybuffer";
    res.on('data', function (chunk) {
        if (!data) data = chunk; else data += chunk; 
    });
    res.on('end', function (){ 
        database64 = btoa(data);
        try {
            var feed = myDecoder.decode(database64);
            console.log("Test 1.1 " + JSON.stringify(feed, null, 4));
            console.log("Test 1.2 " + feed.entity[1].id);
        } catch (e) {
            if (e.decoded) { // Truncated
                feed = e.decoded; // Decoded message with missing required fields
                console.log("Test 2.1 " + JSON.stringify(feed, null, 4));
                console.log("Test 2.2 " + feed);
            } else { // General error
                console.log(e);
            }
        }

    });
});
req.on('error', function(e) {console.log('ERROR: problem with request: ' + e.message);});
req.end();
setTimeout(readGtfs, configFrequencySeconds*1000);
}

I will try your test cases and I will let you know.

dcodeIO commented 10 years ago

The problem is this:

if (!data) data = chunk; else data += chunk; 

This converts data from a Buffer object to a string. Never ever do this if the Buffer's data is not a UTF-8 encoded string! Instead try:

function readGtfs() {
    var data = []; // List of buffers
    var req = http.request(configServer, function(res) {
        res.on('data', function (chunk) {
            data.push(chunk); // Add buffer chunk
        });
        res.on('end', function (){ 
            data = Buffer.concat(data); // Make one large buffer of it
            try {
                var feed = myDecoder.decode(data); // And decode it
...

This way, no string conversion happens and YourMessage#decode should work (no need for base64).
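To make the failure mode concrete, here is a small stdlib-only sketch (the bytes are made up) of what the `+=` pattern actually does: string concatenation implicitly decodes each Buffer as UTF-8, and bytes that are not valid UTF-8 (which protobuf wire data is full of) get replaced and cannot be recovered:

```javascript
// Sketch: why `data += chunk` destroys a binary payload.
var chunk = Buffer.from([0x08, 0x96, 0x01]); // protobuf-style bytes, 0x96 > 0x7F

// Starting from `var data = 0`, the += coerces both sides to string.
var data = 0;
data += chunk;                  // implicit chunk.toString('utf8'), prefixed with "0"
console.log(typeof data);       // "string"

// The invalid byte 0x96 became U+FFFD; re-encoding cannot restore it.
var roundTripped = Buffer.from(data.slice(1)); // drop the leading "0"
console.log(roundTripped.equals(chunk));       // false: bytes were mangled

// Accumulating Buffers and concatenating keeps every byte intact.
var safe = Buffer.concat([chunk]);
console.log(safe.equals(chunk));               // true
```

This is why the decoder saw truncated or nonsensical messages only for some feeds: payloads that happened to be valid UTF-8 survived the round trip, everything else was silently corrupted.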

I've also updated the FAQ: https://github.com/dcodeIO/ProtoBuf.js/wiki/How-to-read-binary-data-in-the-browser-or-under-node.js%3F

saccodd commented 10 years ago

It works perfectly!!! Your ProtoBuf.js is awesome! Thank you very much for your support also. If my project evolves well I will let you know. Daniele

dcodeIO commented 10 years ago

You are welcome!