hobuinc / laz-perf

Alternative LAZ implementation for C++ and JavaScript
Apache License 2.0

Corrupted LAZ output with certain classification sequence #22

Closed vuakko closed 8 years ago

vuakko commented 8 years ago

Hi,

I've extracted a specific sequence of classification values for 500 points, which results in a corrupt file being created. I haven't yet been able to reduce the test case further. The specific classification sequence is attached and the test code that produces the issue is below.

You can verify the corruption by dumping the points in the resulting testout.laz file. The LAStools utilities also report the corruption.

The corruption occurs with GCC 5 on both Linux and Windows.

#include <fstream>
#include <iterator>
#include <vector>

#include "laz-perf/io.hpp"

namespace lz = laszip;
namespace fmt = laszip::formats;
namespace io = laszip::io;

// Run me:  <binary> <file with white-space delimited classification values as text>
int main(int argc, char** argv) {
  if (argc < 2) return 1;  // the classification file is required

  // Read every classification value from the text file.
  std::ifstream ifs(argv[1]);
  std::istream_iterator<int> it{ifs};
  std::vector<int> cls(it, std::istream_iterator<int>());

  // Write a LAZ file whose records contain only the POINT10 item.
  lz::factory::record_schema schema;
  schema(lz::factory::record_item::POINT10);
  io::writer::file out("testout.laz", schema,
      io::writer::config({0.01,0.01,0.01}, {0.0,0.0,0.0}));

  char buf[sizeof(fmt::las::point10)];
  fmt::las::point10 p10{};  // value-initialized; only the classification varies per point
  for (int cl : cls) {
    p10.classification = static_cast<unsigned char>(cl);
    fmt::packers<fmt::las::point10>::pack(p10, buf);
    out.writePoint(buf);
  }
  out.close();
}

CLS.txt

vuakko commented 8 years ago

Attaching the earlier 700 point file as well, since it produces more clearly bad data. CLS_ORIG.txt

hobu commented 8 years ago

@vuakko thanks for the detailed bug report. We don't use laz-perf for writing LAZ data very much, which is probably why we have thus far not hit this bug. I would say this issue is in our queue, but we probably won't get to it very quickly.

Fixing it will entail going bit-by-bit with LASzip and laz-perf and seeing where they diverge and trying to figure out why.

vuakko commented 8 years ago

Just some additional info. I didn't get too far. The produced file starts to differ byte-for-byte from the LAStools-produced version only at the point where the stored points start going wrong as well.
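For anyone repeating that comparison, here is a minimal sketch of locating the first differing byte between two files. The helper names `read_bytes` and `first_difference` are hypothetical and not part of laz-perf or LAStools; the sketch assumes both files fit in memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file into memory as raw bytes.
std::vector<char> read_bytes(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::vector<char>(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
}

// Hypothetical helper: offset of the first differing byte, or -1 if the
// buffers are identical. If one buffer is a strict prefix of the other,
// the length of the shorter buffer is returned.
long long first_difference(const std::vector<char>& a,
                           const std::vector<char>& b) {
  const std::size_t n = std::min(a.size(), b.size());
  const auto end = a.begin() + static_cast<std::ptrdiff_t>(n);
  const auto m = std::mismatch(a.begin(), end, b.begin());
  const std::size_t off = static_cast<std::size_t>(m.first - a.begin());
  if (off == n && a.size() == b.size()) return -1;
  return static_cast<long long>(off);
}
```

Calling `first_difference(read_bytes("testout.laz"), read_bytes("reference.laz"))`, where `reference.laz` stands for whatever file LAStools produced, gives the offset to start inspecting from.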

I also tried writing points with random sequences of classification values from the range [1,N]. For N=3 or 4, corruption occurs very often with 600 points and rarely with 575 points, and I couldn't reproduce it with 550 points. For N=2 or 5, there's no corruption.

@hobu: Do you think someone knowledgeable would be interested in fixing this?

abellgithub commented 8 years ago

I've replicated this problem. I don't know what's wrong and will work on it when there's time. And no, I'm not someone knowledgeable. Sorry.

vuakko commented 8 years ago

It's always some little thing :) Big thanks for the fix!