Closed georgeslabreche closed 3 years ago
I've investigated this a bit further and I believe that the file read in ArgumentHandler::checkArguments() that fetches and sets the treetype value doesn't match how the file is actually saved:
https://github.com/imbs-hl/ranger/blob/dc9323f57e05d65aa4d51d60b0221d1d770ac4f4/cpp_version/src/utility/ArgumentHandler.cpp#L457-L473
Here's how the file is actually saved.
This can cause treetype to be set to a wrong value, which will at best cause a segmentation fault and at worst fail silently. Also, it's a bit odd that we would set a value in a function that is only meant to check arguments. If anything, shouldn't we only check whether the given treetype argument matches the value saved in the file? Then again, why would we want to give the treetype as an argument and not just always fetch it from the saved file?
Thanks! Could you confirm that #570 fixes the issue?
> Also, it's a bit odd that we would set a value in a function that is only meant to check arguments.
True, that's bad design and probably the reason for introducing the bug in the first place (didn't expect reading from the file here).
> Then again, why would we want to give the treetype as an argument and not just always fetch it from the saved file?
We use the same checkArguments() etc. for training and prediction. In training we want to set the treetype; in prediction we want to read it from the file.
In any case, I'll keep this one open because we should improve the design here.
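One way the design point above could be addressed is to keep argument checking free of side effects and resolve the tree type in a separate step. The following is only a minimal sketch under assumptions: `TreeType`, its values, and `resolveTreeType` are hypothetical names for illustration and are not taken from ranger's code.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for the tree type enumeration; names and values
// are illustrative only.
enum class TreeType { Classification = 1, Regression = 3 };

// Hypothetical helper: pick the tree type without mutating anything inside
// an argument-checking function.
// - Training: the type must come from the command line.
// - Prediction: the type comes from the saved forest; a command-line value,
//   if present, is only compared against it.
inline TreeType resolveTreeType(bool predicting,
                                std::optional<TreeType> from_args,
                                std::optional<TreeType> from_file) {
  if (!predicting) {
    if (!from_args) {
      throw std::invalid_argument("treetype is required for training");
    }
    return *from_args;
  }
  if (!from_file) {
    throw std::runtime_error("could not read treetype from the forest file");
  }
  if (from_args && *from_args != *from_file) {
    throw std::invalid_argument("treetype argument does not match the saved forest");
  }
  return *from_file;
}
```

With this shape, a mismatch between the argument and the saved file surfaces as an explicit error instead of a wrong value being carried forward. It needs C++17 for `std::optional`.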
Thanks for looking into this @mnwright. It runs successfully on my end so it looks good to me!
I'm using the cpp version. Training with the following command:
And predicting with:
The prediction seems to work but there's a silent failure going on with reading ranger_out.forest when fetching the length value here:
- sizeof(size_t) is 8
- sizeof(length) is 8
- length is 4990623131753250816
- sizeof(bool) is 1
- 4 * sizeof(size_t) + length * sizeof(bool) is 4990623131753250848
- treetype is 1

The first 10 lines of hexdump on ranger_out.forest give the following:
Here is a zip containing the csv and ranger_out.forest files: ranger_georges.zip
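As a sanity check on the 64-bit length value reported above, it can help to look at the bytes that make it up. The sketch below assumes a little-endian file layout; the helper names `littleEndianBytes` and `asciiView` are made up for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Split a 64-bit value into the bytes it would occupy in a little-endian file.
inline std::array<std::uint8_t, 8> littleEndianBytes(std::uint64_t value) {
  std::array<std::uint8_t, 8> bytes{};
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    bytes[i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF);
  }
  return bytes;
}

// Render bytes the way the ASCII column of a hexdump does:
// printable characters as-is, everything else as '.'.
inline std::string asciiView(const std::array<std::uint8_t, 8>& bytes) {
  std::string out;
  for (std::uint8_t b : bytes) {
    out += (b >= 0x20 && b < 0x7F) ? static_cast<char>(b) : '.';
  }
  return out;
}

// asciiView(littleEndianBytes(4990623131753250816ULL)) yields "....LABE"
```

The value is 0x4542414C00000000, i.e. four zero bytes followed by the ASCII characters `LABE`. That would be consistent with the read landing partly on text data rather than on a length field, although that interpretation is an assumption and not something confirmed here.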
Despite this bogus length, the fetched tree type is still 1, so it's a silent failure. However, when I compile for an ARM32 environment, the proper length value seems to be fetched (?), but this leads to an erroneous tree type value being read:
- sizeof(size_t) is 4
- sizeof(length) is 4
- length is 5
- sizeof(bool) is 1
- 4 * sizeof(size_t) + length * sizeof(bool) is 21
- treetype is 16843009

Here's the first 10 lines of the hexdump of the ranger_out.forest file produced in the ARM32 environment:
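Independently of the hexdump, the two sets of numbers reported in this comment can be cross-checked with a few lines of C++. This is only a sketch: `treetypeOffset` and `allBytesEqual` are hypothetical helpers, and the type sizes are passed in explicitly so that both platforms can be evaluated on one machine.

```cpp
#include <cstdint>

// The offset expression quoted above, with sizeof(size_t) and sizeof(bool)
// supplied as parameters instead of taken from the host platform.
constexpr std::uint64_t treetypeOffset(std::uint64_t size_t_bytes,
                                       std::uint64_t length,
                                       std::uint64_t bool_bytes = 1) {
  return 4 * size_t_bytes + length * bool_bytes;
}

// True when each of the four bytes of a 32-bit value equals `byte`.
constexpr bool allBytesEqual(std::uint32_t value, std::uint8_t byte) {
  for (int i = 0; i < 4; ++i) {
    if (((value >> (8 * i)) & 0xFFu) != byte) return false;
  }
  return true;
}

// Compile-time checks against the figures reported in this thread (C++14 or later).
static_assert(treetypeOffset(8, 4990623131753250816ULL) == 4990623131753250848ULL,
              "64-bit build: offset matches the reported value");
static_assert(treetypeOffset(4, 5) == 21,
              "ARM32 build: offset matches the reported value");
static_assert(allBytesEqual(16843009u, 0x01),
              "16843009 is 0x01010101");
```

Both reported offsets follow from the formula. The ARM32 treetype of 16843009 is 0x01010101, four bytes each equal to 1, which would be consistent with four one-byte bool values being read as a single 32-bit integer. That reading is a guess and is not confirmed in this thread.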