localcc / gvas

GVAS file format parsing library for rust
MIT License
19 stars 2 forks source link

Report cursor position in null-terminated string error reports #16

Closed scottanderson closed 1 year ago

scottanderson commented 1 year ago

Regarding the reporting of null-terminated string errors, the error message does not include the cursor position which is critical for identifying the location of the error within the file. As it currently stands, the error report lacks sufficient information for users to identify the problematic bytes. This is a regression.

localcc commented 1 year ago

I'm not sure how to implement that, because the implementation is generic for all Read traits therefore there is no way to query position. And I don't think rust currently supports choosing an implementation based on more strict requirements if there were to be two, one for Read and one for Read + Seek

localcc commented 1 year ago

I might have an idea to implement another trait which is implemented for both and gives out an optional position, so it can be forwarded to the error, will try that out in the morning.

scottanderson commented 1 year ago

At present, this crate relies on a cursor for reading, while also duplicating all the data within the cursor into a separate data structure. It might be more efficient to consider an alternative approach that does not depend on the Seek trait and eliminates the need for Cursor. By doing so, when reading a file, the data will only be copied into memory once, rather than being copied from the file to a Vec and then duplicated during the parsing process.

localcc commented 1 year ago

Yeah the plan at some point is to make it generic over Read or maybe Read+Seek

scottanderson commented 1 year ago

I suggest returning to the previous CursorExt behavior until we can confirm that the unreal helper can provide feature parity. The Read-specific implementation from unreal helpers does not offer a clear advantage over CursorExt, but it does have some drawbacks. Besides the issue mentioned earlier, unreal helpers also offers worse performance when reading utf-16 strings due to unnecessary copying of data. Furthermore, the unreal helpers interface prevents us from optimizing the string reading function to use slices from the base Cursor, as that optimization depends on referencing the lifetime of the cursor's vec.

Ultimately, the two crates approach this problem with different perspectives of the data, so sharing this type of code might not make sense as long as this crate uses a cursor for parsing.

localcc commented 1 year ago

The plan is to move away from using a cursor in this crate and be generic over Read implementations, I'm not quite sure what unnecessary copying is being done inside unreal_helpers, so if possible, please provide me with information regarding that. The error not reporting the position will be fixed soon in unreal_helpers so we should have feature parity with the previous implementation

scottanderson commented 1 year ago

a94334a5b03dc85a6d851c5b6b73d65d5d092c0d replaced the UTF-16 branch with a new implementation that uses byteorder's LittleEndian::read_u16_into(src: &[u8], dst: &mut [u16]) function, which comes "pre-optimized" so that the compiler does not need to be in release mode to use hardware conversion of consecutive u8s to u16.