Closed onkoe closed 1 month ago
Thank you for the PR! Looks great!
However, the v2 version is about to be released. This version has a lot of new additions and changes. Merging this PR now will cause a lot of conflicts, which may be difficult to resolve.
Could you please take a look at the v2 branch? If possible, I hope this PR can be directly integrated into the v2 version, thank you!
Hi @onkoe
Version 2.0.0 has been released! You can pull directly from the main
branch. Thanks!
Sounds great! I'll work on this in the coming days!
@mindeng Hi again! I've been working on this in my spare time, but have noticed some potential problems with usize = u32
targets.
First, nom
is somewhat flawed on these platforms. It isn't possible to index a slice past 4 GiB, and nom
uses a ToUsize
trait that isn't portable at all! (see: https://github.com/rust-bakery/nom/issues/1029)
In reality, you just use nom::streaming
for this kind of requirement. Which just leaves one question: is the header.box_size
always expected to be under 4 GiB? header.body_size()
certainly isn't in my view, so it might be nice to store it in some custom type with methods to help ensure safety.
I'm still downloading ISO/IEC 14496-12:2022 to get a better understanding, so please forgive any confusion on my part. ;D
@mindeng Hi again! I've been working on this in my spare time, but have noticed some potential problems with
usize = u32
targets.First,
nom
is somewhat flawed on these platforms. It isn't possible to index a slice past 4 GiB, andnom
uses aToUsize
trait that isn't portable at all! (see: rust-bakery/nom#1029)In reality, you just use
nom::streaming
for this kind of requirement. Which just leaves one question: is theheader.box_size
always expected to be under 4 GiB?header.body_size()
certainly isn't in my view, so it might be nice to store it in some custom type with methods to help ensure safety.I'm still downloading ISO/IEC 14496-12:2022 to get a better understanding, so please forgive any confusion on my part. ;D
I think the 4GB cap shouldn't be an issue. When nom-exif traverses boxes to find Exif/metadata, it does not actually read each box, but only parses the header part of the box (usually only 8 bytes). Only when a box that stores Exif/metadata related information is found, the data of the box will actually be read and parsed.
In a video file, the real size is the track data, not the metadata. Therefore, under normal circumstances, nom-exif is unlikely to read data that is too large at one time.
If an abnormal situation occurs (such as file
corruption leading to data abnormality, etc.), nom-exif also sets limits. If it
exceeds 2GB, it will be considered abnormal and will refuse to read the box
(refer to this constant definition: MAX_BODY_LEN
).
In addition, in a 32-bit system, since the maximum upper limit of the memory address space is 4GB, plus the kernel overhead, the upper limit of memory left for applications is generally 2GB at most, not to mention that there will be more strict memory restrictions to applications on Android. Therefore, I think this shouldn't be an issue.
I see! Thanks to your comment and a more in-depth understanding of the standard, I learned the basis for the cap. However, in this case, do you think it would be preferable to use u32
for all these types?
This would prevent all casts, and u32
is always ToUsize
on supported targets, removing the errors. Particularly since the cap is there, I don't think this should affect anything.
Would you be open to a PR using this instead?
I see! Thanks to your comment and a more in-depth understanding of the standard, I learned the basis for the cap. However, in this case, do you think it would be preferable to use
u32
for all these types?This would prevent all casts, and
u32
is alwaysToUsize
on supported targets, removing the errors. Particularly since the cap is there, I don't think this should affect anything.Would you be open to a PR using this instead?
The box_size
still needs to be retained as u64
for the following reasons:
Although we shouldn't be reading more than 4GB of data when parsing metadata, some file metadata might be stored in the middle or at the end of the file. This means that when we look for metadata, we need to skip the preceding boxes, some of which may exceed the u32
size. While we won't read the contents of these boxes, we do need to skip them (using seek or chunked reading, depending on whether the provided reader supports seek). This necessitates that box_size
can store values larger than u32
.
For ISOBMFF, there are two specifications for the length of box_size
: one is u32
, and the other is u64
. We must support both specifications; otherwise, many video files may become unparseable.
Got it! However, nom
imposes some requirements on 32-bit targets with streaming
. For example, the following error is significant:
error[E0277]: the trait bound `u64: ToUsize` is not satisfied
--> src/mov.rs:341:21
|
341 | let (_, body) = streaming::take(header.body_size())(remain)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `ToUsize` is not implemented for `u64`
|
= help: the following other types implement trait `ToUsize`:
u16
u32
u8
usize
Before I use my previous changes, I want to ensure that the layout makes sense. Do you think it's true that...
body_size()
will usually NOT exceed u32::MAX
."body_size()
doesn't relate to the actual file's size."BoxHolder
does not contain video data."u32::MAX
bytes."If any of these don't follow, it may be better to consider other options. Two clear options are to either explicitly disallow 32-bit targets, or create an API that can internally read files and get u64
-indexable buffers on 32-bit systems.
Also, if I may ask, is there any reason to keep the deprecated interfaces on 2.0?
If they're not used for anything, please consider removing them in the next major release. I think they add more "entry points" for the library, which might confuse future contributors.
Got it! However,
nom
imposes some requirements on 32-bit targets withstreaming
. For example, the following error is significant:error[E0277]: the trait bound `u64: ToUsize` is not satisfied --> src/mov.rs:341:21 | 341 | let (_, body) = streaming::take(header.body_size())(remain) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `ToUsize` is not implemented for `u64` | = help: the following other types implement trait `ToUsize`: u16 u32 u8 usize
Before I use my previous changes, I want to ensure that the layout makes sense. Do you think it's true that...
- "A header box's
body_size()
will usually NOT exceedu32::MAX
."- "A header box's
body_size()
doesn't relate to the actual file's size."- "
BoxHolder
does not contain video data."- "More generally, no parser requires a slice containing >
u32::MAX
bytes."If any of these don't follow, it may be better to consider other options. Two clear options are to either explicitly disallow 32-bit targets, or create an API that can internally read files and get
u64
-indexable buffers on 32-bit systems.
Reading data that exceeds the size of uszie
at once should be considered an error, so I submitted this modification, which should solve the compilation problem.
Also, if I may ask, is there any reason to keep the deprecated interfaces on 2.0?
If they're not used for anything, please consider removing them in the next major release. I think they add more "entry points" for the library, which might confuse future contributors.
You are right, we should remove these deprecated methods in the next major version. The reason why it has not been deleted at present is mainly to make the upgrade smoother for users of old versions.
Hey there! I'm using this crate in my Android app, but the build kept failing for 32-bit targets. This PR fixes it and includes some potential style/perf improvements for your troubles.
Changes
There is a slight API change: I exported
nom_exif::values::Rational
in the public API since it was already being returned as part ofEntryValue
.EntryValue
exposes those as part ofEntryValue::as_urational
andEntryValue::as_irational
(and the array methods).I also added
keywords
,categories
, andrust-version
(MSRV) to theCargo.toml
file. These aren't necessary, but can greatly improve the accessibility of the crate.Testing
I checked this PR using the following methods:
cargo +1.80.0 build
cargo +stable build
cargo +beta build
cargo +nightly build
cargo build --target armv7-linux-androideabi
cargo nextest run --all --no-fail-fast
cargo test --doc
cargo doc --open
cargo semver-checks
Everything checked out, including semantic versioning. Please let me know if anything looks questionable - I'm happy to fix it.
Once again, thanks for the excellent crate! :)