Closed Kriskras99 closed 2 weeks ago
I've added a better backtrace from a release build with debug symbols enabled.
Problem is c2rust_transpile::c_ast::TypedAstContext::sort_top_decls
at c2rust/c2rust-transpile/src/c_ast/mod.rs:674:9
It seems like this is the incorrect Ord
: https://github.com/immunant/c2rust/blob/9eaf8a15ede349a3ed9bff4af4638a80c88b8b65/c2rust-transpile/src/c_ast/mod.rs#L671-L686
Maybe this is also why the declarations were always backwards in rav1d
?
I think it will end up being compare_src_locs
, but I don't understand the code enough to know what's the problem
I think it will end up being
compare_src_locs
, but I don't understand the code enough to know what's the problem
Yeah, it seems like it's in there. @fw-immunant or @rinon, do you remember what was going on here and where a non-total ordering might be coming from? https://github.com/immunant/c2rust/blob/9eaf8a15ede349a3ed9bff4af4638a80c88b8b65/c2rust-transpile/src/c_ast/mod.rs#L209-L235
I'm not familiar with this code but the last semantically relevant change here seems to be 5d3b0aa0e4073ab71b5d6b8ffb1fcf99a1fa88f5.
"Minimal" reproducer. Anybody know of something like creduce for JSON?
The sort_data.json
will pass if you remove more than 5 consecutive spans, and weeding through 1200 spans by hand is a bit of a pain.
use std::{cmp::Ordering, fs::File};
use serde::Deserialize;
#[derive(Copy, Debug, Clone, PartialOrd, PartialEq, Ord, Eq, Deserialize)]
pub struct SrcLoc {
pub fileid: u64,
pub line: u64,
pub column: u64,
}
#[derive(Copy, Debug, Clone, PartialOrd, PartialEq, Ord, Eq, Deserialize)]
pub struct SrcSpan {
pub fileid: u64,
pub begin_line: u64,
pub begin_column: u64,
pub end_line: u64,
pub end_column: u64,
}
impl SrcSpan {
pub fn begin(&self) -> SrcLoc {
SrcLoc {
fileid: self.fileid,
line: self.begin_line,
column: self.begin_column,
}
}
pub fn end(&self) -> SrcLoc {
SrcLoc {
fileid: self.fileid,
line: self.end_line,
column: self.end_column,
}
}
}
pub type FileId = usize;
#[derive(Deserialize)]
struct Temp {
spans: Vec<Option<SrcSpan>>,
file_map: Vec<FileId>,
include_map: Vec<Vec<SrcLoc>>,
}
use Ordering::*;
pub fn main() {
let file = File::open("sort_data.json").unwrap();
let mut temp: Temp = serde_json::from_reader(file).unwrap();
println!("{}", temp.spans.len());
sort_top_decls(&mut temp.spans[..], &temp.file_map, &temp.include_map);
}
pub fn sort_top_decls(spans: &mut [Option<SrcSpan>], file_map: &[FileId], include_map: &[Vec<SrcLoc>]) {
// Group and sort declarations by file and by position
spans.sort_unstable_by(|a, b| {
match (a, b) {
(None, None) => Equal,
(None, _) => Less,
(_, None) => Greater,
(Some(a), Some(b)) => {
compare_src_locs(file_map, include_map, &a.begin(), &b.begin())
},
}
});
}
pub fn compare_src_locs(file_map: &[FileId], include_map: &[Vec<SrcLoc>], a: &SrcLoc, b: &SrcLoc) -> Ordering {
/// Compare `self` with `other`, without regard to file id
fn cmp_pos(a: &SrcLoc, b: &SrcLoc) -> Ordering {
(a.line, a.column).cmp(&(b.line, b.column))
}
let path_a = include_map[file_map[a.fileid as usize]].clone();
let path_b = include_map[file_map[b.fileid as usize]].clone();
for (include_a, include_b) in path_a.iter().zip(path_b.iter()) {
if include_a.fileid != include_b.fileid {
return cmp_pos(include_a, include_b);
}
}
match path_a.len().cmp(&path_b.len()) {
Less => {
// compare the place b was included in a's file with a
let b = path_b.get(path_a.len()).unwrap();
cmp_pos(a, b)
}
Equal => cmp_pos(a, b),
Greater => {
// compare the place a was included in b's file with b
let a = path_a.get(path_b.len()).unwrap();
cmp_pos(a, b)
}
}
}
Thanks for that reproduction! That's very helpful. I don't know of any analogous tool like creduce
but for JSON; that would be useful. That said, I was able to test that on the total order properties, and found this that violates ==
transitivity:
use serde::Deserialize;
use std::cmp::Ordering;
use std::fs::File;
#[derive(Copy, Debug, Clone, PartialOrd, PartialEq, Ord, Eq, Deserialize)]
pub struct SrcLoc {
pub fileid: u64,
pub line: u64,
pub column: u64,
}
#[derive(Copy, Debug, Clone, PartialOrd, PartialEq, Ord, Eq, Deserialize)]
pub struct SrcSpan {
pub fileid: u64,
pub begin_line: u64,
pub begin_column: u64,
pub end_line: u64,
pub end_column: u64,
}
impl SrcSpan {
pub fn begin(&self) -> SrcLoc {
SrcLoc {
fileid: self.fileid,
line: self.begin_line,
column: self.begin_column,
}
}
pub fn end(&self) -> SrcLoc {
SrcLoc {
fileid: self.fileid,
line: self.end_line,
column: self.end_column,
}
}
}
pub type FileId = usize;
#[derive(Deserialize)]
struct Temp {
spans: Vec<Option<SrcSpan>>,
file_map: Vec<FileId>,
include_map: Vec<Vec<SrcLoc>>,
}
pub fn sort_top_decls(
spans: &mut [Option<SrcSpan>],
file_map: &[FileId],
include_map: &[Vec<SrcLoc>],
) {
use Ordering::*;
// Group and sort declarations by file and by position
spans.sort_unstable_by(|a, b| match (a, b) {
(None, None) => Equal,
(None, _) => Less,
(_, None) => Greater,
(Some(a), Some(b)) => compare_src_locs(file_map, include_map, &a.begin(), &b.begin()),
});
}
pub fn compare_src_locs(
file_map: &[FileId],
include_map: &[Vec<SrcLoc>],
a: &SrcLoc,
b: &SrcLoc,
) -> Ordering {
/// Compare `self` with `other`, without regard to file id
fn cmp_pos(a: &SrcLoc, b: &SrcLoc) -> Ordering {
(a.line, a.column).cmp(&(b.line, b.column))
}
let path_a = &include_map[file_map[a.fileid as usize]][..];
let path_b = &include_map[file_map[b.fileid as usize]][..];
for (include_a, include_b) in path_a.iter().zip(path_b.iter()) {
if include_a.fileid != include_b.fileid {
return cmp_pos(include_a, include_b);
}
}
use Ordering::*;
match path_a.len().cmp(&path_b.len()) {
Less => {
// compare the place b was included in a's file with a
let b = &path_b[path_a.len()];
cmp_pos(a, b)
}
Equal => cmp_pos(a, b),
Greater => {
// compare the place a was included in b's file with b
let a = &path_a[path_b.len()];
cmp_pos(a, b)
}
}
}
struct MappedSrcLoc<'a> {
src_loc: SrcLoc,
file_map: &'a [FileId],
include_map: &'a [Vec<SrcLoc>],
}
impl PartialEq for MappedSrcLoc<'_> {
fn eq(&self, other: &Self) -> bool {
self.partial_cmp(other) == Some(Ordering::Equal)
}
}
impl Eq for MappedSrcLoc<'_> {}
impl PartialOrd for MappedSrcLoc<'_> {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(compare_src_locs(
self.file_map,
self.include_map,
&self.src_loc,
&other.src_loc,
))
}
}
impl Ord for MappedSrcLoc<'_> {
fn cmp(&self, other: &Self) -> Ordering {
self.partial_cmp(other).unwrap()
}
}
pub fn main() {
let file = File::open("sort_data.json").unwrap();
let temp: Temp = serde_json::from_reader(file).unwrap();
let Temp {
spans,
file_map,
include_map,
} = temp;
let file_map = &file_map;
let include_map = &include_map;
// sort_top_decls(&mut spans[..], file_map, include_map);
let spans = spans
.into_iter()
.filter_map(|span| span)
.collect::<Vec<_>>();
// spans.sort_unstable_by(|a, b| compare_src_locs(file_map, include_map, &a.begin(), &b.begin()));
let locs = spans
.into_iter()
.map(|span| span.begin())
.collect::<Vec<_>>();
// locs.sort_unstable_by(|a, b| compare_src_locs(file_map, include_map, &a, &b));
let mapped_locs = locs
.into_iter()
.map(|src_loc| MappedSrcLoc {
src_loc,
file_map,
include_map,
})
.collect::<Vec<_>>();
// mapped_locs.sort_unstable();
let n = mapped_locs.len();
for i in 0..n {
let a = &mapped_locs[i];
for j in 0..n {
let b = &mapped_locs[j];
for k in 0..n {
let c = &mapped_locs[k];
if a < b && b < c {
assert!(a < c);
}
if a > b && b > c {
assert!(a > c);
}
if a == b && b == c {
if a != c {
dbg!(a.src_loc);
dbg!(b.src_loc);
dbg!(c.src_loc);
}
assert!(a == c);
}
}
}
if i % 10 == 0 {
println!("{i}");
}
}
}
> cargo run --release
...
[src/main.rs:177:25] a.src_loc = SrcLoc {
fileid: 1,
line: 31,
column: 1,
}
[src/main.rs:178:25] b.src_loc = SrcLoc {
fileid: 2,
line: 214,
column: 1,
}
[src/main.rs:179:25] c.src_loc = SrcLoc {
fileid: 1,
line: 32,
column: 1,
}
...
If you also dbg!
the include path the problem becomes clear:
[src/main.rs:177:25] a.src_loc = SrcLoc {
fileid: 5,
line: 5,
column: 1,
}
[src/main.rs:178:25] &include_map[file_map[a.src_loc.fileid as usize]][..] = [
SrcLoc {
fileid: 5,
line: 6,
column: 10,
},
]
[src/main.rs:179:25] b.src_loc = SrcLoc {
fileid: 2,
line: 4,
column: 1,
}
[src/main.rs:180:25] &include_map[file_map[b.src_loc.fileid as usize]][..] = [
SrcLoc {
fileid: 2,
line: 6,
column: 10,
},
]
[src/main.rs:181:25] c.src_loc = SrcLoc {
fileid: 5,
line: 3,
column: 7,
}
[src/main.rs:182:25] &include_map[file_map[c.src_loc.fileid as usize]][..] = [
SrcLoc {
fileid: 5,
line: 6,
column: 10,
},
]
thread 'main' panicked at src/main.rs:184:21:
assertion failed: a == c
a == b because a and b are from different files but are included at the same position. Same for b == c. But a != c because a and c are from the same file so they are checked not for their include but their actual location.
If you change the start of compare_src_locs
to:
fn cmp_pos(a: &SrcLoc, b: &SrcLoc) -> Ordering {
(a.line, a.column).cmp(&(b.line, b.column))
}
fn cmp_pos_include(a: &SrcLoc, b: &SrcLoc) -> Ordering {
(a.fileid, a.line, a.column).cmp(&(b.fileid, b.line, b.column))
}
let path_a = &include_map[file_map[a.fileid as usize]][..];
let path_b = &include_map[file_map[b.fileid as usize]][..];
for (include_a, include_b) in path_a.iter().zip(path_b.iter()) {
if include_a.fileid != include_b.fileid {
return cmp_pos_include(include_a, include_b);
}
}
Then everything works. But that does go against the comment /// Compare `self` with `other`, without regard to file id
, but I don't know why fileid would be excluded
If this is the correct way to fix this, I'll create a nicer patch and open a pull request
If you also
dbg!
the include path the problem becomes clear: ... a == b because a and b are from different files but are included at the same position. Same for b == c. But a != c because a and c are from the same file so they are checked not for their include but their actual location.
Yeah, I realized that, too. I'm not sure why we're doing that.
If you change the start of
compare_src_locs
to:fn cmp_pos(a: &SrcLoc, b: &SrcLoc) -> Ordering { (a.line, a.column).cmp(&(b.line, b.column)) } fn cmp_pos_include(a: &SrcLoc, b: &SrcLoc) -> Ordering { (a.fileid, a.line, a.column).cmp(&(b.fileid, b.line, b.column)) } let path_a = &include_map[file_map[a.fileid as usize]][..]; let path_b = &include_map[file_map[b.fileid as usize]][..]; for (include_a, include_b) in path_a.iter().zip(path_b.iter()) { if include_a.fileid != include_b.fileid { return cmp_pos_include(include_a, include_b); } }
Note also that cmp_pos_includes
can be just .cmp
as it's the same as the #[derive(Ord)]
.
And I changed the .clone()
s of Vec<SrcLoc>
s for the path_*
variables to just slicing, as we don't need the clones there. It only came up as a performance bottleneck when brute forcing the transitivity checks, but it's better to just not do a clone, so you could include that change as a separate commit as well.
Then everything works. But that does go against the comment
/// Compare `self` with `other`, without regard to file id
, but I don't know why fileid would be excludedIf this is the correct way to fix this, I'll create a nicer patch and open a pull request
I'm not sure either. Let me look into it.
I think it's for sure more correct than the current order, so you're welcome to open a PR for it.
I've created a patch in #1128 where cmp_pos
isn't used at all anymore, and I've added documentation to the function to try to explain what it's doing
Seems one (or more) of the types used in the transpiler have a wrong
Ord
implementation.To reproduce: