rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

rust-gdb: "Python Exception <class 'OverflowError'> int too big to convert: " #94245

Open John-Nagle opened 2 years ago

John-Nagle commented 2 years ago

Note: this report may be in the wrong place, but it's not clear where rust-gdb bugs go.

I tried this code:

rust-gdb scenetest 

I expected to see this happen: Normal GDB operation

Instead, this happened: Python errors from within GDB, after which GDB hung and could not accept any more commands.

Meta

rustc --version --verbose:

rustc 1.57.0 (f1edd0429 2021-11-29)

rust-gdb --version
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.

Backtrace

This is inside rust-gdb, looking at a backtrace, rather than rust-gdb itself having a crash.

```
bt
#0  __lll_lock_wait (futex=futex@entry=0x5555585aed70, private=0) at lowlevellock.c:52
#1  0x00007ffff7d820a3 in __GI___pthread_mutex_lock (mutex=0x5555585aed70) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000555555ecf92e in std::sys::unix::mutex::Mutex::lock (self=0x5555585aed70) at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys/unix/mutex.rs:63
#3  0x0000555555ec135d in std::sys_common::mutex::MovableMutex::raw_lock (self=0x55555857cf80) at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/mutex.rs:76
#4  0x0000555555eac505 in std::sync::mutex::Mutex::lock (self=0x55555857cf80) at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sync/mutex.rs:267
#5  0x0000555555e53026 in slscene::render::renderface::RenderFace::get_missing_texture_assets (self=0x7fff38008000, base=0x5555584a1e30) at slscene/src/render/renderface.rs:171
#6  0x0000555555e5424e in slscene::render::renderface::RenderFace::load_needed_textures (base=0x5555584a1e30, scene_object=0x7fff3870ab00, face_index=0) at slscene/src/render/renderface.rs:280
#7  0x0000555555e9d9a0 in slscene::render::renderscene::SceneObject::recalc_lod (self=0x7fff3870ab00, scene=0x7fff60dfd8d0, camera_pos=0x7fff60dfd4c0) at slscene/src/render/renderscene.rs:503
#8  0x00005555557fe927 in scenetest::dummyscene::DummyWorldScene::recalc_all_object_lods (scene=0x7fff60dfd8d0, camera_pos=0x7fff60dfd4c0, objects=Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
....
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
Python Exception int too big to convert:
^CPython Exception :
^CPython Exception Quit:
Python Exception list index out of range:
Python Exception list index out of range:
Python Exception list index out of range:
Python Exception list index out of range:
```

michaelwoerister commented 2 years ago

Thanks for the bug report, @John-Nagle! This looks like a bug somewhere in https://github.com/rust-lang/rust/blob/master/src/etc/gdb_providers.py, so this is the right place to report it.

Is there a small sample program that reproduces the bug?

John-Nagle commented 2 years ago

Unfortunately, I don't have a test case. There's no extra-long arithmetic involved in this program; the biggest numeric types are usize and f64. No unsafe code or foreign functions are involved. It's a debug compile, so debug information should be clean.

When it happens, the debugger is broken and no further debugging is possible. Recovery from that exception is not successful. That's the serious part of the problem.

The bug in my own program was an ordinary deadlock (same mutex twice in the same thread). After fixing that, the program worked. So it wasn't a memory corruption problem.
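
For readers unfamiliar with that failure mode, here is a minimal sketch of it (not the code from this program; the function names are made up for illustration). A second `lock()` on a `std::sync::Mutex` already held by the same thread may deadlock or panic, so the sketch uses `try_lock()` to show the contention without actually hanging.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical helper: reports whether a second lock attempt from the same
/// thread would block while the first guard is still alive.
fn second_lock_would_block(m: &Mutex<u32>) -> bool {
    let _first = m.lock().unwrap(); // first guard, held until end of function
    // Calling `m.lock()` again here is what hangs (or panics); std does not
    // specify which. `try_lock` returns immediately instead of blocking.
    let would_block = matches!(m.try_lock(), Err(TryLockError::WouldBlock));
    would_block
}

/// Hypothetical helper: once the first guard is dropped, locking again works.
fn relock_after_drop(m: &Mutex<u32>) -> bool {
    {
        let mut first = m.lock().unwrap();
        *first += 1;
    } // first guard dropped here, mutex released
    let ok = m.try_lock().is_ok();
    ok
}

fn main() {
    let m = Mutex::new(0);
    println!("second lock would block: {}", second_lock_would_block(&m));
    println!("relock after drop ok: {}", relock_after_drop(&m));
}
```

The usual fix is the one in `relock_after_drop`: make sure the first guard goes out of scope before anything on the same thread locks the mutex again.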

That's all the info I can provide right now.

michaelwoerister commented 2 years ago

Maybe an enum discriminant ends up being 128 bits wide. That can happen if another field is being re-used as the discriminant. Or it's related to fat pointers.
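
As a rough illustration of those two layout effects (this does not reproduce the bug; the `Shape` trait and function names are hypothetical): `Option<Box<T>>` stores no separate discriminant because the null pointer value is re-used for `None`, and slice references and trait objects are fat pointers made of two machine words, which is 128 bits in total on a 64-bit target.

```rust
use std::mem::size_of;

/// Hypothetical trait, only here so we can name a trait object type.
trait Shape {
    fn area(&self) -> f64;
}

/// True when `Option<Box<u8>>` is the same size as `Box<u8>`, i.e. the
/// pointer field itself doubles as the discriminant (null means `None`).
fn option_box_uses_niche() -> bool {
    size_of::<Option<Box<u8>>>() == size_of::<Box<u8>>()
}

/// Size of a slice reference in machine words (data pointer + length).
fn slice_ref_words() -> usize {
    size_of::<&[u32]>() / size_of::<usize>()
}

/// Size of a boxed trait object in machine words (data pointer + vtable pointer).
fn trait_object_words() -> usize {
    size_of::<Box<dyn Shape>>() / size_of::<usize>()
}

fn main() {
    println!("Option<Box<u8>> uses niche: {}", option_box_uses_niche());
    println!("&[u32] words: {}", slice_ref_words());
    println!("Box<dyn Shape> words: {}", trait_object_words());
}
```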

Can you share the type definition of the objects parameter of scenetest::dummyscene::DummyWorldScene::recalc_all_object_lods()? That seems to be where things start going sideways.

John-Nagle commented 2 years ago

OK. Here's the signature of recalc_all_object_lods, and the definition of objects:

fn recalc_all_object_lods(
    scene: &RenderScene,
    camera_pos: &Vec3A,
    objects: &[SceneObjectLink],
) {

Types used for sharing across threads:

pub type SceneObjectLink = Arc<Mutex<SceneObject>>; // shareable link to scene object
pub type SceneObjectWeakLink = std::sync::Weak<Mutex<SceneObject>>; // weak shareable link to scene object

SceneObject, which owns the data for an entire 3D scene:

#[derive(Debug)]
pub struct SceneObject {
    pub self_weak_link: SceneObjectWeakLink, // weak link to self for others to use
    pub world_object_link: Box<dyn WorldObjectLink>,
    pub viewable_data: ViewableData, // prim, mesh, etc. info.
    pub render_faces: Vec<RenderFace>,          // the faces of the object. If filled in, mesh is good.
    pub loaded_asset: Option<SceneObjectAsset>, // asset used to generate this scene object
    pub desired_asset: Option<SceneObjectAsset>, // asset that needs to be loaded
    pub initial_loc: WorldLocation,             // initial location at which created, for debug use only
}

Traits and structs referenced above:

/// Backlink to world objects, outside this library
/// Will mostly be eliminated soon. It's used by the JSON input system.
pub trait WorldObjectLink: Debug + Send {
    fn build_prim_mesh(
        &self,
        face_id: u8,
        lod: Lod,
        viewable_attrs: ViewableAttrs,
    ) -> Result<MeshCoords, Error>;
    //  Renderer fetches sculpt texture.
    fn build_sculpt_mesh(
        &self,
        face_id: u8,
        lod: Lod,
        viewable_attrs: ViewableAttrs,
        sculpt_image: RgbImage,
    ) -> Result<MeshCoords, Error>;
    fn build_mesh_mesh(
        &self,
        face_id: u8,
        lod: Lod,
        viewable_attrs: ViewableAttrs,
        mesh_info: Vec<MeshCoords>,
    ) -> Result<MeshCoords, Error>;
    //  Get position within region.  Mostly for debug messages.
    fn get_pos_in_region(&self) -> Vec3;
    /// Get transform to region
    fn get_transform_to_region(&self) -> Mat4;
    /// Test use only
    fn test_get_mesh_coords(&self, _face_id: u8) -> &MeshCoords {
        panic!("Unimplemented");
    }
    fn fetch_face_mesh(
        &self,
        scene: &RenderScene,
        face_id: usize,
        lod: Lod,
    ) -> Result<RenderFaceMeshLink, Error> {
        panic!("Unimplemented");
    }

    /// Update visible scene to match this link.
    fn update(
        &self,
        scene: &RenderScene,
        region: &SceneRegion,
        render_faces: &mut Vec<RenderFace>,
        lod: Lod,
        use_json_mesh: bool,
    ) -> Result<(), Error>;
}

ViewableData, which is an enum with multiple big items. These are Plain Old Data. I can provide those defs if requested.

pub enum ViewableData {
    Primitive(PrimitiveData),
    Sculpt(SculptData),
    MeshObject(MeshObjectData),
    Avatar(AvatarData),
    Animesh(AnimeshData),
    Unimplemented,
    Invalid, // if an error was detected or data absent.
}

RenderFace:

pub struct RenderFace {
    pub renderer_link: RendererLink,
    pub self_weak_link: RenderFaceWeakIndex,
    pub object_handle: Option<ObjectHandle>, // objects are not shared.
    pub face_material: Option<RenderMaterial>,
    pub face_mesh: Option<RenderFaceMeshLink>,
    pub mesh_transform: Option<Mat4>, // position of mesh relative to rendering origin
                                      //  ***NEED DATA FOR TEXTURE ANIMATION***
}

SceneObjectAsset, which is pretty simple.

#[derive(Debug, Clone, Copy, Eq, PartialEq)]
pub struct SceneObjectAsset {
    pub uuid_key: UuidKey                       // lod, UUID, but not planar projection mode, because that is per-face
}

pub type UuidKey = (u8, Uuid); // LOD/UUID pair

And finally, WorldLocation, which is, again, Plain Old Data.

#[derive (Copy, Clone, Debug, PartialEq)]
pub struct WorldLocation {
    region_origin: DVec2,                       // origin of region
    pos_within_region: Vec3,                    // position within region, Rend3 order
}

The vector and matrix types, Vec2, Vec3, Mat4, etc. are from the glam crate.

Not much exciting there, except some weak links. Could the debugger have tried to chase through a weak link to a gone object?
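
For context on the weak-link question, here is a small sketch using a made-up `Node` type in place of SceneObject. In safe Rust a `Weak` cannot be dereferenced directly; it has to be upgraded first, and `upgrade()` returns `None` once the last strong reference is gone. The allocation itself stays alive while any `Weak` still points at it. Whether the GDB pretty-printers follow the raw pointer inside a `Weak` is a separate question that this sketch does not answer.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for SceneObject.
#[derive(Debug)]
struct Node {
    name: String,
}

type NodeLink = Arc<Mutex<Node>>; // same shape as SceneObjectLink
type NodeWeakLink = Weak<Mutex<Node>>; // same shape as SceneObjectWeakLink

/// Creates a node and returns both a strong and a weak link to it.
fn make_node(name: &str) -> (NodeLink, NodeWeakLink) {
    let strong = Arc::new(Mutex::new(Node { name: name.to_string() }));
    let weak = Arc::downgrade(&strong);
    (strong, weak)
}

/// Reads the name through a weak link; `None` if the node is gone
/// (or if the mutex was poisoned).
fn name_via_weak(weak: &NodeWeakLink) -> Option<String> {
    let strong = weak.upgrade()?; // fails once all strong links are dropped
    let guard = strong.lock().ok()?;
    Some(guard.name.clone())
}

fn main() {
    let (strong, weak) = make_node("cube");
    println!("while alive: {:?}", name_via_weak(&weak));
    drop(strong); // last strong link gone, the Node is dropped
    println!("after drop: {:?}", name_via_weak(&weak));
}
```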

tromey commented 2 years ago

Python Exception <class 'OverflowError'> int too big to convert:

To get a more useful Python stack trace, use set python print-stack full. I stick this in my gdbinit, since I basically always prefer it. (Without this setting gdb just shows a summary.)

Often int problems mean you have a gdb that's using Python 2. I wouldn't expect this on a Linux distro but it is worth double checking.