pombreda / alembic

Automatically exported from code.google.com/p/alembic

SimpleAbcViewer aborts with ERROR: EXCEPTION: Couldn't open attribute named: .faceVaryingInterpolateBoundary.smp0 #156

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Asset of giant robot causes SimpleAbcViewer to abort

HDF5-DIAG: Error detected in HDF5 (1.8.5) thread 140163287402272:
  #000: H5A.c line 546 in H5Aopen(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #001: H5Oattribute.c line 511 in H5O_attr_open_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #002: H5HF.c line 680 in H5HF_op(): can't operate on object from fractal heap
    major: Heap
    minor: Can't operate on object
  #003: H5HFman.c line 462 in H5HF_man_op(): unable to operate on heap object
    major: Heap
    minor: Can't operate on object
  #004: H5HFman.c line 321 in H5HF_man_op_real(): unable to protect fractal heap direct block
    major: Heap
    minor: Unable to protect metadata
  #005: H5HFdblock.c line 485 in H5HF_man_dblock_protect(): unable to protect fractal heap direct block
    major: Heap
    minor: Unable to protect metadata
  #006: H5AC.c line 1597 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #007: H5C.c line 3333 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #008: H5C.c line 8177 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #009: H5HFcache.c line 1307 in H5HF_cache_dblock_load(): can't read fractal heap direct block
    major: Heap
    minor: Read failed
  #010: H5Fio.c line 113 in H5F_block_read(): read from metadata accumulator failed
    major: Low-level I/O
    minor: Read failed
  #011: H5Faccum.c line 196 in H5F_accum_read(): driver read request failed
    major: Low-level I/O
    minor: Read failed
  #012: H5FDint.c line 142 in H5FD_read(): driver read request failed
    major: Virtual File Layer
    minor: Read failed
  #013: H5FDsec2.c line 771 in H5FD_sec2_read(): file read failed: time = Wed May 18 13:11:50 2011
, filename = '/var/tmp/doNotRemove/dreadvh.DVH1.sd010.t1.HI.1.r17.abc', file 
descriptor = 10, errno = 14, error message = 'Bad address', buf = 0x1153000, 
size = 168, offset = 15292913
    major: Low-level I/O
    minor: Read failed
terminate called after throwing an instance of 'Alembic::Util::v1::Exception'
  what():  ISubDSchema::get()
ERROR: EXCEPTION:
IScalarProperty::get()
ERROR: EXCEPTION:
Couldn't open attribute named: .faceVaryingInterpolateBoundary.smp0
Abort

The abort is intermittent and happens at random times.

Original issue reported on code.google.com by ble...@gmail.com on 18 May 2011 at 8:27

GoogleCodeExporter commented 9 years ago

Original comment by ble...@gmail.com on 18 May 2011 at 8:31

GoogleCodeExporter commented 9 years ago
We're seeing this problem in our modeling package as well.  Additional data 
points:

 - the attribute it can't open varies (it's not always .faceVaryingInterpolateBoundary.smp0)

 - it's triggered when trying to play animation in the viewer (or read additional samples, as in the case of our modeling package)

 - one way to reliably trigger the crash in the SimpleAbcViewer is to give an explicit path to the program, like "./bin/SimpleAbcViewer foo.abc", then start playing the animation.

Original comment by ard...@gmail.com on 19 May 2011 at 10:59

GoogleCodeExporter commented 9 years ago
Did you confirm that the attribute it is failing on exists in the file?
What other attributes has it tripped up on?
Is it always HDF Attributes, or will it miss a dataset as well?

If it does, have you tried using HDF 1.8.6 instead?

Original comment by miller.lucas on 19 May 2011 at 11:27

GoogleCodeExporter commented 9 years ago
Yes, the attribute does exist in the file (confirmed via h5ls -rv, abcecho, 
import with Maya plugin, etc.).  Other attributes: ".inherits.smp0" (from 
AbcGeom::Xform), ".info" (from AbcCoreHDF5), ... I've seen a bunch, with 
different attributes not found even with the exact same file, run at different 
times (this is extremely non-deterministic).

More fun: running the viewer through gdb or valgrind causes the bug to not 
manifest.

Original comment by ard...@gmail.com on 20 May 2011 at 12:44

GoogleCodeExporter commented 9 years ago
The behavior is strangely intermittent. It has also manifested as HDF5 erroring
on "inherits", along with the following:

  #002: H5Adense.c line 405 in H5A_dense_open(): can't locate attribute in name index
    major: Attribute
    minor: Object not found

zeno_bin: /var/tmp/doNotRemove/zeno/RHEL5_AMD64_OPT/include/boost/smart_ptr/shared_ptr.hpp:409: T* boost::shared_ptr< <template-parameter-1-1> >::operator->() const [with T = Alembic::AbcCoreAbstract::v1::BasePropertyReader]: Assertion `px != 0' failed.

ERROR: Cannot compute bounding box for L2wheelBJNT: ISchemaObject::ISchemaObject( wrap )

ERROR: EXCEPTION:
IXformSchema::init()

I've rebuilt with 1.8.7 and SimpleAbcViewer now works, though nothing jumped 
out at me from the release notes. 

Original comment by ble...@gmail.com on 20 May 2011 at 12:51

GoogleCodeExporter commented 9 years ago
There were a BUNCH of bug fixes in HDF5 1.8.6; there weren't as many fixes from
1.8.6 to 1.8.7.  Check the release notes for 1.8.6 and look for valgrind.

Original comment by miller.lucas on 20 May 2011 at 12:54

GoogleCodeExporter commented 9 years ago
And actually, I now do see two interesting notes in the release notes that
could be strong hints:
    - Fixed a bug that could occur when getting information for a new-style
      group that was previously opened through a file handle that was later
      closed. (NAF - 2010/09/15)

    - Fixed many memory issues that valgrind exposed.  (QAK - 2010/08/24)

Original comment by ble...@gmail.com on 20 May 2011 at 12:55

GoogleCodeExporter commented 9 years ago

Original comment by miller.lucas on 26 May 2011 at 1:41

GoogleCodeExporter commented 9 years ago

Original comment by ble...@gmail.com on 31 May 2011 at 6:04

GoogleCodeExporter commented 9 years ago

Original comment by ble...@gmail.com on 31 May 2011 at 6:06

GoogleCodeExporter commented 9 years ago
Does 1.8.7 fix these problems or not?

Original comment by miller.lucas on 31 May 2011 at 6:10

GoogleCodeExporter commented 9 years ago
This isn't a complete fix; however, I have fixed some race conditions as part
of this changeset:
http://code.google.com/r/bleair-multithreaded-read/source/detail?r=a19e866b366ba2eb4fc8c047c88e86473747086a

The previous code called sub.made.expired() and, if the pointer had not
expired, would try to return that weak pointer. This check-then-return is a
textbook race. I changed the code to lock the weak pointer (in the words of
Yoda, do or do not): either we get a real shared pointer to the
BasePropertyReader, or we get null and regenerate it from the property header.
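
For readers following along, here is a minimal sketch of the pattern described
above. It is not the changeset itself: it uses std::weak_ptr rather than boost,
and PropertyReader, SubProperty and getReader are hypothetical stand-ins for the
real Alembic types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for Alembic::AbcCoreAbstract::v1::BasePropertyReader.
struct PropertyReader {
    explicit PropertyReader(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Hypothetical stand-in for the per-property bookkeeping ("sub" in the thread).
struct SubProperty {
    std::string name;                    // stands in for the property header
    std::weak_ptr<PropertyReader> made;  // cached reader; may already be gone
};

// Racy pattern (shown only as a comment): another thread can drop the last
// shared_ptr between the expired() check and the later use of the weak pointer.
//   if (!sub.made.expired()) { return std::shared_ptr<PropertyReader>(sub.made); }

// Pattern from the comment above: lock() yields either an owning pointer or
// null in one step, so there is no window between the check and the use.
std::shared_ptr<PropertyReader> getReader(SubProperty& sub) {
    std::shared_ptr<PropertyReader> reader = sub.made.lock();
    if (!reader) {
        // Cached reader is gone: regenerate it from the header information.
        reader = std::make_shared<PropertyReader>(sub.name);
        sub.made = reader;
    }
    return reader;
}
```

Note that this only removes the expired()-then-use race. If several threads can
call getReader on the same SubProperty, the assignment to sub.made would still
need its own synchronization.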

Original comment by ble...@gmail.com on 1 Jun 2011 at 12:52

GoogleCodeExporter commented 9 years ago
Hmm, I wonder if we need to just start using boost::scoped_lock or something 
for all file-access-related stuff.  I also wonder what the "enable-threadsafe" 
option for HDF5 *does*.

To answer your question, we do still get this crash with HDF5 1.8.7.

Original comment by ard...@gmail.com on 1 Jun 2011 at 10:32

GoogleCodeExporter commented 9 years ago
Are there still problems with SimpleAbcViewer?

If not, we should probably close this and open up a separate multi-thread issue.

Original comment by miller.lucas on 1 Jun 2011 at 11:21

GoogleCodeExporter commented 9 years ago
Joe, the "enable-threadsafe" configure option for HDF5 makes pthreads and
mutexes active in the HDF5 library so that it is multithread safe (not
concurrent, just safe). For Alembic to support multithreaded reading, we need
this option active in HDF5.

Lucas, I changed the code that locks the weak pointers so that instead of
comparing to false I do if ( ! sub.made ). Is this more to your style
preference? I don't care about the style differences, but there is a race
condition if you don't call lock first. You can't use expired() and have
correct multithreaded code.

I've created a changeset with fixes for multithread read problems. The bugs are
real, though I can't tell if the SimpleAbcViewer would have encountered them or
not. See
http://code.google.com/r/bleair-multithreaded-read/source/detail?r=376b575bceb1b4a577bed9adc442ff92c87fc386

Original comment by ble...@gmail.com on 4 Jun 2011 at 1:16

GoogleCodeExporter commented 9 years ago
Joe and I have already grabbed the first set of changes, which call lock
earlier and avoid expired. That change wasn't controversial; the style change
suggestion was to remain consistent with the library's style (it isn't
necessarily my personal preference).

I've already looked at your use of boost::mutex, and need some time to mull it 
over.
My initial instinct was that creating one per property seems excessive. 
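
To illustrate the alternative to one mutex per property, here is a rough sketch
of a single mutex owned by the archive and shared by all of its properties. It
is only an assumption about what a coarser-grained design could look like, not
what either changeset does. It uses std::mutex rather than boost::mutex, and
Archive, Property and hammer are hypothetical names, not Alembic or HDF5 API.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical archive: one lock guards all file access made through it.
struct Archive {
    std::mutex ioMutex;
    int readsServed = 0;  // stands in for non-thread-safe file state
};

// Hypothetical property that borrows the archive's mutex instead of owning one.
class Property {
public:
    Property(std::shared_ptr<Archive> archive, std::string name)
        : m_archive(std::move(archive)), m_name(std::move(name)) {}

    const std::string& name() const { return m_name; }

    // Serializes the simulated read on the archive-wide mutex.
    int readSample() {
        std::lock_guard<std::mutex> guard(m_archive->ioMutex);
        return ++m_archive->readsServed;
    }

private:
    std::shared_ptr<Archive> m_archive;
    std::string m_name;
};

// Runs several threads, each reading through its own Property on one archive,
// and returns the total number of reads the archive recorded.
int hammer(int threads, int readsPerThread) {
    auto archive = std::make_shared<Archive>();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([archive, readsPerThread, t] {
            Property prop(archive, "prop" + std::to_string(t));
            for (int i = 0; i < readsPerThread; ++i) {
                prop.readSample();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return archive->readsServed;  // safe to read: all workers have joined
}
```

The trade-off is the usual one: a single lock is cheap in memory and easy to
reason about, but every read through the archive is serialized.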

Original comment by miller.lucas on 4 Jun 2011 at 1:26

GoogleCodeExporter commented 9 years ago

Original comment by ble...@gmail.com on 11 Jul 2011 at 8:24