project8 / psyllid

Data acquisition package for the ROACH2 system

error when moving to second acquisition in egg3 reader when processing triggered data #42

Closed cclaessens closed 6 years ago

cclaessens commented 6 years ago

At first I thought the first record ID and time were missing, since Katydid stopped reading records when it had to jump from the last record of the first acquisition to the first record of the second acquisition.

The last lines printed by Katydid are:

20:07:33 [DEBUG] /KTEgg3Reader.cc(277): Preparing to read next slice; Record shift: 1; Read position (in record): 0

20:07:33 [DEBUG] (tid 140033326847744) rch3/M3Stream.cc(324): Record offset before moving = 50 (fRecordCountInFile = 1966, fFirstRecordInFile = 1916)

20:07:33 [DEBUG] (tid 140033326847744) rch3/M3Stream.cc(337): Record offset after moving = 51 (fRecordCountInFile = 1967, fFirstRecordInFile = 1916)

20:07:33 [DEBUG] (tid 140033326847744) rch3/M3Stream.cc(371): Going to record: record in file: 1967 (record offset in file: 51) -- acquisition: 1 -- record in acquisition: 0

The reason I thought the first record ID was missing is that Katydid seemed to want to read record 1967. However, the number printed there is counted from the first record in the file (first record 1916 plus offset 51 = 1967), and the jump to the next record (in the next acquisition) only happens after these lines are printed. My suspicion seemed confirmed when I opened the egg files with hdfview: the first record time and ID appeared to be missing (the attribute names were present, but no values were shown).

After I couldn't find an error in psyllid or monarch on the writing side, I found out that hdfview 2.9 does not display unsigned long int values (it does display unsigned int).
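
As a sanity check independent of hdfview, the attribute can be read directly with the HDF5 C++ API. This is only a minimal sketch; the file name, dataset path, and attribute name below are placeholders and not necessarily the real egg3 layout:

#include <H5Cpp.h>
#include <cstdint>
#include <iostream>

int main()
{
    // placeholder file and object names; substitute the actual egg file and acquisition dataset
    H5::H5File tFile( "example.egg", H5F_ACC_RDONLY );
    H5::DataSet tAcqDataSet = tFile.openDataSet( "/streams/stream0/acquisitions/0" );
    H5::Attribute tAttr = tAcqDataSet.openAttribute( "first_record_id" );

    // read the attribute as a native unsigned 64-bit integer, the type hdfview 2.9 fails to render
    uint64_t tFirstRecId = 0;
    tAttr.read( H5::PredType::NATIVE_UINT64, &tFirstRecId );
    std::cout << "first_record_id = " << tFirstRecId << std::endl;

    return 0;
}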

So back to Katydid: adding debug output in monarch's ReadRecordAsIs shows that the correct record ID and time are in fact found in the metadata. The failing line is line 572 in Monarch3/M3Stream.cc:

fH5CurrentAcqDataSet->read( fStreamRecord.GetData(), fDataTypeUser, *fH5DataSpaceUser, tDataSpaceInFile );

The error reads:

HDF5-DIAG: Error detected in HDF5 (1.8.13) thread 140033326847744:
  #000: ../../../src/H5Dio.c line 161 in H5Dread(): selection+offset not within extent
    major: Dataspace
    minor: Out of range
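
For context (this is not monarch code), the diagnostic comes from H5Dread checking that the file-space selection fits within the dataset's extent. Here is a small standalone sketch that reproduces the same message by selecting a hyperslab past the end of a 4-record dataset, roughly what happens if a file-wide record offset is applied to a per-acquisition dataset:

#include <H5Cpp.h>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    H5::H5File tFile( "demo.h5", H5F_ACC_TRUNC );

    // a small dataset standing in for one acquisition: 4 records of 1024 bytes each
    hsize_t tDims[ 2 ] = { 4, 1024 };
    H5::DataSpace tFileSpace( 2, tDims );
    H5::DataSet tDataSet = tFile.createDataSet( "acq1", H5::PredType::NATIVE_UINT8, tFileSpace );

    // select one record at offset 51, which may be valid as a file-wide offset
    // but lies outside this dataset's 4-record extent
    hsize_t tCount[ 2 ] = { 1, 1024 };
    hsize_t tOffset[ 2 ] = { 51, 0 };
    H5::DataSpace tMemSpace( 2, tCount );
    H5::DataSpace tSelection = tDataSet.getSpace();
    tSelection.selectHyperslab( H5S_SELECT_SET, tCount, tOffset );

    std::vector< uint8_t > tBuffer( 1024 );
    try
    {
        // fails with "selection+offset not within extent"
        tDataSet.read( tBuffer.data(), H5::PredType::NATIVE_UINT8, tMemSpace, tSelection );
    }
    catch( H5::Exception& e )
    {
        std::cerr << "HDF5 error: " << e.getCDetailMsg() << std::endl;
    }
    return 0;
}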

cclaessens commented 6 years ago

The error is caused in M3Stream.cc, in the method ReadRecord.

Here is what ReadRecord would look like with my proposed changes (the original lines are left in as comments):

bool M3Stream::ReadRecord( int anOffset, bool aIfNewAcqStartAtFirstRec ) const
{
    if( ! fIsInitialized ) Initialize();

    std::unique_lock< std::mutex >( *fMutexPtr.get() );

    // anOffset should not move us forward if this is the very first record read in the file (fRecordsAccessed == false)
    // Otherwise anOffset should be incremented to 1 to move us forward appropriately (fRecordsAccessed == true)
    anOffset += (int)fRecordsAccessed;

    //unsigned tRecordOffsetInFile = fRecordCountInFile - fFirstRecordInFile;
    unsigned tRecordOffsetInFile = fRecordCountInFile;
    LDEBUG( mlog, "Record offset before moving = " << tRecordOffsetInFile << " (fRecordCountInFile = " << fRecordCountInFile << ", fFirstRecordInFile = " << fFirstRecordInFile << ")" );

    if( ( anOffset < 0 && (unsigned)abs( anOffset ) > tRecordOffsetInFile ) ||
        ( anOffset > 0 && tRecordOffsetInFile + anOffset >= fNRecordsInFile ) ||
        ( anOffset == 0 && fNRecordsInFile == 0 ))
    {
        // either requested to go back before the beginning of the file, or past the end
        LDEBUG( mlog, "Requested offset would move is out of range for the file" );
        return false;
    }
    fRecordIdInFile = fRecordIdInFile + anOffset;
    fRecordCountInFile = fRecordCountInFile + anOffset;
    //tRecordOffsetInFile = fRecordCountInFile - fFirstRecordInFile;
    tRecordOffsetInFile = fRecordCountInFile;
    LDEBUG( mlog, "Record offset after moving = " << tRecordOffsetInFile << " (fRecordCountInFile = " << fRecordCountInFile << ", fFirstRecordInFile = " << fFirstRecordInFile << ")" );
    unsigned nextAcq = fRecordIndex.at( tRecordOffsetInFile ).first;
    fRecordCountInAcq = fRecordIndex.at( tRecordOffsetInFile ).second;
    LDEBUG( mlog, "next Acq "<<nextAcq<<" RecordCountInAcq "<<fRecordCountInAcq);

    try
    {
        bool tIsNewAcq = false;
        if( nextAcq != fAcquisitionId || ! fRecordsAccessed )
        {
            // we are going to a new acquisition

            // check if we need to correct our position in the new acquisition back to the beginning of the acquisition
            if( aIfNewAcqStartAtFirstRec && fRecordCountInAcq != 0 )
            {
                fRecordCountInFile -= fRecordCountInAcq;
                // make sure the record correction ended up in the same new acquisition
                if( fRecordIndex.at( fRecordCountInFile ).first != nextAcq )
                {
                    throw M3Exception() << "Tried to start at the beginning of the new acquisition, but ended up in a different acquisition: " << fRecordIndex.at( fRecordCountInFile ).first << " != " << nextAcq;
                }
                fRecordCountInAcq = 0;
                tRecordOffsetInFile = fRecordCountInFile; //- fFirstRecordInFile;
                LDEBUG( mlog, "Record offset after moving + correction = " << tRecordOffsetInFile << " (fRecordCountInFile = " << fRecordCountInFile << ", fFirstRecordInFile = " << fFirstRecordInFile << ")" );
            }

            tIsNewAcq = true;
            fAcquisitionId = nextAcq;
            delete fH5CurrentAcqDataSet;
            u32toa( fAcquisitionId, fAcqNameBuffer );
            fH5CurrentAcqDataSet = new H5::DataSet( fH5AcqLoc->openDataSet( fAcqNameBuffer ) );
            H5::Attribute tAttrNRIA( fH5CurrentAcqDataSet->openAttribute( "n_records" ) );
            tAttrNRIA.read( tAttrNRIA.getDataType(), &fNRecordsInAcq );
        }

        LDEBUG( mlog, "Going to record: record in file: " << fRecordCountInFile << " (record offset in file: " << tRecordOffsetInFile << ") -- acquisition: " << nextAcq << " -- record in acquisition: " << fRecordCountInAcq );

        //fDataOffset[ 0 ] = tRecordOffsetInFile;
        fDataOffset[ 0 ] = fRecordCountInAcq;
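        // changed from tRecordOffsetInFile: the read targets the per-acquisition dataset,
        // so the offset must be the record count within the acquisition, not within the file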

        (this->*fDoReadRecord)( tIsNewAcq );

        // can now update the first record in the file
        if( ! fRecordsAccessed )
        {
            fRecordsAccessed = true;
            fFirstRecordInFile = fAcqFirstRecId;
            LDEBUG( mlog, "First record in file: " << fFirstRecordInFile );
        }

        // fix fRecordCountInFile; e.g. if a file doesn't start at record 0, we need to fix the record count value after reading the record
        if( tIsNewAcq )
        {
            //fRecordCountInFile = fAcqFirstRecId;
            fRecordIdInFile = fAcqFirstRecId;
            //LDEBUG( mlog, "Updated record in file: " << fRecordCountInFile );
            LDEBUG( mlog, "Updated record id in file: " << fRecordIdInFile );
        }
    }
    catch( H5::Exception& e )
    {
        throw M3Exception() << "HDF5 error while reading a record:\n\t" << e.getCDetailMsg() << " (function: " << e.getFuncName() << ")";
    }

    return true;
}

Can we make these changes, or would that interfere with other file/data structures that we read with monarch?
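
For illustration only (again, not monarch code), here is a standalone sketch of the indexing logic the change above relies on, with numbers mimicking the log (first record in file 1916, acquisition boundary at file offset 51; the length of the second acquisition is made up):

#include <cstdio>
#include <utility>
#include <vector>

int main()
{
    // analogue of fRecordIndex: file offset -> (acquisition id, record within acquisition)
    std::vector< std::pair< unsigned, unsigned > > tRecordIndex;
    for( unsigned i = 0; i < 51; ++i ) tRecordIndex.push_back( { 0, i } );  // acquisition 0: 51 records
    for( unsigned i = 0; i < 10; ++i ) tRecordIndex.push_back( { 1, i } );  // acquisition 1: 10 records (made up)

    unsigned tFirstRecordInFile = 1916;
    unsigned tOffsetInFile = 51;  // first record after the acquisition boundary
    unsigned tAcq = tRecordIndex.at( tOffsetInFile ).first;
    unsigned tRecInAcq = tRecordIndex.at( tOffsetInFile ).second;

    // the record number printed in the log is the first record in the file plus the offset;
    // the valid HDF5 offset into the acquisition-1 dataset is the record within that
    // acquisition (0), not the file-wide offset (51)
    std::printf( "record %u -> acquisition %u, record in acquisition %u\n",
                 tFirstRecordInFile + tOffsetInFile, tAcq, tRecInAcq );
    return 0;
}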

nsoblath commented 6 years ago

Tracking for this issue has been moved to project8/monarch#22.