tuxera / reliance-edge

Transactional power-failsafe filesystem for microcontrollers
https://www.tuxera.com/company/open-source
GNU General Public License v2.0
109 stars 32 forks source link

Erroneous use of "current volume" globals #10

Closed danielrlewis closed 5 years ago

danielrlewis commented 5 years ago

Note: Datalight has published a technical bulletin (follow the link to download) which discusses the consequences of this bug, which projects are likely to be affected, and what has been done to correct it.

Reliance Edge is using a global variable for the "current" (i.e., currently accessed) volume in several places where doing so is inappropriate and problematic.

In core/driver/blockio.c, every function is using gpRedVolConf (the global variable for the per-volume configuration parameters for the current volume) to access the number of times that an I/O operation should be retried:

for(bRetryIdx = 0U; bRetryIdx <= gpRedVolConf->bBlockIoRetries; bRetryIdx++)
{
    /* ... */
}

Also in core/driver/blockio.c, RedIoRead() and RedIoWrite() are also using gpRedVolConf for the sector offset:

uint64_t ullSectorStart = ((uint64_t)ulBlockStart << bSectorShift) + gpRedVolConf->ullSectorOffset;

All of the RedIo functions have a bVolNum (volume number) parameter; and so, semantically speaking, they should use the parameters for the specified volume rather than the current volume. The issue is more than semantical with RedIoWrite(). The block buffers in Reliance Edge (core/driver/buffer.c) are shared by all mounted Reliance Edge volumes; and when doing LRU replacement of a dirty block buffer, that module will invoke RedIoWrite() (from BufferWrite()) on a buffer that is not necessarily from the current volume. As a result, the bBlockIoRetries and ullSectorOffset values used may be for the wrong volume. If partitioning is in use, using the wrong ullSectorOffset will generally result in a I/O error, since (as of v2.2.1) the RedOsBDevWrite() implementations use the VOLUME_SECTOR_RANGE_IS_VALID() to validate the sector range, which will detect the discrepancy.

In core/driver/buffer.c, the BufferFinalize() function is using gpRedVolume (the global variable for volume data for the current volume) to get at the current sequence number:

uint64_t ullSeqNum = gpRedVolume->ullSequence;

BufferFinalize() is called when doing LRU replacement of dirty block buffer; as mentioned above, the block being written is not necessarily from the current volume. If the block buffer being written is not for the current volume, that means the incorrect sequence number is being put into the metadata block header. This could result in metadata corruption. The sequence number from the current volume might be higher than the sequence number for the volume whose metadata block is being written. When that metadata block is read back in, Reliance Edge detects (in BufferIsValid()) that the sequence number in the metadata block header is too high for the volume: resulting in a critical error that sets the volume read-only.

BufferFinalize() also calls RedVolSeqNumIncrement(), which always increments the sequence number from the current volume: it should be incrementing the sequence number for the volume whose metadata block is being written.

For further details, refer to the technical bulletin.

danielrlewis commented 5 years ago

See commit 0bc30cf55a (Trunk Build 817) for the correction of this issue. That commit is included in the v2.3 release.