crash in disk_io_thread::do_read, trunk

GoogleCodeExporter commented 8 years ago

What do you see instead?
-random crashes in disk_io_thread::do_read

What version of the product are you using? On what operating system?
- trunk 10296, Windows 7 64Bit

I think the optimization disk_io_job::force_copy=0 in 
block_cache::copy_from_piece is the cause for the crashes. It references blocks 
but does not increase the ref_count and so other threads can remove these 
blocks (or whole pieces) with block_cache::try_evict_blocks called from 
disk_io_thread::check_cache_level.
It seems that this makes the piece for the do_read operation go away sometimes 
(data on heap gets freed).
This case seems not to be related to the problem where the piece is not found 
in disk_io_thread::do_read.

At least, this is my conclusion from debugging.
(http://www.massaroddel.de/block_cache_entry%20-%20check_cache_level.png)

MassaRoddel

Original issue reported on code.google.com by webmas...@massaroddel.de on 13 Sep 2014 at 12:37

GoogleCodeExporter commented 8 years ago

block_cache::inc_block_refcount() is called on the block, which is supposed to 
increment refcount on the piece (if it fails for any reason, it returns false 
and copy_from_piece() should fail too).

I don't see how if could return a block reference without pinning the piece and 
the block in the cache.

Original comment by arvid.no...@gmail.com on 14 Sep 2014 at 4:36

GoogleCodeExporter commented 8 years ago

that state in the debugger looks like there is some place where the refcount of 
a block and the sum of all the debug refcounts become out of sync. I will look 
closer where this might happen, and maybe introduce more asserts for that.

Original comment by arvid.no...@gmail.com on 14 Sep 2014 at 4:44

GoogleCodeExporter commented 8 years ago

For debugging I changed cached_piece_entry::in_use and  
cached_piece_entry::in_storage to unsigned int with predefined numbers because 
bool is likely to be true even if the memory is already freed or corrupted. 
Within do_read I changed the value of in_use to something else and that did 
trigger what you see in the screenshot. However, by the time the debugger 
actually stopped at that code the value of in_use was already set back to what 
it should be at that point. There seems to be a race condition between reading 
an removing.

Original comment by webmas...@massaroddel.de on 14 Sep 2014 at 8:58

GoogleCodeExporter commented 8 years ago

Btw, is it normal to have cached_piece_entry::outstanding_read set to 1 and 
cached_piece_entry::refcount set to 0?
I am testing this fix:
bool ok_to_evict(bool ignore_hash = false) const
{
    bool E = refcount == 0
        && piece_refcount == 0
        && num_blocks == 0
        && !hashing
        && read_jobs.size() == 0
        && (ignore_hash || !hash || hash->offset == 0);
    if (E && outstanding_read==1)
    {   E = false;
    }
    return E;
}

Original comment by webmas...@massaroddel.de on 14 Sep 2014 at 2:45

GoogleCodeExporter commented 8 years ago

oh, this might be a case I've missed. when setting outstanding_read, I should 
probably also increment the refcount.

another odd state in your screenshot though is that the sum of reading_count, 
hashing_count and flushing_count should always be <= refcount.

Original comment by arvid.no...@gmail.com on 14 Sep 2014 at 4:18

GoogleCodeExporter commented 8 years ago

I think that reading_count not zero comes from block_cache.cpp 1682.

Original comment by webmas...@massaroddel.de on 14 Sep 2014 at 8:29

GoogleCodeExporter commented 8 years ago

I checked in some fixes and cleanup just now, to trunk. I believe I fixed the 
inconsistent debug refcounter and pinned pieces with outstanding_read to the 
cache. Please let me know if you still see this issue.

Original comment by arvid.no...@gmail.com on 14 Sep 2014 at 10:43

Changed state: Fixed

y500 / libtorrent

crash in disk_io_thread::do_read, trunk #675