openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZVol with master branch source 2015-09-06 and kernel 4.1.6 comes to a grinding halt. #3754

Closed dracwyrm closed 5 years ago

dracwyrm commented 9 years ago

Hi,

In my setup, I have KVM using ZVols on a Gentoo Linux system with Gentoo Sources 4.1.6. I have been using the 0.6.4.2 versions of SPL and ZFS. The tank is built as RAIDZ1 on three spinning HDDs, with external log and cache devices on two SSDs.
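For context, a pool shaped like that could be created roughly as follows. The device names are hypothetical and the split of log/cache partitions across the two SSDs is just one plausible layout, not something stated in this report:

    # three spinning disks in raidz1; the two SSDs are partitioned so one
    # partition pair mirrors the SLOG and the other pair serves as L2ARC
    zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc \
        log mirror /dev/sdd1 /dev/sde1 \
        cache /dev/sdd2 /dev/sde2

    # a zvol used as a KVM guest disk would then look something like this
    zfs create -V 100G tank/vm-windows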

The commit logs show that there have been speed improvements for ZVols, so I thought I would give it a try. I downloaded the source for SPL and ZFS via the download-source button that GitHub has and renamed the archives to something like zfs-20150906.zip. I then used the Gentoo ebuilds as a base to install the new versions. I figured source downloads like that would let me choose the date for an update, rather than using a live ebuild. Naturally, I restarted the machine to make sure the new modules were fully loaded and the old ones were out of memory.
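In other words, something roughly like this (the URLs reflect the zfsonlinux repositories of that era, and the dated file names are my own convention, not part of the report):

    # grab the then-current master snapshots and give them a dated name
    wget https://github.com/zfsonlinux/spl/archive/master.zip -O spl-20150906.zip
    wget https://github.com/zfsonlinux/zfs/archive/master.zip -O zfs-20150906.zip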

The VMs would work for a minute and then come to a grinding halt. The Windows circle of dots would not spin, nor could I really move the mouse that is passed through via USB.

I wanted to do a full reinstall of Windows anyway, and all my data was backed up, so I completely destroyed the tank and did a secure erase to wipe the drives. I then created a new tank using the updated ZFS binaries and modules, hoping this would help. I tried reinstalling Windows, but the installation would not get very far before things ground to a halt again.

Here's the strange bit. I also have regular datasets on the same tank, and those worked faster than ever. I even transferred 750 Gigs of data to one dataset with no slowdowns at all. It's only the ZVols that gave bad performance.

I have since reinstalled the 0.6.4.2 versions of SPL and ZFS without recreating the tank, started the virtual machine, and it runs as fast as ever. No slowdowns at all.

Cheers.

dracwyrm commented 8 years ago

Well, I tried it without passthrough and got the same degraded performance. I'm well and truly stumped on this. What precisely is going on in this patch that would cause this kind of incompatibility with my system? Is it because the max-threads setting was removed, so it's left to the defaults defined in bio.h (I think it was 256)? Is there a kernel setting I'm missing?
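One way to see which tunables the loaded module actually exposes is to look under sysfs; parameter names vary between 0.6.4.2 and master, so treat this as a sketch rather than a definitive answer:

    # list whatever zvol-related zfs module parameters this build provides
    grep -H . /sys/module/zfs/parameters/zvol_* 2>/dev/null

    # on versions that still have a zvol thread pool, its size can be set
    # at module load time, e.g.:
    # modprobe zfs zvol_threads=32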

dracwyrm commented 8 years ago

I messed around with kernel settings and libvirt settings (I switched the disk cache mode to directsync), and now my perf top looks like this while very intensive disk reads/writes are going on:

    50.69%  [kernel]       [k] _raw_spin_lock_irq                     
     7.87%  [kernel]       [k] read_hpet                              
     3.67%  [kernel]       [k] __isolate_lru_page                     
     2.58%  [kernel]       [k] osq_lock                               
     2.48%  [kernel]       [k] check_preemption_disabled              
     2.32%  [kernel]       [k] putback_inactive_pages                 
     2.24%  [kernel]       [k] __page_check_address                   
     1.95%  [kernel]       [k] shrink_page_list                       
     1.77%  [kernel]       [k] __anon_vma_interval_tree_subtree_search
     1.69%  [kernel]       [k] mm_find_pmd                            
     1.58%  [kernel]       [k] mutex_spin_on_owner.isra.6             
     1.39%  [kernel]       [k] down_read_trylock                      
     1.35%  [kernel]       [k] _raw_spin_lock                         
     1.33%  [kernel]       [k] page_lock_anon_vma_read                
     1.28%  [kernel]       [k] isolate_lru_pages.isra.63              
     1.23%  [kernel]       [k] unlock_page                            
     0.70%  [kernel]       [k] __mod_zone_page_state                  
     0.69%  [kernel]       [k] rmap_walk                              
     0.61%  [kernel]       [k] page_mapping                           
     0.49%  [kernel]       [k] up_read                                
     0.47%  [kernel]       [k] __wake_up_bit                          
     0.47%  [kernel]       [k] anon_vma_interval_tree_iter_first      
     0.45%  [kernel]       [k] page_referenced_one                    
     0.37%  [kernel]       [k] preempt_count_add                      
     0.37%  [kernel]       [k] page_referenced                        
     0.32%  [kernel]       [k] _raw_spin_lock_irqsave                 
     0.32%  [kernel]       [k] page_evictable                         
     0.26%  [kernel]       [k] __this_cpu_preempt_check               
     0.22%  [kernel]       [k] preempt_count_sub                      
     0.19%  [zcommon]      [k] fletcher_4_native                      
     0.18%  [kernel]       [k] mutex_lock                             
     0.17%  [kernel]       [k] kvm_handle_hva_range                   
     0.16%  [vdso]         [.] __vdso_clock_gettime                   
     0.14%  [kernel]       [k] _raw_spin_unlock                       
     0.13%  [kernel]       [k] apic_timer_interrupt 

The raw spinlock (_raw_spin_lock_irq) time seems very heavy.
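To pin down which code path is actually spinning, call-graph sampling might help; something along these lines is standard perf usage, not specific to this report:

    # sample all CPUs with call graphs while the VM is hammering the zvol,
    # then look at who is calling _raw_spin_lock_irq
    perf record -a -g -- sleep 30
    perf report --stdio | less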