mpalmer / lvmsync

Synchronise LVM LVs across a network by sending only snapshotted changes
http://theshed.hezmatt.org/lvmsync
GNU General Public License v3.0
380 stars 60 forks source link

bignum too big to convert into `long' (RangeError) #31

Closed WMP closed 9 years ago

WMP commented 9 years ago
root@gaja ~/lvmsync # cat lvmsynctest.sh 
#!/bin/bash

lvcreate --snapshot -L 100M -n test-snapshot virtual/test

echo 'dd ...'
dd if=/dev/virtual/test-snapshot bs=1M | ssh backup dd of=/dev/vg0/test bs=1M

echo "creating file snapshot_test.$(date +%Y%m%d-%H%M)"
mount /dev/virtual/test /mnt/test
echo 'testujemy snapshoty' >> /mnt/test/snapshot_test.$(date +%Y%m%d-%H%M)
umount /mnt/test
echo 'lvmsync...'

lvmsync -v /dev/virtual/test-snapshot backup:/dev/vg0/test --snapback /tmp/snapback.$(date +%Y%m%d-%H%M)
echo 'lvmsync done'
echo 'cleanup'
lvremove -f /dev/virtual/test-snapshot
root@gaja ~/lvmsync # ./lvmsynctest.sh 
  Logical volume "test-snapshot" created
dd ...
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 5.0786 s, 41.3 MB/s
0+7273 records in
0+7273 records out
209715200 bytes (210 MB) copied, 7.35959 s, 28.5 MB/s
creating file snapshot_test.20140904-2003
lvmsync...
Origin device: virtual/test
Sending chunk 0..4095...
Seeking to 0 in /dev/mapper/virtual-test
Sending chunk 19193837208694788456448..19193837208694788460543...
Seeking to 19193837208694788456448 in /dev/mapper/virtual-test
/var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `seek': bignum too big to convert into `long' (RangeError)
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `run_client'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `each'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `run_client'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `open'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `run_client'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:93:in `main'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:258
        from /usr/local/bin/lvmsync:19:in `load'
        from /usr/local/bin/lvmsync:19
lvmsync done
cleanup
  Logical volume "test-snapshot" successfully removed
root@gaja ~/lvmsync #
mpalmer commented 9 years ago

Blimey that's a big LV you've got there... or else I've got some sort of screwup in the numeric conversion logic. I shall dig.

WMP commented 9 years ago

@mpalmer, no this must be bug in lvmsync :)

root@gaja ~/lvmsync # lvdisplay /dev/virtual/test
  --- Logical volume ---
  LV Name                /dev/virtual/test
  VG Name                virtual
  LV UUID                IPoGx4-8pg3-xHpZ-hciO-mkdH-DYmW-V1tQI7
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                200.00 MiB
  Current LE             50
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:34

root@gaja ~/lvmsync #
mpalmer commented 9 years ago

Actually, this is something extra special, because the location its seeking to is outside even the range of a long long -- in other words, lvmsync shouldn't ever be able to get that number. Could you please run lvmsync /dev/virtual/test-snapshot --stdout >/tmp/somewhere, and send me a link or something to /tmp/somewhere?

WMP commented 9 years ago
root@gaja ~/lvmsync # lvcreate --snapshot -L 100M -n test-snapshot virtual/test
  Logical volume "test-snapshot" created
root@gaja ~/lvmsync # lvmsync /dev/virtual/test-snapshot --stdout >/tmp/somewhere
Transferred 0 bytes in 0.91 seconds
You transferred your changes infx faster than a full dd!
root@gaja ~/lvmsync # less /tmp/somewhere 
root@gaja ~/lvmsync # cat /tmp/somewhere 
lvmsync PROTO[3]
root@gaja ~/lvmsync #
WMP commented 9 years ago

But i thinks you want to make any change on this snapshot, so:

root@gaja ~/lvmsync # mount /dev/virtual/test /mnt/test
root@gaja ~/lvmsync # echo 'testujemy snapshoty' >> /mnt/test/snapshot_test.$(date +%Y%m%d-%H%M)
root@gaja ~/lvmsync # umount /mnt/test
root@gaja ~/lvmsync # lvmsync /dev/virtual/test-snapshot --stdout >/tmp/somewhere
/var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `seek': bignum too big to convert into `long' (RangeError)
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `run_client'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `each'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `run_client'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `open'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `run_client'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:93:in `main'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:258
        from /usr/local/bin/lvmsync:19:in `load'
        from /usr/local/bin/lvmsync:19
root@gaja ~/lvmsync # ls -la /tmp/somewhere 
-rw-rw-r-- 1 root root 4125 Sep  4 23:23 /tmp/somewhere
root@gaja ~/lvmsync #

https://www.mediafire.com/?q266ul21sv5vjsn

mbrtsy commented 9 years ago

I had the same problem (exact the same error message and verbose output). I was using '_' in my lvm names, that was the problem. I replaced vetest_snap with vetest-snap and it worked.

mpalmer commented 9 years ago

I'd be really surprised if the underscore was the true cause of the problem -- that would be surreal. It certainly isn't the problem, since WMP's LVs didn't have underscores in the name. Interesting data point, though, thanks for sharing. It's possible that the bug is sensitive to, say, the location of the LV in the VG or PV.

mbrtsy commented 9 years ago

Sorry, my english isnt the best, i try to explain. When i run lvmsync .... -v, the verbouse output didnt showed the full path of the remoth path. I gave in (e.g.) lvmsync /dev/vg_test/vekvm111_snap /dev/vg_ve/ve_kvm111_testtransfer

"_testtransfer" was cutted, it only showed ve_kvm111 instead of ve_kvm111_testtransfer

I tryed then ve-kvm111-testtransfer and it worked. (same commands, tryed multiple times before).

Maybe i had changed something else, but i dont remember that. I only remember that after changeing the chars everything worked.

(yes, i saw that wmp didnt use underscores. it was only general input because i had the same error message)

uleodolter commented 9 years ago

Hello,

I ran also into this issue, lvmsync is installed on CentOS 6.5 machine.

$ lvmsync /dev/mapper/VolGroup00-idplvmsync obelodalix:/dev/centos_obelodalix/idp --verbose Origin device: VolGroup00/idp Sending chunk 643487825557578579968..643487825557578584063... Seeking to 643487825557578579968 in /dev/mapper/VolGroup00-idp /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in seek': bignum too big to convert intolong' (RangeError) from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in run_client' from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:ineach' from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in run_client' from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:inopen' from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in run_client' from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:93:inmain' from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:258 from /usr/bin/lvmsync:19:in `load' from /usr/bin/lvmsync:19

UnwashedMeme commented 9 years ago

I tried to run the code that is being done line by line to get some more data of what's going on.

> big_endian?
=> false    # on x86-64
> snap = LVM::Snapshot.new('vmstore', 'ssboot-lvmsync')
=> #<LVM::Snapshot:0x7f7b762fb928 @lv="ssboot-lvmsync", @vg="vmstore">
> snap.differences
=> [157950246131138035712..157950246131138039807, 1152921504606846976..1152921504606851071]
> snap.metadata_header
=> {:chunk_size=>4096}
> diff_block_list = []
=> []
> metafd = File.open(snap.metadata_device, 'r')
=> #<File:/dev/mapper/vmstore-ssboot--lvmsync-cow>
> metafd.seek chunk_size, IO::SEEK_SET
=> 0
> origin_offset, snap_offset = metafd.read(16).unpack("QQ")
=> [35072, 2]
> origin_offset = ntohq(origin_offset)
=> 38562071809359872
> snap_offset   = ntohq(snap_offset)
=> 144115188075855872
> diff_block_list << origin_offset
=> [38562071809359872]
> origin_offset, snap_offset = metafd.read(16).unpack("QQ")
=> [256, 3]
> origin_offset = ntohq(origin_offset)
=> 281474976710656
> snap_offset   = ntohq(snap_offset)
=> 216172782113783808
> diff_block_list << origin_offset
=> [38562071809359872, 281474976710656]
> origin_offset, snap_offset = metafd.read(16).unpack("QQ")
=> [0, 0]
> diff_block_list.map do |b|
     ((b*chunk_size)..(((b+1)*chunk_size)-1))
   end
=> [157950246131138035712..157950246131138039807, 1152921504606846976..1152921504606851071]

This matches what I see when I try to just write to stdout:

$ lvmsync -v /dev/vmstore/ssboot-lvmsync --stdout >/tmp/foo
Origin device: vmstore/ssboot
Sending chunk 157950246131138035712..157950246131138039807...
Seeking to 157950246131138035712 in /dev/mapper/vmstore-ssboot
/var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `seek': bignum too big to convert into `long' (RangeError)
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `run_client'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `each'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `run_client'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `open'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `run_client'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:93:in `main'
        from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:258
        from /usr/local/bin/lvmsync:19:in `load'
        from /usr/local/bin/lvmsync:19

If the endianness isn't swapped then it kind of makes sense:

> 35072*chunk_size
=> 143654912
# That makes sense as a location that might have changed inside a 512MB volume.

# what do the raw bytes look like?
> metafd = File.open(snap.metadata_device, 'r')
=> #<File:/dev/mapper/vmstore-ssboot--lvmsync-cow>
> metafd.seek chunk_size, IO::SEEK_SET
=> 0
> d = metafd.read(16)
=> "\000\211\000\000\000\000\000\000\002\000\000\000\000\000\000\000"
> d.unpack('QQ')
=> [35072, 2]

To me it looks like the LVM metadata is little endian. I wasn't able to find (briefly googling) the actual design doc for lvm snapshots to confirm one way or the other.

lvm2 package 2.02.66-4ubuntu7.4; kernel: 3.8.0-34-generic

Hope this helps some.

ignucius commented 9 years ago

hi,

im having the same issue as well.

oldhost.example.com # lvmsync -v /dev/vg/lv1-snapshot newhost.example.com:/dev/vg/lv1
Origin device: vg/lv1
Sending chunk 59039957329412026793984..59039957329412026798079...
Seeking to 59039957329412026793984 in /dev/mapper/vg-lv1
/var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `seek': bignum too big to convert into `long' (RangeError)
       from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `run_client'
       from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `each'
       from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `run_client'
       from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `open'
       from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `run_client'
       from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:93:in `main'
       from /var/lib/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:258
       from /usr/local/bin/lvmsync:19:in `load'
       from /usr/local/bin/lvmsync:19

oldhost.example.com # ruby -v
ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux]
oldhost.example.com # lvdisplay vg/lv1
--- Logical volume ---
 LV Name                /dev/vg/lv1
 VG Name                vg
 LV UUID                F5oAwg-fcov-Yhj8-XxTV-RGmQ-Zlt1-zQ0pPA
 LV Write Access        read/write
 LV snapshot status     source of
                        /dev/vg/lv1-snapshot [active]
 LV Status              available
 # open                 0
 LV Size                50.00 GiB
 Current LE             12800
 Segments               2
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     256
 Block device           252:3

newhost.example.com # ruby -v
ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]
newhost.example.com # lvdisplay vg/lv1
 --- Logical volume ---
 LV Path                /dev/vg/lv1
 LV Name                lv1
 VG Name                vg
 LV UUID                QJYtd6-m9r9-BDcM-17aK-k1DI-2c12-sc3yuW
 LV Write Access        read/write
 LV Creation host, time newhost, 2014-09-11 10:59:20 +0200
 LV Status              available
 # open                 0
 LV Size                50.00 GiB
 Current LE             12800
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     256
 Block device           252:11
MACscr commented 9 years ago

Grr, I use underscores in all my VG's and LV's. Is there really no fix for this?

mpalmer commented 9 years ago

@MACscr, I still doubt that underscores are the trigger for this bug, and I'm at a loss to explain how they're involved at all (although I'm certainly not ruling it out). There is, surely, a fix for this, but it hasn't been written yet -- I've been too busy to look at it yet, and nobody else has put up a patch.

MACscr commented 9 years ago

I tried renaming things, setting up symlinks as you suggested in another, and but always got the same error of 'bignum too big to convert......' as the OP. So right now I cant get it to work at all. I have zero ruby experience (I just do stuff in php/js/bash), so I cant help much with patching. I do though think that you could probably simplify things by using an individual flag for each setting versus parsing the commands given like you are now. Less changes of parsing errors.

marcofl commented 9 years ago

Same problem here. I also tried with no special chars in the snapshot name, still the same problem. my LVs are quite small, all between 10 and 50GB

[root@samantha ~]# lvmsync -v /dev/vg01/PUPPETsnap /dev/vg01/test
Origin device: vg01/PUPPET
Sending chunk 40229056693432924241920..40229056693432924246015...
Seeking to 40229056693432924241920 in /dev/mapper/vg01-PUPPET
/usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `seek': bignum too big to convert into `long' (RangeError)
        from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:214:in `run_client'
        from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `each'
        from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:206:in `run_client'
        from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `open'
        from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:205:in `run_client'
        from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:93:in `main'
        from /usr/lib/ruby/gems/1.8/gems/lvmsync-3.0.0/bin/lvmsync:258
        from /usr/bin/lvmsync:19:in `load'
        from /usr/bin/lvmsync:19
MACscr commented 9 years ago

Yep, the chunk sizes are super off. The numbers are in the petabyte arena it seems.

MACscr commented 9 years ago

mpalmer,

In what exact scenario does this work for you? What OS, ruby version, etc?

MACscr commented 9 years ago

Well upgrading ruby from 1.8.5 to 1.9.3 didnt seem to make a difference.

[root@hound x86_64]# lvmsync -v /dev/VolGroup00/jazzsnap root@example.com:/dev/LocalStorage/jazz_img
Origin device: VolGroup00/jazz_img
Sending chunk 36600646085248964100096..36600646085248964104191...
Seeking to 36600646085248964100096 in /dev/mapper/VolGroup00-jazz_img
/usr/lib64/ruby/gems/1.9.1/gems/lvmsync-3.0.0/bin/lvmsync:214:in `seek': bignum too big to convert into `long' (RangeError)
    from /usr/lib64/ruby/gems/1.9.1/gems/lvmsync-3.0.0/bin/lvmsync:214:in `block (2 levels) in run_client'
    from /usr/lib64/ruby/gems/1.9.1/gems/lvmsync-3.0.0/bin/lvmsync:206:in `each'
    from /usr/lib64/ruby/gems/1.9.1/gems/lvmsync-3.0.0/bin/lvmsync:206:in `block in run_client'
    from /usr/lib64/ruby/gems/1.9.1/gems/lvmsync-3.0.0/bin/lvmsync:205:in `open'
    from /usr/lib64/ruby/gems/1.9.1/gems/lvmsync-3.0.0/bin/lvmsync:205:in `run_client'
    from /usr/lib64/ruby/gems/1.9.1/gems/lvmsync-3.0.0/bin/lvmsync:93:in `main'
    from /usr/lib64/ruby/gems/1.9.1/gems/lvmsync-3.0.0/bin/lvmsync:258:in `<top (required)>'
    from /usr/bin/lvmsync:23:in `load'
    from /usr/bin/lvmsync:23:in `<main>'
[root@hound x86_64]# ruby -v
ruby 1.9.3p545 (2014-02-24 revision 45159) [x86_64-linux]
MACscr commented 9 years ago

This fix solved the issue for me: https://github.com/sahild/lvmsync/commit/c24413700fd66f9ecd45f766d7dde9605d460a79

UnwashedMeme commented 9 years ago

From what I could find in http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/md/dm-snap-persistent.c#n45 "All on disk structures are in little-endian format. " which seems to directly contradict the assertion in https://github.com/mpalmer/lvmsync/blob/master/lib/lvm/snapshot.rb#L32 that things are in network byte order.

This would then be why sahild@c244137 would seem to make it work; it's reverting the patch that converted stuff to network byte order.

My question for Mr. Palmer: Do you remember where you found anything about it being network byte order? The version of this file from 2009 also says little-endian http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/md/dm-snap-persistent.c?id=4db6bfe02bdc7dc5048f46dd682a94801d029adc

I don't see anything architecture dependent there, but I presume this works for you...

mpalmer commented 9 years ago

@UnwashedMeme, you could very well be onto something there. I don't recall where I got the idea that the values are network byte order, but it's pretty clear I got that spectacularly wrong, and it was only the bug in ntohq that I fixed recently that surfaced the problem. I'll dig into this a bit more now, but I'm pretty confident you've given me a solid lead on this.

mpalmer commented 9 years ago

You da man, @UnwashedMeme! That was totally it. I've found the bug, refactored the code to be harder to screw up in the future, and released version 3.1.0 containing the fix. Thanks to everyone who chimed in on this bug report, and sorry for not having the time to fix this sooner. Thank $DEITY for public holidays...