srendle / libfm

Library for factorization machines
GNU General Public License v3.0

Assertion error in Transpose #5

Closed binga closed 9 years ago

binga commented 9 years ago

Hi,

I have prepared Train.x and Train.y files, and when I try to transpose the input matrix to obtain Train.xt, the transpose operation fails with the following error:

Assertion failed: out_cache_col_num > 0, file tools\transpose.cpp, line 125

Any idea what this error means? Could you suggest what can be done?

Thanks, Phani

thierry-silbermann commented 9 years ago

Can you paste the first 3 lines of Train.x and Train.y so I can see the format?

binga commented 9 years ago

The first 4 lines of Train.y:

0100 0000 0400 0000 6d85 2802 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 803f 0000 0000 0000 0000 0000 0000 0000 0000

The Train.x file has a similar binary format.

I just figured out that there is a cache_size parameter in the transpose tool. Could the error be related to that?

By the way, I am trying to transpose a 5.5 GB file on a machine with 8 GB of RAM.

thierry-silbermann commented 9 years ago

I'm not a specialist in binary files, so I'll have to take a look, and it might take some time.

But you can check which of the conditions in the while loop does not hold:

((entry_cache_pos + entries_per_col(out_cache_col_position + out_cache_col_num)) < out_entry_cache.dim) && 
((out_cache_col_num+1) < out_row_cache.dim) && 
((out_cache_col_position+out_cache_col_num) < d_in.getNumCols())
binga commented 9 years ago

I believe it is related to the cache assertion on line 141 (https://github.com/srendle/libfm/blob/master/src/libfm/tools/transpose.cpp#L141), and yes, out_cache_col_num comes from the loop condition you mentioned.

srendle commented 9 years ago

The reason might be a feature in your data that has too many non-zeros to fit into the cache. The code assumes that the memory is large enough to hold all row indices where a feature appears. If you have a very frequent feature and a very large dataset, then the default cache size might be too small.

Can you increase the cache_size? If you have an 8 GB machine and have compiled for 64 bit, you can use e.g. 6 GB for the cache size: -cache_size=6000000000

binga commented 9 years ago

I increased it to 6 GB and the application stopped working, probably a RAM issue. However, I tried out various cache sizes, and 700 MB turned out to be a good fit for producing the Train.xt file. Anyway, never mind. Thank you for pointing that out! I'm playing with the tool. Great tool, great job guys :)

binga commented 9 years ago

Having said that, I am now encountering issues while executing the libFM program. This is the log:

$ libFM -task c -train Train -test Validation -dim '1,1,1' -verbosity 1 --cache_size 100000000

libFM Version: 1.40 Author: Steffen Rendle, steffen.rendle@uni-konstanz.de WWW: http://www.libfm.org/

License: Free for academic use. See license.txt.

Loading train... has x = 0 has xt = 1 data transpose... num entries in cache=11945117 num rows in cache=1109766 num_cases=36210029 num_values=722334781 num_features=67108862 min_target=0 max_target=1

Loading test... has x = 0 has xt = 1 data transpose... num entries in cache=8951965 num rows in cache=7096070 num_cases=4218938 num_values=84660394 num_features=67108862 min_target=1 max_target=1

relations: 0

Loading meta data...

attr=67108862 #groups=1

attr_in_group[0]=67108862

Any pointers on how to go about selecting the cache size based on num_features/num_values/num_cases?

srendle commented 9 years ago

The log does not include any error message. But I guess the problem is that the cache is chosen too small. With binary data, libFM can stream through your data and does not have to hold all of it in memory, but it still needs to keep learning statistics in memory: if you run MCMC (the default), libFM holds two doubles for each training case. I would recommend running with as much memory as you can.

srendle commented 9 years ago

One more note: I would recommend not specifying the cache_size flag at all when running the libFM executable. libFM automatically chooses the right amount of memory when cache_size is not specified.

The cache_size flag in the libFM executable is only useful if your data does not fit into memory, but in that case speed is bounded by disk I/O and learning will be slow.