Open prasanna224 opened 5 years ago
Oh, that's disappointing. That error is caused by the "nan" in the output for TC (it's trying to find the best TC value, but "nan" is not comparable). If you pass --verbose=2 you can see the TCs as you are running, so you might be able to spot a nan arising earlier and stop the run. The question is, what causes the "nan"? Here are a few ideas to check for:
Not an issue, but you should add the option --no_row_names, since your first column is not an index.
Another possibility for your dataset is to "bin" the data and treat it as discrete. For instance, you might map 0 to 0, 1 to 1, and any number greater than 1 to 2. Then run without the -c option (-c treats the data as continuous).
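A rough sketch of that binning in Python, in case it helps. The helper name `bin_counts` and the `cap` parameter are made up for illustration and are not part of bio_corex; it also assumes the values are plain non-negative counts with no missing-value marker in them.

```python
import numpy as np

def bin_counts(x, cap=2):
    """Map counts to discrete levels 0..cap; every value >= cap becomes cap.

    Hypothetical helper, not part of bio_corex. Assumes non-negative counts.
    """
    x = np.asarray(x)
    return np.minimum(x, cap).astype(int)

# A few values taken from the first data row of the sample file.
row = [8, 0, 1, 6, 0, 453, 2]
print(bin_counts(row).tolist())  # [2, 0, 1, 2, 0, 2, 2]
```

With `cap=2` this gives the three levels suggested above; a larger `cap` keeps more of the low counts distinct.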
One other suggestion.
This looks like count data. I've always meant to include a specific handling of count data, but haven't yet. One thing that works well for count data is to transform each value to log_2(1+x). The 0's and 1's stay the same, but the long tail of high counts is compressed inward. This makes the numerical modeling easier by reducing outliers.
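A minimal sketch of that transform, assuming the data is loaded into a NumPy array first. The function name `log_counts` and its `missing` argument are hypothetical, not part of bio_corex. The `missing` handling is there because a negative marker such as -1e6 would itself turn into nan under the log, so those cells are left untouched.

```python
import numpy as np

def log_counts(x, missing=None):
    """Apply log2(1 + x) to counts, leaving cells equal to `missing` unchanged.

    Hypothetical helper, not part of bio_corex.
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    # Transform every cell unless it carries the missing-value marker.
    keep = np.ones(x.shape, dtype=bool) if missing is None else (x != missing)
    out[keep] = np.log2(1.0 + x[keep])
    return out

# 0 and 1 are unchanged; 3 -> 2, 7 -> 3, 1023 -> 10; the marker passes through.
print(log_counts([0, 1, 3, 7, 1023, -1e6], missing=-1e6))
```

The transformed values are no longer integers, so this variant would still be run as continuous data.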
Thanks for your quick response. We will try the suggestions you have outlined here.
While running a file with the following arguments, I am getting an error after 24 hours of script run time.
Command:
python3 vis_corex.py /home/ppandey/dx_desc.csv --delimiter="|" --layers=32,16,8,1 --dim_hidden=3 --missing=-1e6 -c -b -v -o dxm --ram=72 --cpu=36
Sample File:
DX101|DX110|DX115|DX118|DX142|DX143|DX155|DX160|DX166|DX169|DX175|DX184|DX196|DX212|DX215|DX218|DX222|DX223|DX234|DX235|DX239|DX253|DX254|DX267|DX271|DX275|DX277|DX278|DX279|DX295|DX298|DX310|DX315|DX332|DX335|DX342|DX343|DX344|DX356|DX385|DX386|DX399|DX404
8|0|1|6|0|0|0|0|0|0|0|0|5|0|3|0|0|6|0|453|0|0|0|2|0|0|6|0|0|0|9|4|6|0|0|1|1|0|9|0|0|41|81
0|4|0|0|0|4|1|0|53|0|0|2|0|0|1|0|0|0|0|0|0|4|0|0|0|0|3|0|0|0|0|0|11|0|4|0|0|0|0|0|7|0|0
0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0
0|0|0|0|1|0|0|0|0|0|0|0|0|0|9|0|0|3|0|0|0|0|0|0|0|0|0|2|0|0|2|0|25|0|0|0|0|0|0|0|2|0|0
Output: `[-0. -0. -0. 0. 0. -0. 0. -0. -0. 0. 0. -0. 0. -0. nan -0.]
[ 0. 0. -0. 0. -0. -0. 0. -0. 0. 0. -0. -0. 0. 0. nan -0.]
[ 0. 0. 0. 0. 0. -0. -0. 0. 0. -0. -0. 0. -0. 0. nan -0.]
Overall tc: nan
Traceback (most recent call last):
File "vis_corex.py", line 777, in
n_cpu=options.cpu, ram=options.ram).fit(X_prev))
File "/home/usr/bio_corex/corex.py", line 171, in fit
self.fit_transform(X)
File "/home/usr/bio_corex/corex.py", line 220, in fit_transform
self.dict = best_dict
UnboundLocalError: local variable 'best_dict' referenced before assignment`
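For anyone wondering how a nan TC ends up as this kind of error, here is a small sketch of the "nan is not comparable" point from the reply. It is a hypothetical best-of-several selection loop written for illustration, not the actual code in corex.py: every comparison against nan evaluates to False, so a run whose TC is nan can never be recorded as the best one.

```python
import math

def pick_best(tcs):
    """Return the index of the largest TC, or None if no TC ever compared higher.

    Hypothetical illustration only; not the selection code in corex.py.
    """
    best_tc = -math.inf
    best = None
    for i, tc in enumerate(tcs):
        # nan > anything is False, so a nan TC never updates the best result.
        if tc > best_tc:
            best_tc, best = tc, i
    return best

print(pick_best([1.2, 3.4, 2.0]))   # 1
print(pick_best([float("nan")]))    # None
```

In this sketch the "nothing was selected" case shows up as `None`. If the best result were only assigned inside the `if`, with no default beforehand, the same situation would instead surface as an unassigned local variable.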