neurostatslab / tensortools

A very simple and barebones tensor decomposition library for CP decomposition a.k.a. PARAFAC a.k.a. TCA
MIT License
160 stars 65 forks source link

Missing values generated after tensor factorization are greater than the highest possible value #23

Closed ahanagemini closed 4 years ago

ahanagemini commented 4 years ago

We are running our experiments on a tensor where the data is expected to be between 0 and 5. All the values provided are in that range. But after we use tensor factorization and then fill in the missing values, we see values as large as 10. How can we rectify this?

ahwillia commented 4 years ago

Can you confirm you have the latest / up to date code?

offendo commented 4 years ago

Hello, I'm working with ahanagemini. I can confirm our code is up to date.


```
$ pip show tensortools
Name: tensortools
Version: 0.3
Summary: Tools for Tensor Decomposition.
Home-page: https://github.com/ahwillia/tensortools
Author: Alex Williams and N. Benjamin Erichson
Author-email: alex.h.willia@gmail.com
License: MIT
Location: /usr/lib/python3.7/site-packages
Requires: tqdm, munkres, scipy, numba, numpy
Required-by:
```
offendo commented 4 years ago

For additional information, we are using ncp_hals on several sparse (around 0.7% dense) 3-way tensors with continuous values between 1 and 5. As an example, after decomposing and re-multiplying the factors, the previously missing values now range from roughly 1 to 8.

```python
r = 17
result = tt.optimize.ncp_hals(t, r, m, verbose=False, tol=10 ** -7)
pred = result.factors.full()

print("Original Max: ", t.max())
# only count non-missing values (missing entries are stored as 0)
print("Original Min: ", t[t > 0].min())
# Original Max:  5.0
# Original Min:  2.33

print("Predicted Max: ", pred.max())
print("Predicted Min: ", pred.min())
# Predicted Max:  8.534854286556612
# Predicted Min:  1.1902433140533446
```
ahwillia commented 4 years ago

Make sure that you are past the following commit, which fixed an error that could have been causing this: https://github.com/ahwillia/tensortools/commit/f485fd6418d7ec5a0b866c966154823a80fe067e

It's not clear to me that this is a bug, though. If a lot of the data is missing, it is possible for the model to make predictions both larger and smaller than the original data. Are you saying that you're only fitting to 0.7% of the tensor?
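To see why this isn't necessarily a bug: a low-rank model is fit only to the observed entries, and nothing constrains its values at the missing positions. Here is a toy NumPy illustration (not tensortools itself, just a rank-1 matrix standing in for a CP model) where the factors reproduce every observed entry exactly, yet the imputed entry is twice the observed maximum:

```python
import numpy as np

# Rank-1 nonnegative factors, analogous to one component of an NCP model.
u = np.array([1.0, 2.0])
v = np.array([1.0, 4.0])
full = np.outer(u, v)  # [[1, 4], [2, 8]]

# Treat the bottom-right entry as missing.
mask = np.array([[True, True], [True, False]])

observed_max = full[mask].max()  # observed entries are {1, 4, 2} -> max 4.0
imputed = full[~mask]            # the model's value at the missing entry: [8.0]
print(observed_max, imputed.max())  # 4.0 8.0
```

The factors fit the observed data perfectly, but the multiplicative structure still extrapolates to 8 at the held-out position; with 99.3% of a tensor missing, there is plenty of room for this kind of extrapolation.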

offendo commented 4 years ago

We are indeed past that commit.

It's entirely possible (likely) that our data is so sparse that we are getting poor results, but we wanted to confirm it wasn't related to the code, which I don't think it is. Thanks for your fast help! I think we can close the issue, unless @ahanagemini has further concerns.

ahwillia commented 4 years ago

Thanks for trying out the code. Hope it works out for your application!

If you do uncover something else that seems wrong, feel free to reopen this.

ahanagemini commented 4 years ago

Thanks a lot for your quick response. I think that clarifies our question. We do need to get our answers into the 0-5 range for our application, though. Is there any possibility of introducing upper and lower bounds on the predicted values in the future?
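Until bound constraints exist in the library (ncp_hals only enforces nonnegativity), one simple post-hoc workaround is to clip the reconstruction into the valid range. A minimal sketch with NumPy; the `pred` values here are stand-ins for `result.factors.full()`:

```python
import numpy as np

# Stand-in for the dense reconstruction returned by result.factors.full().
pred = np.array([8.53, 1.19, 3.7])

# Force every predicted value into the application's valid 0-5 range.
pred_clipped = np.clip(pred, 0.0, 5.0)
print(pred_clipped)
```

Note that clipping only repairs the output; it does not make the factorization itself aware of the bounds, so in-range predictions are unaffected.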

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ahwillia/tensortools/issues/23?email_source=notifications&email_token=ADTJUXIPX4EG54XJ6KOQIDDQODYDFA5CNFSM4I7UUXD2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBBJRHI#issuecomment-541235357, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTJUXJSTA5FFTSJOHFYQT3QODYDFANCNFSM4I7UUXDQ .