lseman / pylspm

Partial Least Squares Path Modeling in Python
MIT License

ValueError #3

Closed borjaapaolaza closed 7 years ago

borjaapaolaza commented 7 years ago

[Python v3.5.1 | Windows]

At line 68 I get the following error: ValueError: operands could not be broadcast together with shapes (100,) (200,)
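For context, this is NumPy's generic message for an element-wise operation between two 1-D arrays whose lengths differ. It is not specific to PyLSpm: it only says that two vectors that were expected to have the same length do not. A minimal reproduction using the two shapes from the message (the arrays themselves are illustrative):

```python
import numpy as np

a = np.ones(100)  # shape (100,)
b = np.ones(200)  # shape (200,)

message = ""
try:
    a * b  # element-wise ops need equal (or broadcastable) shapes
except ValueError as err:
    message = str(err)

print(message)
```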

lseman commented 7 years ago

Which data are you using? Are you using a non-recursive latent variable system?

Laio O. Seman

borjaapaolaza commented 7 years ago

I am using this:

# Instantiating the PyLSpm class
plspm = PyLSpm(data, LVcsv, MVcsv, maxit, stopCriterion)

LVcsv is a string containing the name of a CSV file that has the following structure:

source,target
LV1,LV2
LV1,LV3
...
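A minimal sketch of producing such a file with pandas, assuming the structural paths are kept as (source, target) tuples. The edge list only repeats the two rows shown above (the rest of the model is elided), and the file name lv_model.csv is hypothetical:

```python
import os
import tempfile

import pandas as pd

# Hypothetical structural paths: (source LV, target LV)
edges = [("LV1", "LV2"), ("LV1", "LV3")]
lv_df = pd.DataFrame(edges, columns=["source", "target"])

# index=False keeps the header as exactly "source,target"
LVcsv = os.path.join(tempfile.gettempdir(), "lv_model.csv")
lv_df.to_csv(LVcsv, index=False)

with open(LVcsv) as fh:
    print(fh.read())
```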

MVcsv is a string containing the name of a CSV file that has the following structure:

latent,measurement,mode
LV1,Ind1,B
LV1,Ind2,B
LV2,Ind3,B
...
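A quick sanity check on this file can catch typos before the model is built. The sketch below is a hypothetical helper, not part of PyLSpm; it assumes the only valid modes are A (reflective) and B (formative), the two standard PLS-PM outer estimation modes, and it returns the number of indicators per latent variable:

```python
import io

import pandas as pd

# Illustrative measurement model in the latent,measurement,mode layout
mv_text = """latent,measurement,mode
LV1,Ind1,B
LV1,Ind2,B
LV2,Ind3,B
"""

def check_measurement_model(mv):
    """Hypothetical helper: basic checks on the measurement-model table.

    Returns the number of indicators assigned to each latent variable.
    """
    expected = ["latent", "measurement", "mode"]
    if list(mv.columns) != expected:
        raise ValueError(f"expected columns {expected}, got {list(mv.columns)}")
    # Assumption: only the two standard PLS-PM outer modes are accepted
    bad_modes = set(mv["mode"]) - {"A", "B"}
    if bad_modes:
        raise ValueError(f"unknown mode(s): {sorted(bad_modes)}")
    return mv.groupby("latent")["measurement"].count().to_dict()

mv = pd.read_csv(io.StringIO(mv_text))
counts = check_measurement_model(mv)
print(counts)
```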

I am quite sure that LVcsv and MVcsv are correct. Most of my doubts are about "data": I don't know whether to pass a 2D array, a DataFrame, or the name of a CSV file with a certain internal structure.

Regarding your question, I am not using a non-recursive model.
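A recursive model is one whose structural paths contain no feedback loops, i.e. the source → target graph is acyclic. One way to confirm this from the LVcsv table is a topological sort (Kahn's algorithm); the helper name is hypothetical and the two small tables are illustrative. In practice the input would be `pd.read_csv(LVcsv)`:

```python
import pandas as pd

def is_recursive(lv):
    """Return True if the source -> target paths contain no cycle (Kahn's algorithm)."""
    nodes = set(lv["source"]) | set(lv["target"])
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, tgt in zip(lv["source"], lv["target"]):
        children[src].append(tgt)
        indegree[tgt] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Every node can be removed only if the graph has no cycle
    return visited == len(nodes)

recursive = pd.DataFrame({"source": ["LV1", "LV1"], "target": ["LV2", "LV3"]})
feedback = pd.DataFrame({"source": ["LV1", "LV2"], "target": ["LV2", "LV1"]})
print(is_recursive(recursive), is_recursive(feedback))  # True False
```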

Thanks in advance ;-)

lseman commented 7 years ago

The data must be either a pandas DataFrame or a CSV file whose first line contains the names of the indicators. The following lines must contain the indicator values, with no missing values.
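A minimal sketch of preparing data that meets these two requirements. `load_indicator_data` is a hypothetical helper, not a PyLSpm function: it accepts either a DataFrame or a CSV path/buffer and refuses data with missing values:

```python
import io

import pandas as pd

def load_indicator_data(source):
    """Hypothetical helper: return a DataFrame of indicators with no missing values.

    `source` may already be a DataFrame, or a CSV path/buffer whose first line
    holds the indicator names and whose remaining lines hold the values.
    """
    data = source if isinstance(source, pd.DataFrame) else pd.read_csv(source)
    missing = data.isna().sum()
    missing = missing[missing > 0]
    if not missing.empty:
        raise ValueError(f"missing values in: {missing.to_dict()}")
    return data

# First line: indicator names; following lines: one respondent per row
csv_text = "Ind1,Ind2,Ind3\n1,5,2\n5,1,3\n"
data = load_indicator_data(io.StringIO(csv_text))
print(data.shape)  # (2, 3)
```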

borjaapaolaza commented 7 years ago

I am getting the following: ValueError: operands could not be broadcast together with shapes (6,) (12,)

Regarding the data, I create a pandas DataFrame with the following shape. The column names correspond to the different indicators, and the index consists of the integers 0 to 5 (6 answers in total). I believe this DataFrame is in the right form for your library...

   CH1  CH2  CH3  CH4  CH5  CH6  CH7  CH8  CH9  CH10 ...   CN4  CN5  CM1  CM2  \
0    1    5    2    1    4    3    5    2    2     1 ...     5    1    3    5   
1    5    1    3    4    4    3    3    1    3     1 ...     5    2    1    5   
2    1    3    4    2    3    4    1    3    3     5 ...     5    2    5    4   
3    1    2    2    1    5    1    2    2    5     4 ...     5    1    4    1   
4    3    5    2    5    3    3    2    5    4     4 ...     2    4    2    1   
5    3    5    4    2    1    3    1    3    2     1 ...     3    5    2    1   

   CM3  CM4  CM5  PT1  PT2  PT3  
0    2    2    5    3    5    3  
1    3    5    5    5    5    4  
2    3    4    5    1    1    2  
3    3    4    4    2    5    3  
4    2    5    2    1    2    3  
5    5    5    3    5    3    5  

[6 rows x 68 columns]
lseman commented 7 years ago

I committed a fix to the path scheme. It should work now.

Laio O. Seman

borjaapaolaza commented 7 years ago

That problem now seems to be fixed, but I get this one:

  File "F:\Krilin\Modules\pylspm\pylspm.py", line 174, in __init__
    numerador = (np.dot(np.dot(weights.T,(S-np.diag(np.diag(S)))),weights))
ValueError: shapes (1,14) and (15,15) not aligned: 14 (dim 1) != 15 (dim 0)
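Line 174 computes the quadratic form w^T (S - diag(diag(S))) w, which only works when the weight vector has exactly one entry per row of S. A minimal NumPy sketch of that expression, assuming S is an indicator covariance matrix (random data, illustrative sizes), showing the matching case and the 14-vs-15 mismatch from the traceback:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 15))       # 50 observations, 15 indicators
S = np.cov(X, rowvar=False)         # (15, 15) covariance matrix
off_diag = S - np.diag(np.diag(S))  # zero the diagonal, as in line 174

w_ok = np.ones((15, 1))             # one weight per indicator
numerator = w_ok.T @ off_diag @ w_ok  # (1, 15) @ (15, 15) @ (15, 1)
print(numerator.shape)              # (1, 1)

w_bad = np.ones((14, 1))            # one weight short
err_msg = ""
try:
    np.dot(np.dot(w_bad.T, off_diag), w_bad)
except ValueError as err:
    err_msg = str(err)
print(err_msg)
```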
lseman commented 7 years ago

Are all of your variables the same length? I tested with the sample data you provided, and it worked.
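One way to look for the kind of mismatch behind the (1,14) vs (15,15) error is to compare, block by block, the indicators listed in the measurement model against the data columns, and to check that every column has the same number of non-missing values. This is a hypothetical diagnostic, not part of PyLSpm, and the data below are illustrative:

```python
import pandas as pd

def check_blocks(data, mv):
    """Hypothetical helper: per latent variable, list indicators that appear
    in the measurement model but not among the data columns."""
    problems = {}
    for latent, group in mv.groupby("latent"):
        absent = [m for m in group["measurement"] if m not in data.columns]
        if absent:
            problems[latent] = absent
    return problems

data = pd.DataFrame({"Ind1": [1, 5], "Ind2": [5, 1], "Ind3": [2, 3]})
mv = pd.DataFrame({
    "latent": ["LV1", "LV1", "LV2", "LV2"],
    "measurement": ["Ind1", "Ind2", "Ind3", "Ind4"],  # Ind4 is not in data
    "mode": ["B", "B", "B", "B"],
})
print(check_blocks(data, mv))  # {'LV2': ['Ind4']}

# Non-missing values per column; more than one distinct count means uneven lengths
lengths = data.notna().sum()
print(lengths.nunique() == 1)  # True
```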

borjaapaolaza commented 7 years ago

You are right, it works now.

The only problem is a couple of SettingWithCopyWarnings I get. They are both like this:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
lseman commented 7 years ago

You can ignore the warnings for now; they won't change the result. They are pandas warnings.

In the next major release they'll be gone.
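The warning usually means a value was assigned into a DataFrame obtained by slicing another one, so pandas cannot tell whether the original should change. A common pandas idiom, sketched below with illustrative data, is to take an explicit `.copy()` of the slice before modifying it; this is a general pandas pattern, not the change PyLSpm itself makes:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, -2, 3], "b": [0, 0, 0]})

# .copy() makes `subset` an independent DataFrame, so assigning into it
# is unambiguous and leaves `df` untouched.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    subset = df[df["a"] > 0].copy()
    subset["b"] = 1

print(subset["b"].tolist())  # [1, 1]
print(df["b"].tolist())      # [0, 0, 0]
```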

Laio O. Seman

borjaapaolaza commented 7 years ago

All right. It works fine then ;-)

Another question: how can I tell whether the path matrix coefficients are significant? Can the library calculate their t-values?

lseman commented 7 years ago

To calculate the t-values, you need to bootstrap: re-estimate the model on many samples drawn with replacement from your data, then divide the mean of the bootstrap estimates by their standard deviation.
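A minimal sketch of that procedure, assuming the model can be re-estimated by a function that takes a DataFrame and returns one path coefficient. The hypothetical `estimate_path` is only a stand-in (the correlation of two simulated variables, which equals the standardized coefficient of a single-predictor path); in practice it would refit PyLSpm on each resample and read the coefficient of interest from the path matrix:

```python
import numpy as np
import pandas as pd

def bootstrap_t(data, estimate, n_boot=500, seed=0):
    """Resample rows with replacement, re-estimate each time, and return
    mean(bootstrap estimates) / std(bootstrap estimates)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # n row positions, with replacement
        resample = data.iloc[rows].reset_index(drop=True)
        estimates[b] = estimate(resample)
    return estimates.mean() / estimates.std(ddof=1)

def estimate_path(df):
    # Hypothetical stand-in for refitting the PLS model on one resample
    return df["x"].corr(df["y"])

rng = np.random.default_rng(1)
x = rng.normal(size=200)
df = pd.DataFrame({"x": x, "y": 0.5 * x + rng.normal(size=200)})

t_value = bootstrap_t(df, estimate_path)
print(round(t_value, 2))
```

Some PLS software instead divides the original (full-sample) estimate by the bootstrap standard deviation; for a stable estimate the two versions are usually close.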

Laio O. Seman

borjaapaolaza commented 7 years ago

Dear Mr Seman,

Does the code support missing data? Some programs, such as SmartPLS, let you designate a certain character or sequence of characters as missing data and handle it accordingly in the calculations. For example, in the Options menu I set the value -99 as missing data, and all -99 answers are then processed as missing.

If the code doesn't support this, is there an alternative?

Thanks in advance

lseman commented 7 years ago

You can use this piece of code to replace missing values (NaN) with the column means:

# data_ is the indicator DataFrame: replace each NaN with the mean of its column
data_ = data_.fillna(data_.mean())

Mean imputation is not recommended, though; I would recommend using a library such as missForest (an R package):

https://github.com/stekhoven/missForest
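The mean-filling snippet above only handles values that are already NaN. For the SmartPLS-style convention in the question, where -99 marks a missing answer, a minimal sketch is to convert the sentinel to NaN first. The -99 code comes from the question; the column names and values are illustrative:

```python
import numpy as np
import pandas as pd

MISSING_CODE = -99  # sentinel used for missing answers

data_ = pd.DataFrame({"CH1": [1, -99, 3], "CH2": [4, 5, -99]})

# Treat the sentinel as missing, then fill each NaN with its column mean
data_ = data_.replace(MISSING_CODE, np.nan)
data_ = data_.fillna(data_.mean())

print(data_)
```

When the data come from a CSV file, `pd.read_csv(path, na_values=[-99])` performs the same sentinel-to-NaN conversion at load time.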

Laio O. Seman