haasad / PyPardiso

Python interface to the Intel MKL Pardiso library to solve large sparse linear systems of equations
BSD 3-Clause "New" or "Revised" License

double free or corruption when solving the matrix #5

Closed cardosan closed 8 years ago

cardosan commented 8 years ago

I am experiencing *** Error in `python': double free or corruption (!prev): 0x000000000305eab0 *** when I run PyPardiso (obviously with bw2). It seems to happen only when I factorize the matrix (not every time, but most of the time); when I don't factorize, things seem to go smoothly (in the 7 or 8 runs I tested I never got this error).

When I factorize I also get 'Warning: pypardiso requires matrix A to be in CSR format for maximum efficiency', but I dunno whether this is linked to the first issue, or whether it is specific to pypardiso or to bw2, in which case it would interest @cmutel more. FYI, I am on 64-bit Linux with Anaconda and Python 3.5.

P.S.: from the few tests I have done, it also seems that factorizing the matrix does not give any benefit (regardless of the number of LCAs I run, it takes a couple of seconds more, which I guess is just the time needed for factorizing).

haasad commented 8 years ago

Thank you very much for raising this issue. I suspect that this is connected to #4. I've been trying to solve this for the last few days, but haven't succeeded yet. The difficulty in finding the cause is that the error only happens sometimes and I can't reproduce it consistently.

PyPardiso factorizes the matrix by default in the background, which is why you don't see a speed increase when factorizing manually. If it works for you, I suggest not using LCA.lci(factorize=True) for the moment; the performance will be the same. Please report back if the error also appears when you don't use the factorization.
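
To illustrate why an explicit factorization step adds nothing when the solver already keeps one in the background, here is a minimal sketch of that caching pattern. It is not PyPardiso's code: CachingSolver, its tiny dense LU routine and the factorizations counter are made up for illustration, and the sketch assumes the solver simply remembers the last matrix it saw.

```python
class CachingSolver:
    """Toy dense solver that reuses its LU factorization while the matrix is unchanged.

    Hypothetical illustration only; this is not how PyPardiso is implemented.
    """

    def __init__(self):
        self._key = None          # fingerprint of the last factorized matrix
        self._lu = None           # combined L (unit diagonal) and U factors
        self.factorizations = 0   # how many times we actually factorized

    def _factorize(self, a):
        # Doolittle LU without pivoting; adequate for the small,
        # diagonally dominant example matrix used below.
        n = len(a)
        lu = [row[:] for row in a]
        for k in range(n):
            for i in range(k + 1, n):
                lu[i][k] /= lu[k][k]
                for j in range(k + 1, n):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        self.factorizations += 1
        return lu

    def solve(self, a, b):
        key = tuple(map(tuple, a))
        if key != self._key:      # new matrix: factorize and remember it
            self._lu = self._factorize(a)
            self._key = key
        lu, n = self._lu, len(b)
        y = list(b)
        for i in range(n):        # forward substitution, L y = b
            for j in range(i):
                y[i] -= lu[i][j] * y[j]
        x = y[:]
        for i in reversed(range(n)):  # back substitution, U x = y
            for j in range(i + 1, n):
                x[i] -= lu[i][j] * x[j]
            x[i] /= lu[i][i]
        return x


solver = CachingSolver()
A = [[4.0, 1.0], [2.0, 3.0]]
x1 = solver.solve(A, [1.0, 2.0])
x2 = solver.solve(A, [5.0, 5.0])  # same matrix, so the cached factors are reused
```

After both calls the counter is still 1, so a second, manual factorization of the same matrix would only cost extra time.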

'Warning: pypardiso requires matrix A to be in CSR format for maximum efficiency' is really only about efficiency: PyPardiso uses a different sparse matrix format than scipy and has to convert the matrix from CSC to CSR format. This could be fixed in bw2calc; I'll create a pull request for @cmutel as soon as I have fixed the memory issue in PyPardiso.
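
For readers wondering what that conversion involves, here is a small dependency-free sketch of turning CSC arrays into CSR arrays. The function name csc_to_csr and the example matrix are assumptions for illustration; with scipy.sparse the same result comes from calling tocsr() on the matrix, and the sketch only shows why the step touches every stored entry.

```python
def csc_to_csr(n_rows, n_cols, data, indices, indptr):
    """Convert CSC arrays (values, row indices, column pointers) to CSR arrays.

    Illustrative helper, not part of PyPardiso or scipy.
    """
    nnz = len(data)

    # Count the stored entries per row to build the CSR row pointer.
    row_counts = [0] * n_rows
    for row in indices:
        row_counts[row] += 1
    csr_indptr = [0] * (n_rows + 1)
    for row in range(n_rows):
        csr_indptr[row + 1] = csr_indptr[row] + row_counts[row]

    # Walk the columns in order and drop each entry into its row's next slot.
    csr_data = [0.0] * nnz
    csr_indices = [0] * nnz
    next_slot = csr_indptr[:-1]  # the slice is a copy, one write position per row
    for col in range(n_cols):
        for k in range(indptr[col], indptr[col + 1]):
            row = indices[k]
            dest = next_slot[row]
            csr_data[dest] = data[k]
            csr_indices[dest] = col
            next_slot[row] += 1
    return csr_data, csr_indices, csr_indptr


# The 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] stored column by column (CSC).
csc_data = [1.0, 4.0, 3.0, 2.0, 5.0]
csc_indices = [0, 2, 1, 0, 2]
csc_indptr = [0, 2, 3, 5]

csr_data, csr_indices, csr_indptr = csc_to_csr(3, 3, csc_data, csc_indices, csc_indptr)
```

Every stored value is copied once, so the cost grows with the number of nonzeros; handing the solver a CSR matrix in the first place avoids paying it on each call.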

cardosan commented 8 years ago

Thanks Hans. Indeed, as I said, this did not always happen for me either. Yes, I will certainly let you know if it also happens without LCA.lci(factorize=True) (so far never). Unfortunately I don't have the competence to help you fix #4, but I will let you know if anything else goes wrong ;) And of course, if you need more info that could help fix the problem, just let me know.

cardosan commented 8 years ago

Hi Hans, (maybe) bad news. I also got the *** Error in `python': double free or corruption (!prev): 0x00000000019eaa60 *** without factorizing (it is the first time, after some 50 LCAs done).

In any case it looks like the LCAs are done and the problem occurs only at the end. Running the attached file, the output is: DONE!!! to run 280098 dynamic LCAs take: 7:46:51 *** Error in `python': double free or corruption (!prev): 0x00000000019eaa60 *** so it seems that everything finished... but I dunno much, I just added this info in case it is helpful for you :)

my_file.zip

haasad commented 8 years ago

Hi Giuseppe, my name is Adrian btw not Hans :-)

Thank you again for reporting back and for the detailed report. In a way this is bad news, but not unexpected. I also encounter those errors mostly at the very end. See for example: travis. Do I understand you correctly that all of the code ran, including your output to csv? The error only appeared at the very end, i.e. when Python tried to shut down?

cardosan commented 8 years ago

Hey Adrian (maybe 6:30 is too early to work, at least for Italians :D )

All the code ran and the csv is written. I did not check whether the results are correct in terms of content, but at least everything ran to the end: there were 15561 FUs x 18 IC (i.e. 280098 LCAs) and the csv has exactly 280098 rows.

Yes, exactly, the error appears at the very end (the print with DONE!!! bla bla bla is the very last line of the script in fact)

P.S.: might this error also affect the results, or can I assume that, since in this case the error appears only at the end, the LCA results themselves are OK?

haasad commented 8 years ago

In this case it is safe to assume that the results are correct and that the error is only caused by a problem with garbage collection and freeing the memory when Python tries to shut down.

cardosan commented 8 years ago

Hi Adrian, this time, running the same script, I got the same error BUT at the beginning and not at the very end (again without factorizing). I dunno exactly at what stage, but certainly before the first 10000 LCAs had run (the script prints a message when this number of calculations is done).

The previous time I ran it I also got a segmentation fault, but in that case I don't know whether it was due to PyPardiso or pandas, since each LCA result is appended to a pandas df. I have already hit a segfault a few times when working with pandas (in other cases), so I have the feeling it is due to pandas and not pypardiso, but I cannot confirm this.

haasad commented 8 years ago

Sorry for the delayed reply. Unfortunately I think it is likely that this is also caused by PyPardiso. When I try to reproduce this issue, there are some cases (maybe 1 out of 20) where it fails at the beginning and not at the end. I suspect that this happens because the memory of the solver is not freed properly, which then causes an error when the solver is called again. Do you remember whether you had already run other scripts (or the same script) before this error happened?
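
As a generic illustration of the failure mode suspected here (memory released twice, or a solver used after its memory is gone), the following sketch shows a release guard that makes freeing idempotent. It is not the fix that went into PyPardiso; SolverHandle, free() and release_calls are hypothetical names, and the native release is only simulated by a counter.

```python
import atexit


class SolverHandle:
    """Hypothetical wrapper around solver memory owned by a native library."""

    def __init__(self):
        self._freed = False
        self.release_calls = 0  # counts how often the simulated native release ran
        # Make sure the memory is released at interpreter shutdown as well.
        atexit.register(self.free)

    def free(self):
        # Safe to call any number of times: only the first call releases.
        if self._freed:
            return
        self._freed = True
        self.release_calls += 1  # a real wrapper would call the native release here

    def solve(self, b):
        # Refuse to touch memory that has already been handed back.
        if self._freed:
            raise RuntimeError("solver memory was already released")
        return list(b)  # placeholder for the actual solve


handle = SolverHandle()
handle.solve([1.0, 2.0])
handle.free()
handle.free()  # second call is a no-op instead of a double free
```

With a guard like this, an explicit free() followed by the atexit hook releases the memory once, and a later solve() fails with a clear Python exception rather than corrupting the heap.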

cardosan commented 8 years ago

No problem about the delay, I am just trying to help by reporting these problems ;) Anyway, some days have passed and I cannot recall precisely. It is probable that I ran the same script before, but I'm not sure :S

haasad commented 8 years ago

Ciao Giuseppe, I finally managed to find the reason for the segfaults. I'll add a detailed description tomorrow. conda update -c haasad pypardiso should fix the issues (including the warning when factorizing).

Thanks a lot for your help.

cardosan commented 8 years ago

Amazing, Adrian! Thanks to you for fixing this!

haasad commented 8 years ago

Fixed in v0.2.0. See the detailed description in #4.
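
To check whether an environment already has the fixed release, a small sketch like the following can help. It assumes the package metadata is registered under the distribution name pypardiso and that Python 3.8 or newer is used (for importlib.metadata); parse_version, has_fix and installed_has_fix are illustrative helpers, not part of PyPardiso.

```python
import re
from importlib import metadata

FIXED_IN = (0, 2, 0)  # version named in this thread as containing the fix


def parse_version(text):
    """Turn a string such as '0.2.0' into a tuple of at least three integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad '0.2' to (0, 2, 0) so comparisons behave
        parts.append(0)
    return tuple(parts)


def has_fix(version_text):
    """True if the given version string is at least FIXED_IN."""
    return parse_version(version_text) >= FIXED_IN


def installed_has_fix(dist_name="pypardiso"):
    """Check the installed distribution; returns None if it is not installed."""
    try:
        return has_fix(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("fix present:", installed_has_fix())
```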