pypest / pyemu

python modules for model-independent uncertainty analyses, data-worth analyses, and interfacing with PEST(++)
BSD 3-Clause "New" or "Revised" License

AttributeError: 'Pandas' object has no attribute 'chkpar' #488

Closed jptraylor closed 3 months ago

jptraylor commented 3 months ago

I am trying to do a forward (noptmax=0) run of a model, and I get the error message AttributeError: 'Pandas' object has no attribute 'chkpar', which occurs immediately upon execution of the forward_run.py script. Here is the error output:

starting list mlt 2024-03-19 11:59:12.021732
number of chunks to process: 9
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2048, in _process_chunk_list_files
    raise e
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2045, in _process_chunk_list_files
    _process_list_file(model_file, df)
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2177, in _process_list_file
    assert len(common_idx) == mlt.chkpar, (
                              ^^^^^^^^^^
AttributeError: 'Pandas' object has no attribute 'chkpar'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\projects\D_GV6800\analysis\model\cache_cal_testing\cache500eastcr_forward\forward_run.py", line 12, in <module>
    main()
  File "D:\projects\D_GV6800\analysis\model\cache_cal_testing\cache500eastcr_forward\forward_run.py", line 8, in main
    pyemu.helpers.apply_list_and_array_pars(arr_par_file='mult2model_info.csv',chunk_len=50)
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 1599, in apply_list_and_array_pars
    apply_genericlist_pars(list_pars, chunk_len=chunk_len)
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2036, in apply_genericlist_pars
    [xx.get() for xx in x]
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2036, in <listcomp>
    [xx.get() for xx in x]
     ^^^^^^^^
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\multiprocessing\pool.py", line 774, in get
    raise self._value
AttributeError: 'Pandas' object has no attribute 'chkpar'

My pyemu version is 1.3.3+6.g3322178a and my pandas version is 2.1.1. This model was built with older versions of pyemu and pandas. Is this a versioning issue between pyemu and pandas?

jtwhite79 commented 3 months ago

Did you rerun the PstFrom pest-interface construction with the new version of pyemu?

jptraylor commented 3 months ago

I have not, but Moussa had a similar issue and he rebuilt everything from scratch w/ the latest versions of packages and is now having an issue running his model.

jptraylor commented 3 months ago

I'm going to try to use pyemu version 1.2.0 and older pandas

jptraylor commented 3 months ago

Tried running with pyemu 1.2.0 and pandas 2.0.2 and still get the same error. This is all with Python 3.11. It did run for a colleague who is still using Python 3.9, so I'm wondering if it's a compatibility issue with newer Python?

jtwhite79 commented 3 months ago

So just to be clear, you reran the python functions that build the pest interface with PstFrom (essentially rebuilt the control file, template files, instruction files, etc.)? And this was done using the same version of pyemu/pandas that you are using at runtime when apply_genericlist_pars() is called? We have several CI tests that cover this use case with PstFrom and they seem to be ok, and I know that the chkpar bit was added recently as a runtime check to make sure things are happening as expected. So if you didn't rebuild the interface with the updated pyemu/pandas, I can see where the issues you are having would come from...
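
For readers wondering why the message names a 'Pandas' object: that is the default class name of the namedtuples that `DataFrame.itertuples()` yields, so the error is consistent with iterating over a table that has no `chkpar` column. A minimal sketch of that failure mode, assuming (this thread does not confirm it) that an interface built by an older pyemu writes its info table without that column; the other column names and values are purely illustrative:

```python
import pandas as pd

# Hypothetical stand-in for an info table written by an older build:
# there is no "chkpar" column.
old_info = pd.DataFrame(
    {"model_file": ["wel_1.dat"], "mlt_file": ["mult/wel_grid.csv"]}
)
# The same table with the extra column a newer build is assumed to write.
new_info = old_info.assign(chkpar=[420])


def read_chkpar(info):
    """Return chkpar from the first row, or the error text if it is absent."""
    row = next(info.itertuples())  # namedtuple; its class name defaults to "Pandas"
    try:
        return row.chkpar
    except AttributeError as err:
        return str(err)


row_type = type(next(old_info.itertuples())).__name__
old_result = read_chkpar(old_info)
new_result = read_chkpar(new_info)
print(row_type, "|", old_result, "|", new_result)
```

If this is what is happening, checking whether `chkpar` appears in the header of mult2model_info.csv is a quick way to tell which pyemu version built the interface.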

jptraylor commented 3 months ago

Yes, I rebuilt the entire pest framework (PstFrom) from scratch, including observations. All of the rebuilding worked fine. The problem occurs when we try to do a forward run with pestpp using forward_run.py.

A couple of colleagues and I all get the same error, as follows:

Traceback (most recent call last):
  File "D:\projects\D_GV6800\analysis\model\cache_cal_testing\cache-pest\cache500eastcr\forward_run.py", line 12, in <module>
    main()
  File "D:\projects\D_GV6800\analysis\model\cache_cal_testing\cache-pest\cache500eastcr\forward_run.py", line 8, in main
    pyemu.helpers.apply_list_and_array_pars(arr_par_file='mult2model_info.csv',chunk_len=50)
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 1599, in apply_list_and_array_pars
    apply_genericlist_pars(list_pars, chunk_len=chunk_len)
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2036, in apply_genericlist_pars
    [xx.get() for xx in x]
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2036, in <listcomp>
    [xx.get() for xx in x]
     ^^^^^^^^
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\multiprocessing\pool.py", line 774, in get
    raise self._value
ValueError: cannot reindex on an axis with duplicate labels

jtwhite79 commented 3 months ago

Ok, that's a new error compared to before, and I think it's related to the version of pandas (maybe). What version of pandas are you using?

jptraylor commented 3 months ago

im using: pandas 2.1.1 py311hf63dbb6_0 conda-forge

jtwhite79 commented 3 months ago

hmm. That should be ok. Can you zip up your model+pest files and post them somewhere so that I can check them out?

jptraylor commented 3 months ago

can i email them to you?

jtwhite79 commented 3 months ago

Ok I think I see the issue (@briochh would know better): it looks like you are broadcasting grid-type multiplier parameters for the wel package across multiple stress periods, which isn't a problem in and of itself, except that you have differing numbers of well package entries across the stress periods. I think this is what is tripping things up, because the number and location of individual well entries (i.e. grid-type parameters) jumps from 420 in the first stress period to 1042 in the second stress period, which looks like it is confusing pandas about how to apply the 420 grid-based wel multipliers. To fix this, either don't broadcast the grid-type pars across multiple stress periods, or fill the well entries across the stress periods with zero-flux (dummy) entries so that all stress periods have the same well entries.

@briochh does this jibe? For some reason I thought we were trapping for this use case when add_pars() is called...

briochh commented 3 months ago

@jtwhite79, yeah that sounds like something that could cause some dramas. We do have a check in the _par_prep() part of add_parameters() but I think this is only checking that the number of cols is consistent across files that are passed together. Sounds like this issue arises because there is a mismatch between the number of rows right?

If this is what is causing the issue, I feel like we could handle this at forward run time, as the multipliers are aligned according to the values of index_cols, so the number of entries or even the order shouldn't matter. I think we should put together a quick test that reproduces this issue and see if this casting is feasible. One challenge might come if the values in index_cols are effectively meaningless (i.e. just a counter for the order in which they appear in the input file), which might result if we introduce the flexibility raised in #490. We primarily need to avoid casting pars to the wrong place; erroring is way better than silently doing the wrong thing! Either way, whether we trap and raise at setup, support this at run time, or error at frun, we should improve the opaque error above!
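
The alignment idea described here can be illustrated with plain pandas. This is only a sketch of the concept, not pyemu's implementation: the function name, the column names (`k`, `i`, `j`, `flux`, `mult`) and the choice to leave unmatched rows unchanged are all assumptions made for the example.

```python
import pandas as pd


def apply_multipliers(wel, mults, index_cols=("k", "i", "j"), value_col="flux"):
    """Multiply value_col by the multiplier that shares the same index labels.

    Rows with no matching multiplier are left unchanged (multiplier of 1.0).
    Raises instead of guessing when the multiplier labels are not unique.
    """
    mults = mults.set_index(list(index_cols))["mult"]
    if mults.index.has_duplicates:
        raise ValueError("multiplier index labels must be unique")
    keys = pd.MultiIndex.from_frame(wel[list(index_cols)])
    factor = mults.reindex(keys).fillna(1.0).to_numpy()
    out = wel.copy()
    out[value_col] = out[value_col] * factor
    return out


# Two stress periods with different numbers (and order) of wells.
mults = pd.DataFrame(
    {"k": [1, 1, 2], "i": [5, 7, 3], "j": [9, 2, 4], "mult": [2.0, 0.5, 1.5]}
)
sp1 = pd.DataFrame({"k": [1], "i": [5], "j": [9], "flux": [-100.0]})
sp2 = pd.DataFrame(
    {"k": [2, 1, 1], "i": [3, 7, 5], "j": [4, 2, 9], "flux": [-10.0, -20.0, -30.0]}
)

sp1_new = apply_multipliers(sp1, mults)
sp2_new = apply_multipliers(sp2, mults)
print(sp1_new["flux"].tolist(), sp2_new["flux"].tolist())
```

Because the lookup is by label, the one-row and three-row files both pick up the right factors. The explicit uniqueness check reflects the "erroring is better than silently doing the wrong thing" point: label-based alignment is only well defined when each label identifies one multiplier.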

@jptraylor, in the meantime your best/easiest workaround would be, as @jtwhite79 suggests, to pad those input files so that the same well locs are represented in all the files that you are passing to that add_parameters() call.
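
A minimal sketch of that padding workaround, assuming the well lists have already been read into pandas DataFrames keyed by stress period. The helper name and the column names (`k`, `i`, `j`, `flux`) are hypothetical, and any extra columns (such as a boundname) would still need sensible values for the padded rows:

```python
import pandas as pd


def pad_well_tables(tables, index_cols=("k", "i", "j"), value_col="flux"):
    """Give every stress period the same well locations.

    Locations missing from a stress period are added with a zero flux,
    so they act as dummy entries that pump nothing.
    """
    cols = list(index_cols)
    indexed = {kper: df.set_index(cols) for kper, df in tables.items()}
    # Union of all well locations seen in any stress period.
    all_locs = None
    for df in indexed.values():
        all_locs = df.index if all_locs is None else all_locs.union(df.index)
    all_locs = all_locs.set_names(cols)
    padded = {}
    for kper, df in indexed.items():
        full = df.reindex(all_locs)
        full[value_col] = full[value_col].fillna(0.0)
        padded[kper] = full.reset_index()
    return padded


tables = {
    1: pd.DataFrame({"k": [1], "i": [5], "j": [9], "flux": [-100.0]}),
    2: pd.DataFrame({"k": [1, 1], "i": [5, 7], "j": [9, 2], "flux": [-30.0, -20.0]}),
}
padded = pad_well_tables(tables)
print({kper: len(df) for kper, df in padded.items()})
```

After padding, both stress periods list the same two locations, and the well that only exists in the second stress period appears in the first with a flux of 0.0.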

briochh commented 3 months ago

@jptraylor, are you still seeing that """ assert len(common_idx) == mlt.chkpar, ( ^^^^^^^^^^ AttributeError: 'Pandas' object has no attribute 'chkpar' """ error in the traceback?

jptraylor commented 3 months ago

@jtwhite79 @briochh I'm not understanding the issue with applying the grid-scale pumping multipliers for each stress period's wel package. Each mult file in the mult directory with the suffix grid.csv is unique to a stress period and has the same number of rows (mults) as there are wells in the associated wel file, so those match up. I guess I don't understand where the mismatch is occurring.

jptraylor commented 3 months ago

@jptraylor, are you still seeing that """ assert len(common_idx) == mlt.chkpar, ( ^^^^^^^^^^ AttributeError: 'Pandas' object has no attribute 'chkpar' """ error in the traceback?

Yes, I got that error when I initially ran the model (which was originally built with older versions of pyemu and pandas) with the recent pyemu and pandas versions. Then, per Jeremy's suggestion, I rebuilt the pest framework with the recent pyemu/pandas versions and got that xx.get() error.

briochh commented 3 months ago

Ok, that xx.get() error is really just an indication that something is failing in the multiprocessing pool; the tracebacks from there can be a little long-winded. The true error looks to be the ValueError: cannot reindex on an axis with duplicate labels, which likely comes from pandas, and there may be more information further up the traceback. Usually these pandas errors relate to duplicate indexes (or trying to create an index that will have duplicates). Are you ok with @jtwhite79 sharing your .zip with me?
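
The pandas error mentioned here is easy to reproduce in isolation, which can help when hunting for the offending file. A minimal sketch with made-up labels (nothing below comes from the model in this thread):

```python
import pandas as pd

# An index built from values that repeat (for example a shared tag),
# so the labels are not unique.
flux = pd.Series([-10.0, -20.0, -30.0], index=["a", "a", "b"])

has_dups = flux.index.has_duplicates
dup_labels = flux.index[flux.index.duplicated()].unique().tolist()

try:
    flux.reindex(["a", "b", "c"])
    reindex_error = None
except ValueError as err:
    reindex_error = str(err)

print(has_dups, dup_labels, reindex_error)
```

Running the same `has_duplicates` / `duplicated()` check on the index columns of each list input file is one way to find which file produces repeated labels.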

jptraylor commented 3 months ago

Yes, sharing that model with you is fine

jtwhite79 commented 3 months ago

Looking at the mult2model_info.csv file, it looks like you are passing a list of wel files to add_pars(), because several mlt_files are being used across multiple wel list input files, and those list files have different entries for wel boundaries. For example "mult\wel_swirr_mult_inst0_grid.csv"

jptraylor commented 3 months ago

@jtwhite79 One big difference I found between the original version of the model (created with older versions of pyemu/pandas) and this new one (created with newer versions of pyemu/pandas) is that the grid mult files for the wells look different. For example, we have multipliers by datasource, of which there are three pumping datasources (datasource is specified as a boundname in each wel file). The original version has a mult grid file for wel_datasource with three entries, whereas the new one has a bunch of entries per datasource, although it's difficult to tell how the number of entries per datasource in the new file lines up with the number of actual wells. So it appears that the new pyemu/pandas is constructing these grid mults in a different way.

jtwhite79 commented 3 months ago

That's an interesting observation. @briochh I don't think we have changed the way PstFrom sets up parameters for a given set of arguments, but I could be wrong. @jptraylor do you have the original model+pest files (if so, can you email them to me)?

jptraylor commented 3 months ago

yes i have them, will send them.

jtwhite79 commented 3 months ago

Ok after looking at those files, the number of well parameters is the same and the mult2model_info.csv is the same. The tpl files for the well multipliers are different - the new tpl file for the "datasource" tag only has two entries compared to the original.

I found this is the call where wel parameters are added and tagged with "datasource"

wel_files = sorted(glob.glob('external/wel_*[!no_ozark].dat'))
pf.add_parameters(filenames=wel_files, par_type="grid",
    par_name_base=f"wel_datasource_mult", pargp="wel_datasource_mult",
    upper_bound=2., lower_bound=0.5,
    #ult_ubound=wel_ultimate_bounds[1], 
    #ult_lbound=wel_ultimate_bounds[0],
    index_cols=[6], use_cols=[3], par_style="multiplier",
    comment_char='#')

but when I run wel_files = sorted(glob.glob('external/wel_*[!no_ozark].dat')), wel_files is empty in both the original and new datasets. Maybe there are some files missing in what you sent me?

jptraylor commented 3 months ago

Well, it would help if I sent you the correct model; I've been testing things on my end and have a few versions going. Basically, the one I sent was the forward run I tried to rerun with the newer pyemu/pandas, and it bombed, so it didn't write any of the org/wel.dat files to external/wel.dat. I'm sending you the correct original now.

jtwhite79 commented 3 months ago

Ok, I've dug in some more. I can confirm that the new-style tpl file has many more lines than the one produced by v1.2; however, they both still have only 3 unique parameters, corresponding to the 3 datasource types. It looks like the new tpl file has the maximum number of entries found in any wel list file. I seem to remember going to this style of broadcasting multiplier file for a reason, but it escapes me now (@briochh will probably remember). Anyhow, for me, if I roll back to v1.2 (py 3.10 and pandas 2.1), I get the same tpl as before, which ought to be what you are after...

mwe.zip
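
A comparison like the one described above (many lines, few unique parameters) can be scripted. The sketch below relies only on the general PEST template-file convention of a `ptf <marker>` first line with parameter names enclosed between markers; the helper name and the sample text are made up for illustration and are not taken from the files in this issue.

```python
import re


def tpl_parameter_names(tpl_text):
    """Return every parameter name found in the text of a PEST template file."""
    lines = tpl_text.splitlines()
    header = lines[0].split()
    if len(header) != 2 or header[0].lower() != "ptf":
        raise ValueError("first line should look like 'ptf <marker>'")
    marker = re.escape(header[1])
    # A parameter space is: marker, optional blanks, name, optional blanks, marker.
    pattern = re.compile(marker + r"\s*([^\s" + marker + r"]+)\s*" + marker)
    names = []
    for line in lines[1:]:
        names.extend(name.lower() for name in pattern.findall(line))
    return names


# Hypothetical template text: three entries that share two parameters.
sample_tpl = "\n".join(
    [
        "ptf ~",
        "1 5 9 ~  wel_ds_a  ~",
        "1 7 2 ~  wel_ds_a  ~",
        "2 3 4 ~  wel_ds_b  ~",
    ]
)
names = tpl_parameter_names(sample_tpl)
print(len(names), "entries,", len(set(names)), "unique parameters")
```

Running this on the old and new tpl files would show whether only the number of entries changed while the set of unique parameter names stayed the same.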

jptraylor commented 3 months ago

I'm using py 3.11 right now. Are you suggesting I get a 3.10 environment set up with pyemu v1.2? My Python 3.11 with pyemu v1.2 doesn't work; it breaks.

jptraylor commented 3 months ago

@jtwhite79 what is the significance of the mwe.zip? Maybe I'm missing something, but they look to be the same as the original wel files I sent.

jtwhite79 commented 3 months ago

I just added those files for @briochh in case he wanted to check it out. The tpl files look to be the same as the original if I use the versions listed above...

briochh commented 3 months ago

@jptraylor, looking into this now. If I am reading it right, you are trying to set up a parameter for each "type" of well and broadcast it across all well files. Is this correct? There are a few challenges here, but this should be possible. One option would be to make use of the use_rows option in add_parameters(). Using this, it should be possible to make multiple calls to add_parameters() using par_type='constant' and passing the index value that you want to the use_rows argument (e.g. use_rows=[(iwum)]). Unfortunately, there are a few moving parts here, with changes within pandas and the subsequent refactoring of the methods in pyemu. If we look at the latest pyemu release version (1.3.3) and the latest pandas 2.1, I think you might need to make sure that if a well exists in one file it is also present in the others. However, I have a feeling this might make your well files huge? I am exploring how this use case will play with later versions of pyemu and pandas. It may be that some specific patches are required... watch this space.

briochh commented 3 months ago

@jptraylor, you could try an approach like this:

import pyemu
import os
import pandas as pd

def main():
    sim_ws = "clean"
    template_ws = 'test'
    pf = pyemu.utils.PstFrom(original_d=sim_ws, new_d=template_ws,
                                 remove_existing=True,
                                 longnames=True,
                                 zero_based=False, tpl_subfolder='tpl')

    wel_files = [f for f in os.listdir(template_ws) if f.startswith("wel_") and f.endswith(".dat")]

    # read in all of the well files to get the index info
    fullwel = {}
    headers = {}
    for f in wel_files:
        kper = f.split('_')[-1].split('.')[0]
        with open(os.path.join(pf.new_d, f), 'r') as fp:
            headers[kper] = fp.readline()
            fullwel[kper] = pd.read_csv(fp, header=None, sep=r'\s+')
    # concat to one big nasty multi index 
    fulweldf = pd.concat(fullwel, names=['kper','idx']).set_index([0,1,2,6], append=True).droplevel('idx')
    # loop over the unique entries for that column 6
    for tag in fulweldf.index.unique(level=6):
        # one add pars across all files for each group
        pf.add_parameters(filenames=wel_files, par_type="constant",
            par_name_base=f"wel-datasource-mult_id:{tag}", pargp="wel-datasource-mult",
            upper_bound=2., lower_bound=0.5,
            #ult_ubound=wel_ultimate_bounds[1], 
            #ult_lbound=wel_ultimate_bounds[0],
            index_cols=[0,1,2,6], use_cols=[3], 
            par_style="multiplier",
            comment_char='#',
            use_rows=fulweldf.index.droplevel('kper').unique().tolist(),  # can be an inclusive list (entries don't have to be in the file)
            )

    pf.add_observations(filename=wel_files[0], index_cols=[0,1,2], use_cols=[3], ofile_sep=r'\s+')

    pst = pf.build_pst()

    pst.write_input_files(pf.new_d)
    bd = os.getcwd()
    os.chdir(pf.new_d)
    try:
        pyemu.helpers.apply_list_and_array_pars(chunk_len=1000)
    except Exception as e:
        os.chdir(bd)
        raise e
    os.chdir(bd)

if __name__ == "__main__":
    main()

jptraylor commented 3 months ago

@jtwhite79 I was able to get everything running with python 3.10, pyemu 2.1 and pandas 1.5.3 (pandas 2.1 threw a deprecation error from the pyemu/helpers.py about a change in kwarg name line_terminator to lineterminator).

briochh commented 3 months ago

@jptraylor, can you check those version numbers again? pyemu is currently only at 1.3.3. It would be good to note down here the approach that you are using for this type of use case. We added that chkpar business to try and make sure that we would error at run time if the number of model pars connected to a multiplier parameter wasn't as expected (rather than silently continuing). With the earlier version of pyemu, it may not throw an error, but it may not be doing what you expect!

jptraylor commented 3 months ago

@briochh ah I see! The original chkpar error was with python 3.11.5, pyemu 1.3.3, and pandas 2.1.1

Everything ran with python 3.10, pyemu 1.2, and pandas 1.5.3

briochh commented 3 months ago

ok, definitely things changed between pandas 1.5.3 and 2.1. We pinned pyemu 1.3.3 to pandas < 2.1, so the incompatibility there is not too surprising.
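
Since the versions matter this much, it can help to report them from the same environment that runs forward_run.py and to compare pandas against the `< 2.1` pin mentioned above. A standard-library-only sketch; the helper names are made up for this example:

```python
import re
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version, parts=2):
    """Turn '2.1.1' or '1.3.3+6.g3322178a' into a tuple such as (2, 1)."""
    numbers = []
    for piece in version.split("+")[0].split(".")[:parts]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def below(version, limit):
    """True if the leading release numbers of version are below limit."""
    return release_tuple(version, len(limit)) < limit


report = {
    "python": ".".join(str(n) for n in sys.version_info[:3]),
    "pandas": installed_version("pandas"),
    "pyemu": installed_version("pyemu"),
}
print(report)
if report["pandas"] is not None:
    print("pandas < 2.1:", below(report["pandas"], (2, 1)))
```

With the versions quoted in this thread, `below("1.5.3", (2, 1))` is true and `below("2.1.1", (2, 1))` is false.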

That being said, it might be worth double checking that the multiplier changes that you think are occurring are being propagated through to the resultant model input pars. You could try pyemu 1.3.3 with pandas < 2.1, as we added a more robust error check at apply time in 1.3.3. You could set up a quick and dirty test with the example that you passed: modify parval1, fill the template files with pst.write_input_files(pf.new_d), apply the pars with pyemu.helpers.apply_list_and_array_pars(chunk_len=1000), and then check that the input files reflect this change where you expect to see it. That might be a useful test to add to the pyemu test suite.
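
The final "check that the input files reflect this change" step could look something like the sketch below. It assumes the original and the newly written well lists have been read into DataFrames with the same row order, and that each row carries a tag column; the function name, the column names (`datasource`, `flux`) and the tag values are hypothetical, and the pyemu calls themselves are not repeated here.

```python
import pandas as pd


def check_multipliers(org, new, mult_by_tag, tag_col="datasource",
                      value_col="flux", tol=1e-9):
    """Return the rows where new != org * multiplier for that row's tag.

    An empty result means every row changed by the expected factor.
    """
    factor = org[tag_col].map(mult_by_tag)
    if factor.isna().any():
        missing = sorted(org.loc[factor.isna(), tag_col].unique())
        raise KeyError(f"no multiplier given for tags: {missing}")
    expected = org[value_col] * factor
    bad = (new[value_col] - expected).abs() > tol
    return new.loc[bad].assign(expected=expected[bad])


org = pd.DataFrame({"datasource": ["a", "a", "b"], "flux": [-10.0, -20.0, -30.0]})
mults = {"a": 2.0, "b": 0.5}  # the values parval1 was set to in the test

applied = org.assign(flux=[-20.0, -40.0, -15.0])  # multipliers reached the file
stale = org.copy()                                # multipliers silently not applied

print(len(check_multipliers(org, applied, mults)))
print(len(check_multipliers(org, stale, mults)))
```

An empty result for a file is the outcome to look for; any returned rows show where the written value differs from the original value times the multiplier.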