sdfordham / pysyncon

A python module for the synthetic control method
MIT License

Getting ValueError: Index contains duplicate entries, cannot reshape #62

Status: Closed (abhishekparida2000 closed this issue 1 month ago)

abhishekparida2000 commented 3 months ago

Hello. My name is Abhishek. I am trying to use your Python package for running the Synthetic Control Method on a dataset. I was facing some issues while running the synth.fit method. I ran the code:

```python
dataprep = Dataprep(
    foo=df,
    predictors=["Cov1"],
    predictors_op="mean",
    time_predictors_prior=pd.date_range(start="2022-08-10", end="2022-10-04"),
    dependent="Mean",
    unit_variable="District",
    time_variable="Date",
    treatment_identifier="NCR",
    controls_identifier=["Greater Bombay", "Kolkata", "Chennai", "Hyderabad", "Bangalore Urban"],
    time_optimize_ssr=pd.date_range(start="2022-08-10", end="2022-10-04"),
)
synth = Synth()
synth.fit(dataprep=dataprep, optim_initial="ols")
synth.weights()
```

I am getting this error:

```
ValueError                                Traceback (most recent call last)
Cell In[65], line 1
----> 1 synth.fit(dataprep=dataprep, optim_initial="ols")
      2 synth.weights()

File ~\anaconda3\Lib\site-packages\pysyncon\synth.py:122, in Synth.fit(self, dataprep, X0, X1, Z0, Z1, custom_V, optim_method, optim_initial, optim_options)
    120     self.dataprep = dataprep
    121     X0, X1 = dataprep.make_covariate_mats()
--> 122     Z0, Z1 = dataprep.make_outcome_mats()
    123 else:
    124     if X0 is None or X1 is None or Z0 is None or Z1 is None:

File ~\anaconda3\Lib\site-packages\pysyncon\dataprep.py:286, in Dataprep.make_outcome_mats(self, time_period)
    284 time_period = time_period if time_period is not None else self.time_optimize_ssr
--> 286 Z = self.foo[self.foo[self.time_variable].isin(time_period)].pivot(
    287     index=self.time_variable, columns=self.unit_variable, values=self.dependent
    288 )
    289 Z0, Z1 = Z[list(self.controls_identifier)], Z[self.treatment_identifier]
    290 return Z0, Z1

File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:9025, in DataFrame.pivot(self, columns, index, values)
   9018 @Substitution("")
   9019 @Appender(_shared_docs["pivot"])
   9020 def pivot(
   9021     self, *, columns, index=lib.no_default, values=lib.no_default
   9022 ) -> DataFrame:
   9023     from pandas.core.reshape.pivot import pivot
-> 9025     return pivot(self, index=index, columns=columns, values=values)

File ~\anaconda3\Lib\site-packages\pandas\core\reshape\pivot.py:553, in pivot(data, columns, index, values)
    549 indexed = data._constructor_sliced(data[values]._values, index=multiindex)
--> 553 result = indexed.unstack(columns_listlike)  # type: ignore[arg-type]
    554 result.index.names = [
    555     name if name is not lib.no_default else None for name in result.index.names
    556 ]
    558 return result

File ~\anaconda3\Lib\site-packages\pandas\core\series.py:4459, in Series.unstack(self, level, fill_value, sort)
   4457 from pandas.core.reshape.reshape import unstack
-> 4459 return unstack(self, level, fill_value, sort)

File ~\anaconda3\Lib\site-packages\pandas\core\reshape\reshape.py:517, in unstack(obj, level, fill_value, sort)
    515 if is_1d_only_ea_dtype(obj.dtype):
    516     return _unstack_extension_series(obj, level, fill_value, sort=sort)
--> 517 unstacker = _Unstacker(
    518     obj.index, level=level, constructor=obj._constructor_expanddim, sort=sort
    519 )
    520 return unstacker.get_result(
    521     obj._values, value_columns=None, fill_value=fill_value
    522 )

File ~\anaconda3\Lib\site-packages\pandas\core\reshape\reshape.py:154, in _Unstacker.__init__(self, index, level, constructor, sort)
    146 if num_cells > np.iinfo(np.int32).max:
    147     warnings.warn(
    148         f"The following operation may generate {num_cells} cells "
    149         f"in the resulting pandas object.",
    150         PerformanceWarning,
    151         stacklevel=find_stack_level(),
    152     )
--> 154 self._make_selectors()

File ~\anaconda3\Lib\site-packages\pandas\core\reshape\reshape.py:210, in _Unstacker._make_selectors(self)
    207 mask.put(selector, True)
    209 if mask.sum() < len(self.index):
--> 210     raise ValueError("Index contains duplicate entries, cannot reshape")
    212 self.group_index = comp_index
    213 self.mask = mask

ValueError: Index contains duplicate entries, cannot reshape
```

I have gone through the documentation but am unable to figure out the source of this error.

Could you please help with this?

sdfordham commented 3 months ago

For each unit, do you have exactly one row per time-period? That error suggests you have multiple, which is why the pivot is failing. But I cannot say for sure without seeing the dataframe.
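A quick way to check this is with pandas' `duplicated` on the unit and time columns together (a sketch using the column names from the snippet above, on a toy frame):

```python
import pandas as pd

# Toy frame with a duplicated (unit, time) pair, standing in for the real data.
df = pd.DataFrame({
    "District": ["NCR", "NCR", "Kolkata"],
    "Date": pd.to_datetime(["2022-08-10", "2022-08-10", "2022-08-10"]),
    "Mean": [1.0, 1.1, 2.0],
})

# Rows that share the same unit AND time-period; any such rows break the pivot.
dupes = df[df.duplicated(subset=["District", "Date"], keep=False)]
print(len(dupes))  # 2: both "NCR" rows fall on 2022-08-10
```

If `dupes` is non-empty, those rows need to be deduplicated or aggregated before calling `Dataprep`.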

abhishekparida2000 commented 3 months ago

Yes, got it. Thanks. There were some duplicates in the dataframe.

abhishekparida2000 commented 3 months ago

Also, I was wondering whether there is any option to handle multiple treated units and multiple stages of treatment in the package. I am trying to study the impact on economic activity of a policy for curbing air pollution (GRAP), which is implemented in a stagewise manner: depending on the AQI level, the stages are upgraded or downgraded. My treatment group has observations over time for multiple units. So far, I have used the synthetic control method on data aggregated over the treatment group for each time period to study the overall impact of the policy, i.e. ignoring the stagewise manner in which it is implemented.

sdfordham commented 3 months ago

There are no built-in methods for dealing with multiple treated units right now, but I do intend to add a wrapper that can do this with concurrency at some point. If you still want to use this package for that purpose, you would need to wrap your code in a for loop and adjust the treated and control units (and any other data needed) in each run. If you want to use concurrency, you can look at the Placebo Test code here for hints on what the code could look like.
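A minimal sketch of the loop bookkeeping this suggests (plain Python; `fit_one` is a hypothetical placeholder for the `Dataprep`/`Synth` steps shown earlier in the thread, and the unit names are just examples):

```python
units = ["NCR", "Kolkata", "Greater Bombay", "Chennai", "Hyderabad"]
treated_units = ["NCR", "Kolkata"]  # assumption: two treated units

def fit_one(treated, controls):
    # Placeholder for: build a Dataprep with treatment_identifier=treated and
    # controls_identifier=controls, then Synth().fit(...) and .weights().
    return {"treated": treated, "controls": controls}

results = []
for treated in treated_units:
    # Units that are never treated serve as the donor pool in every run.
    controls = [u for u in units if u not in treated_units]
    results.append(fit_one(treated, controls))
```

Each iteration produces one synthetic control; the per-unit gaps could then be aggregated however the analysis requires.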

For multi-stage treatments, I have no plans to implement that right now, as I don't see how to do it robustly. I recall reading a research article on this topic recently, but I cannot remember the details or the title; I will have a look for it. As a first step, the approach in your last sentence sounds like the way I might go about it if I were in your position.

sdfordham commented 3 months ago

I should add that if you are willing to use R, then the Augsynth package does have methods for dealing with multiple treated units. It also appears to have methods for dealing with "staggered adoptions" which may be helpful to you, see here.

abhishekparida2000 commented 3 months ago

Thanks for the suggestions. I'll give the Augsynth package a try. While using the pysyncon package, I noticed that all the units in the control group were assigned equal weights. Is this normal, or could there be an issue? Also, on running the Placebo Test I am getting the error:

```
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\donal\anaconda3\Lib\concurrent\futures\process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "C:\Users\donal\anaconda3\Lib\site-packages\pysyncon\utils.py", line 249, in _single_placebo
    min = int(min(dataprep.foo[dataprep.time_variable]))
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'Timestamp'
"""
```

The above exception was the direct cause of the following exception:

```
TypeError                                 Traceback (most recent call last)
Cell In[306], line 2
      1 placebo_test = PlaceboTest()
----> 2 placebo_test.fit(
      3     dataprep=dataprep,
      4     scm=synth,
      5     scm_options={"optim_method": "Nelder-Mead", "optim_initial": "equal"},
      6 )

File ~\anaconda3\Lib\site-packages\pysyncon\utils.py:185, in PlaceboTest.fit(self, dataprep, scm, scm_options, max_workers, verbose)
    176 to_do.append(
    177     executor.submit(
    178         self._single_placebo,
    (...)
    182     )
    183 )
    184 for idx, future in enumerate(futures.as_completed(to_do), 1):
--> 185     path, gap = future.result()
    186     if verbose:
    187         print(f"({idx}/{n_tests}) Completed placebo test for {path.name}.")

File ~\anaconda3\Lib\concurrent\futures\_base.py:449, in Future.result(self, timeout)
    447     raise CancelledError()
    448 elif self._state == FINISHED:
--> 449     return self.__get_result()
    451 self._condition.wait(timeout)
    453 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~\anaconda3\Lib\concurrent\futures\_base.py:401, in Future.__get_result(self)
    399 if self._exception:
    400     try:
--> 401         raise self._exception
    402     finally:
    403         # Break a reference cycle with the exception in self._exception
    404         self = None

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'Timestamp'
```

Any idea where the error might be coming from? Could it be related to the type of the Date column in my dataset?

sdfordham commented 3 months ago

While using the pysyncon package, I noticed that all the units in the control group were assigned equal weights. Is this normal, or could there be an issue?

There are many possible causes; chief among them are unbalanced panel data and/or too few time-periods used in the optimization. Otherwise, you can try varying the starting point or the algorithm used, to try to force it away from that solution.

Also, on running the Placebo Test I am getting the error: Any idea where the error might be coming from? Could it be related to the type of the Date column in my dataset?

It looks like that is the problem here. I have no idea why it is coercing to an integer there; this looks like a bug, I will check it out.

abhishekparida2000 commented 3 months ago

I think the equal weights issue arose because I included the Treatment Dummy variable in the list of predictors. After removing this variable, I now get unequal weights.

The data is recorded at a daily frequency, and the Date column is in datetime64[ns] format. There might be a problem converting this format to integers, an issue that doesn't occur when the time variable is yearly (i.e. already an integer).
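One possible workaround (untested against the pysyncon internals, so treat it as a sketch) is to map the daily timestamps to an integer day index before building `Dataprep`, so the time variable is a real number rather than a `Timestamp`:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-08-10", "2022-08-11", "2022-08-12"]),
})

# Map each daily timestamp to an integer offset (in days) from the first date.
# "DayIdx" is a hypothetical column name; it would replace "Date" as the
# time_variable passed to Dataprep.
df["DayIdx"] = (df["Date"] - df["Date"].min()).dt.days
print(df["DayIdx"].tolist())  # [0, 1, 2]
```

The `time_predictors_prior` and `time_optimize_ssr` ranges would then need to be given as the corresponding integer ranges rather than `pd.date_range` objects.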

Do you know of any functional packages for performing Synthetic Control DID?

sdfordham commented 3 months ago

I have two PRs to deal with both these issues.

I don't know of a Python package for that; again, if you are willing to use R, then Arkhangelsky et al. have a package based on their paper: https://github.com/synth-inference/synthdid

abhishekparida2000 commented 3 months ago

The Placebo test is still not working. I am getting this error:

```
BrokenProcessPool                         Traceback (most recent call last)
Cell In[70], line 2
      1 placebo_test = PlaceboTest()
----> 2 placebo_test.fit(
      3     dataprep=dataprep,
      4     scm=synth,
      5     scm_options={"optim_method": "Nelder-Mead", "optim_initial": "ols"},
      6 )

File ~\anaconda3\Lib\site-packages\pysyncon\utils.py:185, in PlaceboTest.fit(self, dataprep, scm, scm_options, max_workers, verbose)
    176 to_do.append(
    177     executor.submit(
    178         self._single_placebo,
    (...)
    182     )
    183 )
    184 for idx, future in enumerate(futures.as_completed(to_do), 1):
--> 185     path, gap = future.result()
    186     if verbose:
    187         print(f"({idx}/{n_tests}) Completed placebo test for {path.name}.")

File ~\anaconda3\Lib\concurrent\futures\_base.py:449, in Future.result(self, timeout)
    447     raise CancelledError()
    448 elif self._state == FINISHED:
--> 449     return self.__get_result()
    451 self._condition.wait(timeout)
    453 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~\anaconda3\Lib\concurrent\futures\_base.py:401, in Future.__get_result(self)
    399 if self._exception:
    400     try:
--> 401         raise self._exception
    402     finally:
    403         # Break a reference cycle with the exception in self._exception
    404         self = None

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
```

sdfordham commented 3 months ago

Can you post the full error message? I cannot see a reference to the line that caused the actual error.

abhishekparida2000 commented 3 months ago

This is the code I ran:

```python
dataprep = Dataprep(
    foo=df1,
    predictors=["T2M", "QV2M", "PRECTOTCORR"],
    predictors_op="mean",
    time_predictors_prior=pd.date_range(start="2022-08-10", end="2022-10-04"),
    dependent="Mean",
    unit_variable="District",
    time_variable="Date",
    treatment_identifier="NCR",
    controls_identifier=["Greater Bombay", "Kolkata", "Chennai", "Hyderabad", "Bangalore Urban",
                         "Kolar", "Tumkur", "Thane", "Kancheepuram", "Thiruvallur", "Vellore", "Haora"],
)
synth = Synth()
placebo_test = PlaceboTest()
placebo_test.fit(
    dataprep=dataprep,
    scm=synth,
    scm_options={"optim_method": "Nelder-Mead", "optim_initial": "ols"},
)
```


```
BrokenProcessPool                         Traceback (most recent call last)
Cell In[70], line 2
      1 placebo_test = PlaceboTest()
----> 2 placebo_test.fit(
      3     dataprep=dataprep,
      4     scm=synth,
      5     scm_options={"optim_method": "Nelder-Mead", "optim_initial": "ols"},
      6 )

File ~\anaconda3\Lib\site-packages\pysyncon\utils.py:185, in PlaceboTest.fit(self, dataprep, scm, scm_options, max_workers, verbose)
    184 for idx, future in enumerate(futures.as_completed(to_do), 1):
--> 185     path, gap = future.result()
    186     if verbose:
    187         print(f"({idx}/{n_tests}) Completed placebo test for {path.name}.")

File ~\anaconda3\Lib\concurrent\futures\_base.py:449, in Future.result(self, timeout)
    448 elif self._state == FINISHED:
--> 449     return self.__get_result()

File ~\anaconda3\Lib\concurrent\futures\_base.py:401, in Future.__get_result(self)
    399 if self._exception:
    400     try:
--> 401         raise self._exception

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
```

sdfordham commented 3 months ago

I don't think that is the full error message; it looks truncated (note the `(...)` in the middle). Instead of the Placebo Test, if you run the following, does it work?

```python
synth.fit(dataprep=dataprep, optim_method="Nelder-Mead", optim_initial="ols")
```

If so, then I would guess there is a problem with your data: you should check whether the dtypes of the `foo` columns are compatible with the types of the arguments you passed to `Dataprep`.
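One concrete compatibility check is whether the values in the time column actually match the `pd.date_range` entries at all; a pandas sketch (using the column name from the snippets above, on a toy frame):

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2022-08-10", "2022-09-01"])})
period = pd.date_range(start="2022-08-10", end="2022-10-04")

# If this count is 0, the time_predictors_prior values never match the Date
# column (e.g. a dtype mismatch) and the predictor matrices come out empty.
print(df["Date"].isin(period).sum())  # 2 here: both dates fall in the range
```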

abhishekparida2000 commented 3 months ago

synth.fit is working fine. I ran the Placebo Test after running this line. I missed this line while copying the code.

The datatypes of the columns are:

- District: object
- Date: datetime64[ns]
- Mean: float64
- TD: int32
- T2M: float64
- QV2M: float64
- PRECTOTCORR: float64

The datatypes of the `foo` columns seem okay to me.

Here is the code:

```python
dataprep = Dataprep(
    foo=df1,
    predictors=["T2M", "QV2M", "PRECTOTCORR"],
    predictors_op="mean",
    time_predictors_prior=pd.date_range(start="2022-08-10", end="2022-10-04"),
    dependent="Mean",
    unit_variable="District",
    time_variable="Date",
    treatment_identifier="NCR",
    controls_identifier=["Greater Bombay", "Kolkata", "Chennai", "Hyderabad", "Bangalore Urban",
                         "Kolar", "Tumkur", "Thane", "Kancheepuram", "Thiruvallur", "Vellore", "Haora"],
)
synth = Synth()
synth.fit(dataprep=dataprep, optim_method="Nelder-Mead", optim_initial="ols")
placebo_test = PlaceboTest()
placebo_test.fit(
    dataprep=dataprep,
    scm=synth,
    scm_options={"optim_method": "Nelder-Mead", "optim_initial": "ols"},
)
```

sdfordham commented 3 months ago

Yes, the dtypes look fine. It is strange that your `synth.fit` works but the Placebo Test gives that vague-sounding error; normally, if there is a failure in the pool, it will raise the error that caused the failure. How are you running the code: in a Jupyter notebook, or as a Python file in the terminal?

abhishekparida2000 commented 3 months ago

I am running it in Jupyter Notebook.

abhishekparida2000 commented 3 months ago

I tried to run it in a Python script. I am getting this error.

(output from several spawned child processes, interleaved in the original; one representative child traceback and the final main-process traceback shown)

```
Traceback (most recent call last):
  File "D:\Coding\Python Scripts\.conda\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\Coding\Python Scripts\.conda\Lib\multiprocessing\spawn.py", line 131, in _main
    prepare(preparation_data)
  File "D:\Coding\Python Scripts\.conda\Lib\multiprocessing\spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Coding\Python Scripts\.conda\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path, ...)
  File "d:\Coding\Placebo.py", line 28, in <module>
    placebo_test.fit(
  File "D:\Coding\Python Scripts\.conda\Lib\site-packages\pysyncon\utils.py", line 177, in fit
    executor.submit(
  File "D:\Coding\Python Scripts\.conda\Lib\concurrent\futures\process.py", line 808, in submit
    self._adjust_process_count()
  File "D:\Coding\Python Scripts\.conda\Lib\concurrent\futures\process.py", line 767, in _adjust_process_count
    self._spawn_process()
  File "D:\Coding\Python Scripts\.conda\Lib\concurrent\futures\process.py", line 785, in _spawn_process
    p.start()
  File "D:\Coding\Python Scripts\.conda\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\Coding\Python Scripts\.conda\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "D:\Coding\Python Scripts\.conda\Lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Coding\Python Scripts\.conda\Lib\multiprocessing\spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "D:\Coding\Python Scripts\.conda\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    To fix this issue, refer to the "Safe importing of main module"
    section in https://docs.python.org/3/library/multiprocessing.html

(the same RuntimeError is raised in each spawned child process)

Traceback (most recent call last):
  File "d:\Coding\Placebo.py", line 28, in <module>
    placebo_test.fit(
  File "D:\Coding\Python Scripts\.conda\Lib\site-packages\pysyncon\utils.py", line 185, in fit
    path, gap = future.result()
  File "D:\Coding\Python Scripts\.conda\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
  File "D:\Coding\Python Scripts\.conda\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
```
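The RuntimeError message in that output points at the standard Windows fix: `multiprocessing` uses the spawn start method there, so anything that creates a process pool must be guarded in the script's main module. A minimal sketch of the idiom (with a generic pool and a toy `square` function standing in for the `PlaceboTest` calls in `Placebo.py`):

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    # Work executed in a child process; must be defined at module level so
    # the spawned child can import it.
    return x * x

def main():
    with ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(square, [1, 2, 3]))

if __name__ == "__main__":
    # On Windows (spawn), child processes re-import this module, so the pool
    # must only be created under this guard. In Placebo.py, the Dataprep /
    # Synth / PlaceboTest calls would likewise go inside main().
    print(main())  # [1, 4, 9]
```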

sdfordham commented 2 months ago

I still cannot reproduce this. This Stack Overflow question suggests to me it could be to do with how you are running the code: https://stackoverflow.com/questions/15900366/all-example-concurrent-futures-code-is-failing-with-brokenprocesspool