In the `explain` function, the intermediate `pandas.DataFrame`s that are created and used to build new columns have a default, contiguous index.
As of June 2nd 2021, however, the index of the input `sf_data` can be non-contiguous, for instance because SIRETs with missing data are removed from the dataset.
For this reason, a `sf_data.data[new_field_name] = new_series` assignment would silently misalign the data: pandas matches rows by index label rather than by position, so values can land on the wrong SIRET or be replaced by NaN.
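The misalignment can be reproduced with a minimal sketch. The DataFrame below is a hypothetical stand-in for `sf_data.data`, and the column names and values are illustrative assumptions, not taken from the actual dataset:

```python
import pandas as pd

# Hypothetical stand-in for sf_data.data: two rows were dropped earlier,
# so the index (0, 2, 4) is no longer contiguous.
data = pd.DataFrame({"siret": ["A", "B", "C"]}, index=[0, 2, 4])

# A Series built from scratch gets a fresh default index (0, 1, 2),
# even though its values are in the same row order as `data`.
new_series = pd.Series([10.0, 20.0, 30.0])

# Column assignment aligns on index labels, not on position:
# label 0 gets 10.0, label 2 gets 30.0 (meant for "C"), label 4 gets NaN.
data["new_field"] = new_series
```

No error or warning is raised here, which is what makes the problem easy to miss.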
Although a fix will be introduced in `predictsignauxfaibles` to make indexing contiguous by default, this fix forces the new DataFrames built in `explain` to use the index of the original DataFrame, which solves the issue.
Note that using `pd.merge` in this function might also solve the issue, but it would make the function heavier, so we use a direct slice copy instead.
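A minimal sketch of the index-forcing approach follows. It assumes the new DataFrame has the same number of rows as the original and in the same order; the names `data` and `computed` are hypothetical and do not come from the `explain` code:

```python
import pandas as pd

# Hypothetical stand-in for sf_data.data, with a non-contiguous index.
data = pd.DataFrame({"siret": ["A", "B", "C"]}, index=[0, 2, 4])

# Intermediate result with a default index (0, 1, 2); rows are assumed
# to be in the same order as `data`.
computed = pd.DataFrame({"new_field": [10.0, 20.0, 30.0]})

# Reuse the original index so labels match row for row.
computed.index = data.index

# Label-based alignment now coincides with positional order.
data["new_field"] = computed["new_field"]
```

This avoids a join on a key column, at the cost of relying on the row-order assumption stated above.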