signaux-faibles / predictsignauxfaibles

Repository of the Python code used to produce the Signaux Faibles prediction lists.
MIT License

fix: DataFrame objects built in `explain` now use same index as input #76

Closed slebastard closed 3 years ago

slebastard commented 3 years ago

In function `explain`, the various pandas DataFrames that are created and used to build new columns get a default, contiguous index (0 to n-1). As of June 2nd 2021, the index of the input sf_data can still be non-contiguous, for instance because SIRETs with missing data are removed from the dataset. Because pandas aligns on index labels during column assignment, `sf_data.data[new_field_name] = new_series` would silently misalign the data: values end up on the wrong rows, or become NaN where no label matches. This can be very dangerous.
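The misalignment can be reproduced outside the project with a few lines of pandas. This is only an illustrative sketch: the frame `data`, the column names, and the values are made up, and it does not use the actual `explain` code.

```python
import pandas as pd

# Hypothetical input whose index has gaps (rows 1 and 3 were dropped upstream).
data = pd.DataFrame({"siret": ["A", "B", "C"]}, index=[0, 2, 4])

# A Series built independently gets a default RangeIndex: 0, 1, 2.
new_series = pd.Series([10.0, 20.0, 30.0])

# Column assignment aligns on index labels, not on row position.
data["new_field"] = new_series

# Row labelled 2 receives the value stored under label 2 (the third value),
# and row labelled 4 has no matching label, so it becomes NaN.
print(data)
```

The second row silently receives the value intended for the third row, and the last row is left empty, with no error or warning raised.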

Although a fix will be introduced in predictsignauxfaibles to make indexing contiguous by default, this PR forces the new DataFrames built in `explain` to use the index of the original DataFrame, which solves the issue. Using `pd.merge` in this function might also solve the issue, but it would make the function heavier, so we use direct slice copies instead.
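A minimal sketch of the approach described here, assuming the intermediate results are available as a NumPy array with one row per input row. The names `data`, `contributions`, `feat_a`, `feat_b`, and `top_contribution` are hypothetical and not taken from the repository.

```python
import numpy as np
import pandas as pd

# Hypothetical input with a non-contiguous index.
data = pd.DataFrame({"siret": ["A", "B", "C"]}, index=[0, 2, 4])

# Hypothetical per-row values computed positionally (one row per input row).
contributions = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

# Reuse the input index so that later label-based alignment matches row order.
expl = pd.DataFrame(contributions, columns=["feat_a", "feat_b"], index=data.index)

# Assignment now aligns label-to-label, which coincides with position.
data["top_contribution"] = expl.max(axis=1)
print(data)
```

Passing `index=data.index` at construction time keeps the positional correspondence explicit and avoids a join, which is the trade-off mentioned above with respect to `pd.merge`.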