signaux-faibles / predictsignauxfaibles

Repository for the Python code that produces the Signaux Faibles prediction lists.
MIT License

Feat/filter on categorical #42

Closed slebastard closed 3 years ago

slebastard commented 3 years ago

Solves #39

A data scientist can now create an instance of SFDataset using keyword arguments to filter on one or more categories. For instance:

dataset = SFDataset(
    date_min="2016-01-01",
    date_max="2016-06-30",
    fields=["siret", "siren", "periode", "outcome", "region", "code_naf"],
    sample_size=500,
    regions=["Bourgogne-Franche-Comté", "Île-de-France"],
    code_naf=["C", "D", "E"],
)

This call returns 500 rows whose SIRETs are located in either Bourgogne-Franche-Comté or Île-de-France and belong to one of the three sectors C, D or E.
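To make the filtering semantics concrete, here is a minimal sketch of how such keyword arguments could be turned into a MongoDB-style match document using the `$in` operator. The helper name `build_category_filter` and the field names `region` and `code_naf` are assumptions made for illustration; this is not the code from this PR, and the actual document paths in the database may differ.

```python
from typing import Dict, List, Optional


def build_category_filter(**categories: Optional[List[str]]) -> Dict[str, dict]:
    """Hypothetical helper: build a MongoDB-style match document.

    Each keyword is treated as a field name, and its value as the list of
    accepted categories. Fields are combined with an implicit AND, while
    the values of a single field are combined with OR through `$in`.
    """
    match: Dict[str, dict] = {}
    for field, allowed in categories.items():
        # Skip filters that were left unset or empty.
        if not allowed:
            continue
        match[field] = {"$in": list(allowed)}
    return match


# Field names below are assumed for the example.
category_filter = build_category_filter(
    region=["Bourgogne-Franche-Comté", "Île-de-France"],
    code_naf=["C", "D", "E"],
)
print(category_filter)
```

A document like this could be passed to a `$match` stage or to `find()`. Leaving a category unset adds no constraint on that field.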

Note that filtering on categorical variables that are not indexed in MongoDB leads to prohibitively long fetch times. This adds to the argument for either: