nostalgia-dev / nostalgia

Utilize your personal data like Google!

How and where do we want to provide the sample data? #11

Open kootenpv opened 4 years ago

kootenpv commented 4 years ago

The current way to provide it would be:

Define a list of the keys that you want to mark as anonymized.

class MySource(NDF):
    anonymized = ["account_number", "name"]

    def load(self, fname):
        ...

After successful construction, call:

my_source.create_sample_data()

It will create a .parquet file right next to the source file (and print the filename to the screen).

This file should be committed in a Pull Request as well, together with the code.
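To make the idea concrete, here is a minimal sketch of what sampling plus anonymizing could look like. It is plain Python over a list of dicts, not the actual NDF implementation: `make_sample` and `mask_value` are hypothetical names, and the masking rule (replace a value with a blank of the same type) is only an assumption.

```python
import random


def mask_value(value):
    """Hypothetical masking rule: keep the type, drop the content."""
    if isinstance(value, bool):
        return False
    if isinstance(value, str):
        return "x" * len(value)
    if isinstance(value, int):
        return 0
    if isinstance(value, float):
        return 0.0
    return None  # unknown types are simply blanked out


def make_sample(rows, anonymized, n=5, seed=None):
    """Pick up to n random rows and mask the keys listed in `anonymized`."""
    rng = random.Random(seed)
    picked = rng.sample(rows, min(n, len(rows)))
    sample = []
    for row in picked:
        masked = dict(row)  # copy, so the original rows stay untouched
        for key in anonymized:
            if key in masked:
                masked[key] = mask_value(masked[key])
        sample.append(masked)
    return sample


rows = [
    {"account_number": f"NL00{i:04d}", "name": f"person{i}", "amount": i * 1.5}
    for i in range(20)
]
sample = make_sample(rows, anonymized=["account_number", "name"], seed=0)
```

Non-anonymized columns such as `amount` pass through unchanged, which is why it still matters to look at the sample before committing it.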

From there we could generate documentation, or do basically anything else with it, since the file contains the correct types. We can even call .to_markdown() on the loaded file (pandas>=1.0.0).
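As a rough illustration of documenting the structure from a sample, the sketch below builds a Markdown table of column names and types. It deliberately uses plain Python over a list of dicts rather than pandas, so it is a stand-in for the idea and not what .to_markdown() produces; `schema_markdown` is a hypothetical helper name.

```python
def schema_markdown(rows):
    """Return a Markdown table listing each column and the type of its first value."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            # remember the type of the first value seen for each column
            columns.setdefault(key, type(value).__name__)
    lines = ["| column | type |", "| --- | --- |"]
    lines += [f"| {name} | {typ} |" for name, typ in columns.items()]
    return "\n".join(lines)


sample_rows = [
    {"name": "xxxxx", "amount": 12.5, "count": 3},
    {"name": "xxx", "amount": 7.0, "count": 1},
]
doc = schema_markdown(sample_rows)
print(doc)
```

Output like this could be pasted into a README or generated at build time, wherever we decide the structure should be documented.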

@NickolayVasilishin

The question is: how do we start using it? And where do we want to document the structure?

I'll start working on a timeline route that shows it in demo mode.

kootenpv commented 4 years ago

NDFs now have helpers to create and load sample data.

To load an example (currently implemented for only 3 classes; I can do the rest if this is a good approach):

from nostalgia.sources.ing_banking.mijn_ing import Payments

payments = Payments.load_sample_data()

To create your own sample data:

spotify = Spotify.load()
spotify.create_sample_data()

It will produce a .parquet file next to the source. Make sure to commit it alongside the source file.

To avoid privacy problems, you can keep calling create_sample_data until you are satisfied that the 5 examples leak no private information.
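If eyeballing each sample gets tedious, the retry could in principle be scripted. The sketch below is an assumption about how that might look, not part of nostalgia: `resample_until_clean` and the `is_clean` check are hypothetical, and it works on a plain list of dicts instead of an NDF.

```python
import random


def resample_until_clean(rows, is_clean, n=5, max_tries=100, seed=None):
    """Draw n random rows repeatedly until is_clean(sample) returns True."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        sample = rng.sample(rows, min(n, len(rows)))
        if is_clean(sample):
            return sample
    raise ValueError(f"no acceptable sample found in {max_tries} tries")


rows = [{"name": f"user{i}", "track": f"song{i}"} for i in range(20)]

# Hypothetical check: reject any sample that contains a row we never want to publish.
def no_user0(sample):
    return all(row["name"] != "user0" for row in sample)

clean = resample_until_clean(rows, no_user0, seed=1)
```

A scripted check only catches what it was written to look for, so a final manual look at the 5 rows is still worthwhile.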