guedes / faker_fdw

Because, sometimes, your Elephant must lie
Other
90 stars 7 forks source link
faker fdw postgresql

faker_fdw

faker_fdw is a foreign data wrapper for PostgreSQL that generates fake data.

What can I do with it?

You can generate data for testing, create examples for your blog, populate a development database and other things related to generate fake data.

When faker_fdw is installed, create one milion rows on a people table is as easy as doing:

devdb=> INSERT INTO public.people SELECT ssn, name, phone_number FROM faker.people;

In the snippet above, faker.people is a foreign table that returns some "person" fields, but you can do much more, like geolocation, addresses, credit card and so on.

How it works?

It's easy, once faker_fdw was installed (see below how to install and create the faker_srv), just create a foreign table with fields like ssn, name, first_name, last_name, address, phone_number and many others. The magic is done by faker-factory, a Python library that generate code through providers. For example, ssn, name, first_name, last_name are generated by person provider, while address is generated by address provider.

We'll start with ssn, name and address:

guedes=> CREATE FOREIGN TABLE fake.person (ssn varchar, name varchar, address text) 
         SERVER faker_srv OPTIONS (max_results '100');
CREATE FOREIGN TABLE
Time: 36,400 ms
guedes=> SELECT * FROM fake.person limit 10;
     ssn     |        name        |               address                
-------------+--------------------+--------------------------------------
 452-53-4113 | Jordin McClure PhD | 61875 Bernhard Lights Apt. 594      +
             |                    | Shonnamouth, FL 19690-6384
 586-60-9538 | Isabela D'Amore    | 622 Williamson Road                 +
             |                    | Schmidtside, MH 44962
 525-45-7125 | Irving Terry       | 30478 Cummings Turnpike             +
             |                    | New Chazstad, SC 67727-1963
 314-36-1089 | Janette Bradtke    | 73775 Janell Bridge Apt. 120        +
             |                    | Lonzofort, MH 88220
 382-65-0182 | Dayna Lesch        | 71914 Mosciski Fords                +
             |                    | Lake Dalechester, FM 08869-8100
 698-15-6164 | Judie Dickens      | 7634 Leuschke Burgs                 +
             |                    | West Antonio, MD 76638-0668
 870-44-9100 | Brooks Stroman     | 63236 Pfannerstill Junction Apt. 308+
             |                    | South Shea, MS 34801-5187
 652-43-1400 | Kendrick Denesik   | 0077 Runolfsdottir Cape             +
             |                    | Lake Gaynell, RI 42511
 879-63-4746 | Osie Kemmer        | 4343 Jazlynn Knoll                  +
             |                    | Lake Trueport, AL 88273
 837-30-7043 | Nyasia Smitham     | 3043 Gretta Shoal                   +
             |                    | New Haiden, UT 08099-9192
(10 rows)

Time: 17,276 ms

Now, lets suppose that you want the phone_number, that's easy, just add the column:

guedes=> ALTER FOREIGN TABLE fake.person ADD COLUMN phone_number varchar;
ALTER FOREIGN TABLE
Time: 31,200 ms
guedes=> SELECT * FROM fake.person LIMIT 10;
     ssn     |          name           |            address             |    phone_number    
-------------+-------------------------+--------------------------------+--------------------
 520-61-0046 | Miss Nan Hilll DDS      | 1162 Jaron Mill Apt. 435      +| 635.024.1809x351
             |                         | East Isham, UT 99699-8045      | 
 554-47-6145 | Ayden Jenkins DVM       | 41357 McKenzie Skyway         +| 528.396.8357
             |                         | Port Warren, NM 35237          | 
 021-55-5151 | Dr. Randolf McClure PhD | 9738 Prince Corners Suite 091 +| 696.074.2586x2173
             |                         | Harmonhaven, AK 67018          | 
 441-09-6518 | Harlene Hoppe           | 858 Lenard Port               +| 07969004580
             |                         | Kristinafurt, GU 59690         | 
 486-27-1135 | Dr. Amalie Parker       | 5581 Feil Summit Apt. 736     +| 00554249871
             |                         | West Tatiahaven, PR 12346-7661 | 
 681-31-4609 | Louis Aufderhar I       | 216 Hessel Valley Apt. 891    +| (818)010-0501x1646
             |                         | North Dorothea, RI 12275-4420  | 
 419-81-4064 | Lila Koch DVM           | 354 Fay Vista Suite 603       +| 907-143-1119
             |                         | Greenfelderburgh, MI 50297     | 
 308-50-3314 | Toby Shields            | 059 Nitzsche Parks            +| 383.998.7283x035
             |                         | East Daunte, VT 76481          | 
 405-92-2887 | Delwin Lynch            | 467 Carrol Stream Apt. 466    +| +94(9)7970982195
             |                         | Corimouth, AS 08861            | 
 651-62-6328 | Shantell Pfeffer DVM    | 60971 Nya Villages Suite 939  +| 599.631.2393
             |                         | Mamiestad, GU 54537-9334       | 
(10 rows)

Time: 19,201 ms

The data are randomly generated, so each execution even in different sessions will generate a new random dataset, but, sometimes, this is not what you want, because your tests must re-run using the same set of data. You can persist data in other tables, rather than select directly from faker tables, or you can set a seed option and use the same seed in different sessions to get the same set of data, like the following example:

guedes=> alter foreign table fake.person options ( add seed '1234' );
ALTER FOREIGN TABLE
guedes=> \c - postgres
You are now connected to database "guedes" as user "postgres".
guedes=# select * from fake.person limit 5;
     ssn     |         name          |             address             |     phone_number     
-------------+-----------------------+---------------------------------+----------------------
 165-12-0147 | Dr. Aron Lind IV      | 2564 Julius View               +| 409.956.0720x73661
             |                       | South Amon, MO 14305            | 
 160-16-9547 | Rowan Bauch           | 8474 Thompson Lights           +| 05197913028
             |                       | Schultzburgh, CT 41584          | 
 356-05-2862 | Jaydon Bogisich       | 90204 Sim River Suite 583      +| 663.781.9218x7740
             |                       | Langworthchester, FM 38547-5316 | 
 864-54-9210 | Maximillian Stamm PhD | 8213 Julie Path                +| 1-803-066-7124x72838
             |                       | Port Whitmouth, DC 03464        | 
 721-72-6607 | Foster Schimmel       | USS Ullrich                    +| (423)625-9466
             |                       | FPO AA 79263-9218               | 
(5 rows)

Time: 123.969 ms
guedes=# \c - guedes
You are now connected to database "guedes" as user "guedes".
guedes=> select * from fake.person limit 5;
     ssn     |         name          |             address             |     phone_number     
-------------+-----------------------+---------------------------------+----------------------
 165-12-0147 | Dr. Aron Lind IV      | 2564 Julius View               +| 409.956.0720x73661
             |                       | South Amon, MO 14305            | 
 160-16-9547 | Rowan Bauch           | 8474 Thompson Lights           +| 05197913028
             |                       | Schultzburgh, CT 41584          | 
 356-05-2862 | Jaydon Bogisich       | 90204 Sim River Suite 583      +| 663.781.9218x7740
             |                       | Langworthchester, FM 38547-5316 | 
 864-54-9210 | Maximillian Stamm PhD | 8213 Julie Path                +| 1-803-066-7124x72838
             |                       | Port Whitmouth, DC 03464        | 
 721-72-6607 | Foster Schimmel       | USS Ullrich                    +| (423)625-9466
             |                       | FPO AA 79263-9218               | 
(5 rows)

Time: 115.262 ms

Some fields accept options and they are passed to the respective provider. Let's suppose that you want some kind of "date and time on this year" for your person table, so you can add a field date_time_this_year like below:

guedes=# alter foreign table fake.person add column date_time_this_year timestamp;
ALTER FOREIGN TABLE
guedes=# select date_time_this_year as d from fake.person limit 5;
          d          
---------------------
 2018-07-28 02:14:46
 2018-07-08 00:22:27
 2018-12-01 09:35:49
 2018-12-31 06:18:34
 2018-02-19 16:22:58
(5 registros)

That's fine, but what if you want only datetimes after now? The date time provider supports parameters like before_now and after_now, so let's use them:

guedes=# alter foreign table fake.person alter column date_time_this_year options (after_now 'True');
ALTER FOREIGN TABLE

guedes=# select date_time_this_year as d from fake.person where date_time_this_year > now();
          d          
---------------------
 2018-09-19 21:37:21
 2018-10-10 18:19:15
 2018-12-21 10:39:42
 2018-10-16 14:04:07
 2018-12-04 19:20:27
 2018-12-28 13:18:59
 2018-09-30 11:45:49
 2018-11-13 21:58:59
 2018-10-11 05:49:17
 2018-10-23 20:47:17

 guedes=# alter foreign table fake.person alter column date_time_this_year options (before_now 'False');
ALTER FOREIGN TABLE

guedes=# select date_time_this_year as d from fake.person limit 5;
          d          
---------------------
 2018-12-26 01:16:32
 2018-11-22 05:59:49
 2018-12-14 23:18:47
 2018-11-10 10:08:14
 2018-09-17 21:38:14
(5 registros)

guedes=# select date_time_this_year as d from fake.person where date_time_this_year<now();
 d 
---
(0 registro)

Installing

You must install a few dependencies related to PostgreSQL and Python: multicorn, fake-factory then fake_fdw.

In Debian this is as easy as:

sudo apt-get install postgresql-XX-python-multicorn
sudo pip install Faker
sudo pip install https://github.com/guedes/faker_fdw/archive/v0.2.4.zip

If you are using the postgres docker image https://hub.docker.com/_/postgres

sudo apt-get install postgresql-XX-python3-multicorn
sudo pip3 install Faker
sudo pip3 install https://github.com/guedes/faker_fdw/archive/v0.2.4.zip

Once packages was installed, choose which database you want faker_fdw by typing:

CREATE EXTENSION multicorn;
CREATE SERVER faker_srv 
 FOREIGN DATA WRAPPER multicorn
 OPTIONS (wrapper 'faker_fdw.FakerForeignDataWrapper');

faker_fdw supports IMPORT SCHEMA that is used to create example tables for many providers. For some limitations all fields are created as varchar but you can use ALTER TABLE to change field type and there is no semantic between fields in the same record, that's it, each columns area independently generated.

To import an example schema:

IMPORT FOREIGN SCHEMA fake
FROM SERVER faker_srv
INTO fake
OPTIONS ( locale 'pt_BR', max_results '100');

uedes=# \det fake.
     Lista de tabelas externas
 Esquema |    Tabela    | Servidor  
---------+--------------+-----------
 fake    | address      | faker_srv
 fake    | automotive   | faker_srv
 fake    | bank         | faker_srv
 fake    | barcode      | faker_srv
 fake    | color        | faker_srv
 fake    | company      | faker_srv
 fake    | credit_card  | faker_srv
 fake    | currency     | faker_srv
 fake    | date_time    | faker_srv
 fake    | file         | faker_srv
 fake    | internet     | faker_srv
 fake    | isbn         | faker_srv
 fake    | job          | faker_srv
 fake    | lorem        | faker_srv
 fake    | misc         | faker_srv
 fake    | person       | faker_srv
 fake    | phone_number | faker_srv
 fake    | profile      | faker_srv
 fake    | python       | faker_srv
 fake    | ssn          | faker_srv
 fake    | user_agent   | faker_srv

And that's it! Have fun!

TODO

  1. extend examples and documentation;
  2. create an entire fake schema that return data with integrity between tables, as fake constraints;
  3. create a faker_fdw in C or Rust just for fun;
  4. ...

License

faker_fdw is released under the PostgreSQL License.

See LICENSE file for information.