Dumpscript output could be tweaked to facilitate procedural generation of data

GoogleCodeExporter commented 8 years ago

Instead of the current style of output:

my_model_1 = MyModel()
my_model_1.property_1 = 'bob'
my_model_1.property_2 = 'sue'
...
my_model_1.save()

my_model_2 = MyModel()
my_model_2.property_1 = 'sally'
my_model_2.property_2 = 'bill'
...
my_model_2.save()

...etc.

It would be better to do:

data = [
    {'property_1': 'bob', 'property_2': 'sue'},  # ...more fields
    {'property_1': 'sally', 'property_2': 'bill'},
    ...
]

for item in data:
    obj = MyModel(**item)
    obj.save()

This would make it easy to modify the output to procedurally generate some
of the data or modify the data list when a column has been added.
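A minimal sketch of the kind of editing the list-of-dicts form would allow. `MyModel` is a hypothetical stand-in class that only records saves, so the snippet runs without a Django project; `property_3` is a hypothetical column added after the dump, and the generated `user{n}` rows are illustrative only.

```python
class MyModel:
    """Hypothetical stand-in for a Django model; it only records saves."""

    saved = []

    def __init__(self, **fields):
        # Mimic the keyword-argument constructor Django models provide.
        for name, value in fields.items():
            setattr(self, name, value)

    def save(self):
        MyModel.saved.append(self)


data = [
    {'property_1': 'bob', 'property_2': 'sue'},
    {'property_1': 'sally', 'property_2': 'bill'},
]

# Procedurally generate some extra rows.
data += [{'property_1': f'user{n}', 'property_2': 'generated'} for n in range(3)]

# A column was added after the dump: fill it in wherever it is missing.
for item in data:
    item.setdefault('property_3', item['property_1'].upper())

for item in data:
    obj = MyModel(**item)
    obj.save()
```

Because the data is plain Python literals, both changes are ordinary list and dict operations rather than edits to many generated assignment lines.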

(Actually, at this point I have to wonder whether this isn't an argument for
using JSON output and injecting it into the database via runscript. What
were the advantages of dumpscript again???)

Original issue reported on code.google.com by andybak on 6 Mar 2009 at 3:09

GoogleCodeExporter commented 8 years ago

Hi, the point of dumpscript was that you get a human readable python script,
which is clear and can be edited and adapted easily. I think the above approach
loses its simplicity and readability, and is probably not how most people would
write a script to populate a database using Django model objects.

Another important benefit (and sometimes downside) is that entries are created
using Python objects and their customised behaviour: auto fields are not
specified, overridden save methods are called and signals are sent.

When a column is added, it won't be a problem for dumped scripts if the field
allows NULL or defines a default value. Alternatively, you could add a function
that populates your new field and attach it to Django's pre_save signal when
starting the script. I haven't done this before, but I can't see why that
wouldn't work.
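A rough sketch of the "populate the new field before saving" idea. Running Django's real `pre_save` signal needs a configured project, so this imitates it with a tiny hypothetical hook list (`connect_pre_save`, `Model`, `Category` and the `slug` field are all made up for illustration; this is not Django's dispatch API). In a real script the handler would instead be attached with `signals.pre_save.connect(populate_slug, sender=Category)` and would receive the model class as `sender` and the object as `instance`.

```python
# Hypothetical stand-in for Django's pre_save signal: a list of
# (model_class, handler) pairs. This is NOT Django's dispatch API.
_pre_save_hooks = []


def connect_pre_save(handler, sender):
    _pre_save_hooks.append((sender, handler))


class Model:
    """Hypothetical stand-in base model."""

    def save(self):
        # Call the matching hooks before "writing" the row, as pre_save does.
        for sender, handler in _pre_save_hooks:
            if type(self) is sender:
                handler(sender=sender, instance=self)
        self.saved = True


class Category(Model):
    pass


def populate_slug(sender, instance, **kwargs):
    # Fill the newly added field only if the dumped script left it unset.
    if getattr(instance, 'slug', None) is None:
        instance.slug = instance.name.lower().replace(' ', '-')


connect_pre_save(populate_slug, sender=Category)

# Lines in the style of the existing dump, which knows nothing about `slug`.
category_1 = Category()
category_1.name = 'Red Things'
category_1.save()
```

The dumped lines stay untouched; only the hook registered before they run knows about the new field.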

Let me know if there's anything I've overlooked :-)

Original comment by e.willha...@gmail.com on 8 Mar 2009 at 1:10

Changed state: WontFix

GoogleCodeExporter commented 8 years ago

OK, my alternative formulation might be imperfect, but the central point was to
facilitate (in your words) "a human readable python script, which is
clear and can be edited and adapted easily".

I believe the current output could do a better job of fulfilling the second
part of that requirement. The only way to adapt the current output is to use
regex search-and-replace trickery!

Original comment by andybak on 8 Mar 2009 at 1:21

GoogleCodeExporter commented 8 years ago

Thanks for the reply,

Regex has been kind to me so far, but I don't know if that applies to everyone
else. I can't think of another way of organising the Python output to do what
you would like it to do. Foreign keys also come into play, as each instance is
given an identifier which can be referenced later. Adding these features may
cause such a script to become more and more complicated, which isn't much fun.

I'm not sure what problem you have at hand, but would a transition script that
links a function populating the additional fields to the pre_save signal before
importing the dumped script work for you? Once you have done that, you can
re-dump another script over the first one, and you'll have the transition
script to check in to the repository with your changes. I'm just interested to
know whether this will work, or whether there are some changes that could be
made to dumpscript.

Cheers

Original comment by e.willha...@gmail.com on 8 Mar 2009 at 1:31

GoogleCodeExporter commented 8 years ago

OK. Here's a better version of my suggested generated dump format.

    items = []

    items.append(Category(
        order=8,
        content=None,
        summary=None,
        color=u'dc3444',
    ))

    items.append(Category(
        order=9,
        content=None,
        summary=None,
        color=u'38ada5',
    ))

    # etc...

    for item in items:
        item.save()

Advantages: I can programmatically add or process the fields before saving. 

This makes dumpscript much more useful for data migrations during development.
For instance, you've got a bunch of data you've entered via the admin, but
you've since changed the table format or added columns.

This way you can use dumpscript to get your data out, reset all the tables and
programmatically massage it before reinserting it.
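A small sketch of what deferring the saves buys: every object exists in `items` before anything is written, so it can be reworked in one pass. `Category` here is a hypothetical stand-in that just counts saves, and the renumbering and `'#rrggbb'` colour change are made-up examples of "massaging" the data.

```python
class Category:
    """Hypothetical stand-in for the Category model; counts saves."""

    save_count = 0

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def save(self):
        Category.save_count += 1


items = []
items.append(Category(order=8, content=None, summary=None, color=u'dc3444'))
items.append(Category(order=9, content=None, summary=None, color=u'38ada5'))

# Massage the data before anything is written: renumber the ordering
# and prefix the colours with '#' (both hypothetical changes).
for position, item in enumerate(items, start=1):
    item.order = position * 10
    item.color = '#' + item.color

for item in items:
    item.save()
```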

Original comment by andybak on 22 Mar 2009 at 11:08

GoogleCodeExporter commented 8 years ago

Sorry, the last comment was rushed and I didn't explain why I had raised this
issue again. Hopefully the code and the comments above will clarify why I think
an alternative output format for dumpscript would be useful. The stated
advantages of dumpscript over dumping JSON or SQL from manage.py are the
flexibility and the ability to leverage the ORM and Python when you run the
script.

The current output format makes it difficult to edit the resulting output or to
programmatically alter the data before saving. I think the format above would
make the output from dumpscript into much more useful 'scaffolding', and it
doesn't present any disadvantages over the current output.

Original comment by andybak on 22 Mar 2009 at 11:30

GoogleCodeExporter commented 8 years ago

There is still the problem that ForeignKey fields can only be filled in after
the referenced object has been saved; remember that we would like to avoid
specifying automatically incremented/generated values.

If you want to do something before saving, you can always use signals. But if
you're looking for useful scaffolding, then *maybe* we could add a save
function, which could be overridden:

    category_1 = Category()
    category_1.attribute = value
    category_1.attribute = value
    save(category_1)

The default definition of the function would be something like:
    def save(obj):
        obj.save()
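A sketch of how such an overridable `save()` helper might be used, assuming the generated script defined it at module level as above. `Category` is a hypothetical stand-in, and the override (defaulting a missing `color`) is only an example; the user keeps a handle on the default and delegates to it.

```python
class Category:
    """Hypothetical stand-in model."""

    def save(self):
        self.saved = True


def save(obj):
    # Default helper the generated script would define.
    obj.save()


# User override, placed before the generated body: keep a handle on the
# default, tweak the object, then delegate.
_default_save = save


def save(obj):
    if getattr(obj, 'color', None) is None:
        obj.color = u'000000'
    _default_save(obj)


# Generated body, unchanged.
category_1 = Category()
category_1.order = 8
save(category_1)
```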

But I still think the signals approach is cleaner, because those who want to
use it can add it themselves:

    from django.db.models import signals

    def my_handler(sender, instance, **kwargs):
        # sender is the model class; instance is the object about to be saved
        instance.attribute = value
        instance.attribute = value

    signals.pre_save.connect(my_handler, sender=Category)

Of course, this wouldn't work if you wanted to play with all the objects before
any of them were saved, but then you wouldn't be able to attach these objects
to other objects. What about the following:

    categories = []

    category_1 = Category()
    category_1.order = order_8
    category_1.content = None
    category_1.summary = None
    category_1.color = u'dc3444'

    categories.append(category_1)

    category_2 = Category()
    category_2.order = order_9
    category_2.content = None
    category_2.summary = None
    category_2.color = u'38ada5'

    categories.append(category_2)

    for category in categories:
        category.save()

I'm not personally fond of this (it's no longer so clean), but it probably does
what you're looking for.

Original comment by e.willha...@gmail.com on 22 Mar 2009 at 11:57

GoogleCodeExporter commented 8 years ago

Sorry, I completely glossed over your earlier comments on foreign keys, and I
can now see why we need a way to refer to all created objects.

We've kind of met in the middle. If you changed my suggestion so that my lists
were named uniquely for each model, then foreign keys would work in the same
way as yours, except you would reference them as:

    categories[3]

instead of your:

    categories_3

The overall effect is slightly nicer to my eyes, produces fewer local names, and
means that code to modify the instances has to do less string mangling to refer
to objects.
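A minimal sketch of the per-model-list idea, where foreign keys point at earlier objects by index instead of by generated variable name. `Model`, `Category`, `Entry` and the `category` field are hypothetical stand-ins; `save()` just hands out a per-class auto-increment id, loosely mimicking a database primary key.

```python
class Model:
    """Hypothetical stand-in: save() assigns a per-class auto-increment id."""

    _next_id = 1

    def __init__(self, **fields):
        self.id = None
        self.__dict__.update(fields)

    def save(self):
        if self.id is None:
            cls = type(self)
            self.id = cls._next_id
            cls._next_id += 1


class Category(Model):
    pass


class Entry(Model):
    pass


categories = []
categories.append(Category(color=u'dc3444'))
categories.append(Category(color=u'38ada5'))
for category in categories:
    category.save()

# Foreign keys point at already-saved objects by list index, not by a
# generated variable name such as category_2.
entries = []
entries.append(Entry(title=u'First', category=categories[1]))
for entry in entries:
    entry.save()
```

Saving each model's list before building the next list keeps the "referenced object must already be saved" constraint while still deferring saves within a model.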

My only objection to the use of signals is that it isn't an immediately obvious
way to modify the script, whereas simply deferring the saves makes it clear
where someone would want to begin tweaking the dumpscript output.

Original comment by andybak on 23 Mar 2009 at 9:40

GoogleCodeExporter commented 8 years ago

Yep, I think I like categories[4] better, the change shouldn't be too great.

Maybe adding a signal function stub, with the connect command commented out,
will help new users wield the fantastic power of signals. A command-line option
could suppress this if need be, e.g.:

    # If anything needs to be done on saving an object, it can be done here.
    # Edit and uncomment the relevant lines below
    def my_handler(sender, **kwargs):
        pass
    # from django.db.models import signals
    # signals.pre_save.connect(my_handler, sender=MyModel)

Thanks for your feedback!

Original comment by e.willha...@gmail.com on 23 Mar 2009 at 9:51

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Original comment by e.willha...@gmail.com on 23 Mar 2009 at 9:52


Dumpscript output could be tweaked to facilitate procedural generation of data #82