wq / django-data-wizard

🧙⚙️ Import structured data (e.g. Excel, CSV, XML, JSON) into one or more Django models via an interactive web-based wizard
https://django-data-wizard.wq.io
MIT License

nested relational tables in a single XML file #19

Closed Wcollard closed 4 years ago

Wcollard commented 5 years ago

Does Django Data Wizard work with an XML file that contains related tables? That is, a schema that, when properly imported, has at least two related tables with a one-to-one or, more likely, a one-to-many relationship. Is there a way to properly import all of the XML data?

sheppard commented 5 years ago

The short answer is yes - though it requires a bit of custom code. The high-level approach is:

  1. Create a custom wq.io class to parse the nested XML structure.
  2. Use wq.io's flattened() utility to denormalize the structure into a flat table.
  3. Create a custom Django REST Framework serializer with a nested serializer to parse the table back into a normalized schema.

Flattening the structure before importing is probably not ideal, but that is how it works for now.
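To make the flattening step concrete, here is a minimal standard-library sketch of what denormalizing a nested parent/children document into flat rows looks like. It does not use wq.io at all; `flatten_parents`, `SAMPLE`, and the tag names are hypothetical and only illustrate the shape of the table the wizard would end up looping over.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with one level of nesting.
SAMPLE = """
<root>
  <parent>
    <parent_name>Parent 1</parent_name>
    <children>
      <child><child_name>Child 1</child_name></child>
      <child><child_name>Child 2</child_name></child>
    </children>
  </parent>
  <parent>
    <parent_name>Parent 2</parent_name>
    <children>
      <child><child_name>Child 3</child_name></child>
    </children>
  </parent>
</root>
"""


def flatten_parents(xml_text, inner_tag="children"):
    """Return one flat dict per child, repeating the parent's fields."""
    rows = []
    root = ET.fromstring(xml_text.strip())
    for parent in root:
        # Fields that belong to the parent itself (everything but the nested list)
        outer = {
            el.tag: (el.text or "").strip()
            for el in parent
            if el.tag != inner_tag
        }
        inner = parent.find(inner_tag)
        for child in (inner if inner is not None else []):
            row = dict(outer)  # copy the parent fields onto every child row
            row.update({el.tag: (el.text or "").strip() for el in child})
            rows.append(row)
    return rows


rows = flatten_parents(SAMPLE)
for row in rows:
    print(row)
```

Each resulting row carries both the parent and child columns, so a parent with several children appears on several rows. In this sketch a parent with no children produces no rows at all.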

Here are some (untested) examples to get you started:

<!-- example.xml -->
<root>
   <parent>
        <parent_name>Parent 1</parent_name>
        <children>
            <child>
                <child_name>Child 1</child_name>
            </child>
            <child>
                <child_name>Child 2</child_name>
            </child>
        </children>
   </parent>
   <parent>
        <parent_name>Parent 2</parent_name>
        <children>
            <child>
                <child_name>Child 3</child_name>
            </child>
        </children>
   </parent>
</root>

# myapp/wizard.py
import data_wizard
from data_wizard.loaders import FileLoader
from rest_framework.serializers import ModelSerializer
from wq.io import XmlFileIO, BaseIO, TupleMapper, flattened
from .models import Parent, Child, XMLFile

# IO classes & loader for source model

class NestedIO(TupleMapper, BaseIO):
    pass

class MyXMLIO(XmlFileIO):
    nested = True

    def parse_item(self, el):
        data = {}
        for e in el:
            if e.tag == 'children':
                val = NestedIO(data=[self.parse_item(c) for c in e])
            else:
                val = e.text
            data[e.tag] = val
        return data

class CustomLoader(FileLoader):
    default_serializer_class = 'myapp.wizard.ChildSerializer'

    def load_io(self):
        return flattened(MyXMLIO, filename=self.file.path, inner_attr='children')

data_wizard.set_loader(XMLFile, 'myapp.wizard.CustomLoader')

# Serializers for target models

class ParentSerializer(ModelSerializer):
    class Meta:
        model = Parent
        fields = '__all__'

class ChildSerializer(ModelSerializer):
    parent = ParentSerializer()

    def create(self, validated_data):
        parent_data = validated_data.pop('parent')
        parent = ParentSerializer().create(parent_data)
        validated_data['parent'] = parent
        # Return the new instance; DRF's save() expects create() to return it
        return super().create(validated_data)

    class Meta:
        model = Child
        fields = '__all__'

data_wizard.register("Child with nested Parent", ChildSerializer)

Note that the nesting is inverted on the serializer side because the wizard loops over each Child record together with its nested parent. This means that if a single parent has multiple children, you will need to customize create() on the ParentSerializer so that the parent is only created the first time. (Alternatively, try NaturalKeySerializer, which is designed for this use case.)
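The "create the parent only the first time" logic can be sketched in plain Python, without Django, to show the idea. This is an illustration under assumptions rather than the wizard's actual behavior: `import_rows` is a hypothetical helper, the dicts stand in for the Parent and Child tables, and `parent_name` is assumed to identify a parent uniquely.

```python
def import_rows(rows):
    """Normalize flat child rows, creating each parent only once."""
    parents = {}   # parent_name -> parent record (stand-in for the Parent table)
    children = []  # stand-in for the Child table
    for row in rows:
        name = row["parent_name"]
        parent = parents.get(name)
        if parent is None:
            # First time this parent is seen: create it
            parent = {"id": len(parents) + 1, "name": name}
            parents[name] = parent
        # Every row yields a child that points at the (possibly reused) parent
        children.append({"name": row["child_name"], "parent_id": parent["id"]})
    return list(parents.values()), children


rows = [
    {"parent_name": "Parent 1", "child_name": "Child 1"},
    {"parent_name": "Parent 1", "child_name": "Child 2"},
    {"parent_name": "Parent 2", "child_name": "Child 3"},
]
parents, children = import_rows(rows)
print(parents)
print(children)
```

In a real ParentSerializer.create(), the same lookup-then-create step would probably be expressed with Django's `Parent.objects.get_or_create(...)`, keyed on whichever field identifies a parent in your data.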

sheppard commented 4 years ago

Closing due to inactivity.