jolpica / jolpica-f1

Apache License 2.0

Create Decoder for objects retrieved from pdf parsing #28

Open jolpica opened 5 months ago

jolpica commented 5 months ago

It's currently impractical to import data parsed from PDFs into the database. A JSON-like format (using Python objects) should be chosen, and then a decoder created.

Format

A format similar to the one below should be used, allowing the data parsers to reference identifying info like year, round number, team and driver without worrying about the database structure.

from datetime import timedelta

objects = [
    {
        "object_type": "Lap",
        "foreign_keys": {
            "session_entry": {
                "round_number": 1,
                "season_year": 2023,
                "team": "Red Bull",
                "driver": "Max Verstappen",
            }
        },
        "objects": [
            {
                "number": 1,
                "position": 1,
                "time": timedelta(milliseconds=1245),
            },
            {
                "number": 2,
                "position": 1,
                "time": timedelta(milliseconds=2245),
            },
        ],
    },
    ...
]

The decoder should be able to create Django objects with the correct foreign keys given a list of arbitrary objects that follow the database structure.
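
As a rough illustration of the decoding step described above, here is a minimal sketch that runs without Django. The dataclasses stand in for Django models, and `MODELS`, `RESOLVERS`, `resolve_session_entry` and `decode` are hypothetical names chosen for this example, not part of the project. In a real implementation the resolver would presumably look up an existing row in the database instead of constructing an object.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Any, Callable


# Hypothetical stand-ins for Django models, so the sketch runs without a database.
@dataclass(frozen=True)
class SessionEntry:
    season_year: int
    round_number: int
    team: str
    driver: str


@dataclass
class Lap:
    session_entry: SessionEntry
    number: int
    position: int
    time: timedelta


# Maps the "object_type" string to the class to instantiate (assumed registry).
MODELS: dict[str, type] = {"Lap": Lap}


def resolve_session_entry(info: dict[str, Any]) -> SessionEntry:
    # Assumption: with Django this would be a database lookup on the identifying info.
    return SessionEntry(**info)


# Maps each foreign key name to a function that turns identifying info into an object.
RESOLVERS: dict[str, Callable[[dict[str, Any]], Any]] = {
    "session_entry": resolve_session_entry,
}


def decode(groups: list[dict[str, Any]]) -> list[Any]:
    decoded = []
    for group in groups:
        model = MODELS[group["object_type"]]
        # Resolve each foreign key once per group and share it across the group's objects.
        fks = {name: RESOLVERS[name](info) for name, info in group["foreign_keys"].items()}
        for fields in group["objects"]:
            decoded.append(model(**fks, **fields))
    return decoded


objects = [
    {
        "object_type": "Lap",
        "foreign_keys": {
            "session_entry": {
                "round_number": 1,
                "season_year": 2023,
                "team": "Red Bull",
                "driver": "Max Verstappen",
            }
        },
        "objects": [
            {"number": 1, "position": 1, "time": timedelta(milliseconds=1245)},
            {"number": 2, "position": 1, "time": timedelta(milliseconds=2245)},
        ],
    },
]

laps = decode(objects)
```

Resolving the foreign keys once per group means the parsers only state the identifying info a single time, and every object in the group ends up pointing at the same parent.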

harningle commented 2 months ago

Shall we fix the notation/naming in the JSON? E.g., in the example above, I find lap_number more straightforward than number. number seems very vague: it could be the car number, the lap number, etc. From a coding perspective it makes no difference, but better naming improves readability.

My suggestion is:

If we name the fields in the JSON this way, you will have to rename/map them before inserting them into the database. What's your opinion? If you give me the green light, I can start working on the JSON formatting and send them to you.

jolpica commented 2 months ago

I don't have a strong opinion either way, I'll leave it up to your judgement, as it'd be great to get this finalised.

Changing the names would make it more human readable, but this format isn't really intended for humans to read. Also, if we change the names of the fields, we'd need to maintain a list of field mappings to be able to parse the data.

I don't think either of the reasons above would rule it out, though. Also, the format above is out of date, as we discussed using car number instead of driver name/team name combinations.
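
To make the field-mapping overhead concrete, a minimal sketch of what such a list of mappings might look like follows. `FIELD_MAP` and `to_db_fields` are hypothetical names for illustration only; the single lap_number to number entry is taken from the renaming suggested earlier in this thread.

```python
from typing import Any

# Hypothetical mapping from readable parser names to database field names,
# keyed by object type. Only lap_number -> number comes from the discussion.
FIELD_MAP: dict[str, dict[str, str]] = {
    "Lap": {"lap_number": "number"},
}


def to_db_fields(object_type: str, record: dict[str, Any]) -> dict[str, Any]:
    # Rename mapped keys and pass every other key through unchanged.
    mapping = FIELD_MAP.get(object_type, {})
    return {mapping.get(key, key): value for key, value in record.items()}


parsed = {"lap_number": 1, "position": 1}
db_record = to_db_fields("Lap", parsed)
```

Every renamed field needs an entry here, which is the maintenance cost of diverging from the database names.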

harningle commented 1 month ago

Yes, we are using car numbers.

this format isn't really intended for humans to read

Totally agree. The parsing code may use whatever names are readable, but the final JSON will use exactly the same field names as the database, so no additional field-name mapping is needed. Ideally, I will send you the JSON formats by next week.