jpatokal / openflights

Website for storing flight information, rendering paths on a zoomable world map and calculating statistics, with plenty of free airline, airport and route data.
http://openflights.org
GNU Affero General Public License v3.0
1.38k stars 388 forks source link

Import issues #1197

Open reedy opened 1 year ago

reedy commented 1 year ago

On my dev install... I deleted all my flights (using the UI) and then imported a newer backup from openflights.org

It seems to have resulted in some weird plane type attributions, even though the exported CSV looks good

Screenshot 2023-04-23 at 18 49 28

Should this be showing IDs or types?

Screenshot 2023-04-23 at 18 50 44

reedy commented 1 year ago

I doubt the fix to https://github.com/jpatokal/openflights/issues/1116 is to blame....

reedy commented 1 year ago

But I'm guessing something to do with differing plane ids locally..?

mysql> select * from planes;
+-------------------------+------+-------+------+--------+
| name                    | abbr | speed | plid | public |
+-------------------------+------+-------+------+--------+
| Airbus A320             | NULL |  NULL |    1 | NULL   |
| Boeing 777-300ER        | NULL |  NULL |    2 | NULL   |
| Boeing 737-MAX9         | NULL |  NULL |    3 | NULL   |
| Boeing 737-800          | NULL |  NULL |    4 | NULL   |
| Embraer ERJ175          | NULL |  NULL |    5 | NULL   |
| Airbus A319             | NULL |  NULL |    6 | NULL   |
| Airbus A321NEO          | NULL |  NULL |    7 | NULL   |
| Embraer E175            | NULL |  NULL |    8 | NULL   |
| Airbus A321             | NULL |  NULL |    9 | NULL   |
| Airbus A350-900         | NULL |  NULL |   10 | NULL   |
| Airbus A330-300         | NULL |  NULL |   11 | NULL   |
| Embraer E190            | NULL |  NULL |   12 | NULL   |
| Airbus A320NEO          | NULL |  NULL |   13 | NULL   |
| Embraer 190             | NULL |  NULL |   14 | NULL   |
| Boeing 777-200          | NULL |  NULL |   15 | NULL   |
| Boeing 787-8            | NULL |  NULL |   16 | NULL   |
| BAE Jetstream J41       | NULL |  NULL |   17 | NULL   |
| Airbus A380             | NULL |  NULL |   18 | NULL   |
| Embraer E190SR          | NULL |  NULL |   19 | NULL   |
| Bombardier CRJ-900      | NULL |  NULL |   20 | NULL   |
| Boeing 737-700          | NULL |  NULL |   21 | NULL   |
| Boeing 747-400          | NULL |  NULL |   22 | NULL   |
| Embraer RJ145           | NULL |  NULL |   23 | NULL   |
| Airbus A340-600         | NULL |  NULL |   24 | NULL   |
| Boeing 777-200ER        | NULL |  NULL |   25 | NULL   |
| Bombardier CRJ-1000     | NULL |  NULL |   26 | NULL   |
| Airbus A350-1000        | NULL |  NULL |   27 | NULL   |
| Boeing 767-300          | NULL |  NULL |   28 | NULL   |
| Airbus A330-200         | NULL |  NULL |   29 | NULL   |
| Boeing 757-200          | NULL |  NULL |   30 | NULL   |
| Boeing 787-9            | NULL |  NULL |   31 | NULL   |
| Boeing 777-200LR        | NULL |  NULL |   32 | NULL   |
| Airbus A318             | NULL |  NULL |   33 | NULL   |
| McDonnell Douglas MD-83 | NULL |  NULL |   34 | NULL   |
| Boeing 767-300ER        | NULL |  NULL |   35 | NULL   |
| Embraer E170            | NULL |  NULL |   36 | NULL   |
| Canadair CRJ-700        | NULL |  NULL |   37 | NULL   |
| Canadair CRJ-900        | NULL |  NULL |   38 | NULL   |
| McDonnell Douglas MD83  | NULL |  NULL |   39 | NULL   |
| Boeing 787-800          | NULL |  NULL |   40 | NULL   |
| Bombardier Q400         | NULL |  NULL |   41 | NULL   |
| Boeing 737-400          | NULL |  NULL |   42 | NULL   |
| Boeing 737-300          | NULL |  NULL |   43 | NULL   |
| Airbus A321T            | NULL |  NULL |   44 | NULL   |
| Airbus A320-200         | NULL |  NULL |   45 | NULL   |
| Avro RJ85               | NULL |  NULL |   46 | NULL   |
| Dornier J328            | NULL |  NULL |   47 | NULL   |
| Fokker 100              | NULL |  NULL |   48 | NULL   |
| Airbus A321-200         | NULL |  NULL |   49 | NULL   |
| ATR 72                  | NULL |  NULL |   50 | NULL   |
| Saab 2000               | NULL |  NULL |   51 | NULL   |
| McDonnell Douglas MD80  | NULL |  NULL |   52 | NULL   |
| Canadair CRJ-200        | NULL |  NULL |   53 | NULL   |
| Embraer ERJ-145         | NULL |  NULL |   54 | NULL   |
| Boeing 767-200          | NULL |  NULL |   55 | NULL   |
| McDonnell Douglas MD82  | NULL |  NULL |   56 | NULL   |
| Fokker 70               | NULL |  NULL |   57 | NULL   |
| BAe 146-200             | NULL |  NULL |   58 | NULL   |
| Boeing 767              | NULL |  NULL |   59 | NULL   |
+-------------------------+------+-------+------+--------+
59 rows in set (0.00 sec)

If I truncate the table, then reimport... They all come in empty...

reedy commented 1 year ago

Caused by https://github.com/jpatokal/openflights/commit/2d825118307d17e34b68dd99f3cceea1b605101b as the plane id (plid) is actually being exported?

chrisrosset commented 1 year ago

But I'm guessing something to do with differing plane ids locally..?

mysql> select * from planes;
+-------------------------+------+-------+------+--------+
| name                    | abbr | speed | plid | public |
+-------------------------+------+-------+------+--------+
| Airbus A320             | NULL |  NULL |    1 | NULL   |

snip

| Boeing 767              | NULL |  NULL |   59 | NULL   |
+-------------------------+------+-------+------+--------+
59 rows in set (0.00 sec)

I notice that planes.dat does not contain a plid column. With the live prod database having been updated multiple times, I don't think we can easily replicate the IDs ourselves.

chrisrosset commented 1 year ago

Caused by 2d82511 as the plane id (plid) is actually being exported?

@reedy, I've looked into this today and I don't think this is a bug.

The export/backup functionality will export the database IDs for multiple entities:

Data CSV column name
Origin Airport From_OID
Destination Airport To_OID
Airline Airline_OID
Plane Type Plane_OID

When present in the upload payload, this allows the import code to easily match entities even if they were renamed or otherwise updated between the backup creation and upload. Importantly, during parsing, these take priority over the string columns (e.g. airline, plane) [^1].

airports.dat and airlines.dat both contain an ID column which load-data.sql uses when seeding the database with data. This makes the IDs consistent between the real openflights.org and dev setups.

However, planes.dat does not have an ID column. As a result, your local instance with either fail to match planes or match them incorrectly.

The solution is to wipe the Plane_OID column (best to wipe all the OID columns) in your CSV before doing an import on the local instance. I've tested this with my dev setup.

The data files have not been updated in a while (7 years for airlines.dat, 4 years for both planes.dat and airports.dat) so some entries are marked as "No matches, will be added as new" (e.g. Wizz Air UK).

[^1]: For example, https://github.com/jpatokal/openflights/blob/2f86a25d09c0721a2b37aa464bd7c474244fdeab/php/import.php#L497-L505 uses the Airline ID if present and otherwise falls back to string matching.

2sylbl commented 1 year ago

See my comment on 992 - https://github.com/jpatokal/openflights/pull/992#issuecomment-1602756677

Users are allowed to enter their own plane designations so I’m curious as to how this might play out here as well. They may or may not confirm to existing ids, even if there were an Id column.

chrisrosset commented 1 year ago

It doesn't change the situation much here. I only mentioned the OIDs matching as an explanation for why this only happened to planes. You should wipe all the OIDs when moving backups across database instances.

For a local install, you have to clear that column (and the other OID columns as well). The import process will automatically create new entities for you if it can't match them based on the name.

reedy commented 1 year ago

I note after deleting the databases, the updated imports etc.. Then importing my latest backup from openflights.org, the planes are wrong or mostly missing

chrisrosset commented 1 year ago

You need to delete the OIDs from the CSV when importing across instances to get the new instance to create whatever is not present in the database - try it. This only affects people moving data across instances.

reedy commented 1 year ago

Aha, thanks. I wonder if this should almost be a difference between "backup" and "export"...

Probably should be documented somewhere (on the site) too

chrisrosset commented 1 year ago

Aha, thanks. I wonder if this should almost be a difference between "backup" and "export"...

I think exporting them is correct because the names can (and do, even if infrequently) change while the IDs are stable. This helps if your backup is not recent. I think this is WAD. Let's be honest, either you're using openflights.org or you're running your own instance. I think this is only an issue for us. :)

Probably should be documented somewhere (on the site) too

I'm not sure about the site itself since I don't believe this applies to users of openflights.org. How about the repo docs? You're probably going to read these if you're running your own instance.