Benchmarking our Data - Githubissues

michaelweinold commented 8 months ago

...we could use publicly available ADS-B data, as suggested by Aviation II lecturer Theo Rindlisbacher: https://doi.org/10.5194/acp-24-725-2024

michaelweinold commented 6 months ago

...this seems like the best option to check coverage:

arebe337 commented 6 months ago

@michaelweinold @dodedic I reviewed how we could best benchmark our data from the AeroDataBox.

I obtained an Excel table from Flightradar24 for the exact period during which we have our own data from the API. From Flightradar24, I used the data for commercial flights only (commercial passenger flights, cargo flights, charter flights, and some business jet flights). This was because the total flights category includes many other types that are likely not tracked by AeroDataBox, such as gliders, most helicopter flights, most ambulance flights, government flights, some military flights, and drones.

For plotting, I used the 7-day average since it matches the data we get from the AeroDataBox API. When comparing the plots, the overall curves look very similar. The main difference arises from the fact that with the API, we only have data for one week per month, compared to the daily data from Flightradar24. Overall, there seems to be a difference of around 20,000 flights between the two sources, likely because our API "only" tracks 3,144 airports, whereas Flightradar24 covers all active airports.

If you have any additional ideas on how we could adjust the graph to better display this, I would love to hear them. (the plot is starting in May due to the Data we got from the API)
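A minimal sketch of the 7-day averaging and the gap calculation described above, in plain Python. The function names, the dates, and all flight counts are assumptions made up for illustration; they are not the actual Flightradar24 or AeroDataBox values.

```python
from datetime import date, timedelta


def rolling_mean_7d(daily_counts):
    """Trailing 7-day mean for a date-sorted list of (date, flights) pairs.

    The first result belongs to the 7th day, so every mean covers a full week.
    """
    window = 7
    values = [count for _, count in daily_counts]
    result = []
    for i in range(window - 1, len(values)):
        mean = sum(values[i - window + 1 : i + 1]) / window
        result.append((daily_counts[i][0], mean))
    return result


def gap_on_shared_dates(reference, sample):
    """Return (date, reference - sample) for dates present in both series."""
    sample_by_date = dict(sample)
    return [(d, v - sample_by_date[d]) for d, v in reference if d in sample_by_date]


# Placeholder numbers only, NOT real FR24 or AeroDataBox data.
start = date(2024, 5, 1)  # arbitrary example date
fr24_daily = [(start + timedelta(days=i), 100_000 + 1_000 * i) for i in range(10)]
fr24_avg = rolling_mean_7d(fr24_daily)

# One sampled week from the API, already expressed as a 7-day average.
adb_avg = [(date(2024, 5, 7), 85_000.0)]

print(gap_on_shared_dates(fr24_avg, adb_avg))
```

Comparing only on dates that exist in both series avoids interpolating the API data across the weeks that were not sampled.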

dodedic commented 6 months ago

The fact that the trends match is a good sign, I would say. I just had a thought and had a look at the Excel file you mentioned, and unfortunately they make no mention of how many of those flights are purely commercial passenger flights. Do you think we could send an e-mail and ask about that statistic? Then we would have a great way of verifying the ADB data.

arebe337 commented 6 months ago

Yes, that's the same problem I noticed when examining the data. I think it could be a good idea to write them! Didn't you already contact them? If so, you could also inquire about this when receiving a response.

arebe337 commented 6 months ago

@michaelweinold

michaelweinold commented 6 months ago

commercial flights only (commercial passenger flights, cargo flights, charter flights, and some business jet flights) https://github.com/sustainableaviation/demandmap/issues/11#issuecomment-2114328750

@arebe337, please update the Excel file in figures/benchmarking_data/FlightRadar_data.xlsx. The file must contain information on where the data comes from (in this case, at least a URL of the source). This brings me to another point: What exactly do the filters look like? Is it really just commercial? We could, for instance, discard corporate travel (business jets). But I guess this is what @dodedic already clarified:

and unfortunately they make no mention of how many of those flights are purely commercial passenger flights. https://github.com/sustainableaviation/demandmap/issues/11#issuecomment-2114787923

Another question:

The main difference arises from the fact that with the API, we only have data for one week per month, compared to the daily data from Flightradar24. https://github.com/sustainableaviation/demandmap/issues/11#issuecomment-2114328750

What am I missing here, @arebe337 & @dodedic? I thought the API gave you coverage of the entire year, if we just query the number of connections (not the type of aircraft)? What does "one week per month" mean?

Finally:

If you have any additional ideas on how we could adjust the graph to better display this, I would love to hear them. (the plot is starting in May due to the Data we got from the API).

I think it will be difficult/impossible to add more resolution (flight category, etc.) to global statistics like the one you have already found from FR24. What I would suggest instead is to look at our case-study routes in particular. There, we can directly compare FR24 and ADB data! Doing this for e.g. 1 week is feasible even without an FR24 API.

michaelweinold commented 6 months ago

@dodedic, benchmarking data is now scattered across different issues and PRs of the repo.

For instance:

California calculations are mentioned in https://github.com/sustainableaviation/demandmap/pull/36#issue-2317258793
Nigeria calculations are mentioned in https://github.com/sustainableaviation/demandmap/issues/30#issue-2298572973

Unfortunately, I won't have the time to go through your matrix generation code in detail (also because the code is currently not documented).

Could you fill in the table below, so I can add it to the poster?

Case Study	3rd Party Source	FR24	ADB (our data)
California
Nigeria
S.Africa

dodedic commented 6 months ago

Case Study	3rd Party Source	FR24	ADB (our data)
California	OAG	-	x
Australia	OAG	-	x
Nigeria	OAG	x	x

michaelweinold commented 6 months ago

...ok, maybe I had misunderstood - I thought we would look up benchmarking data for all three case study routes? Can you do a quick FR24-based check for the two remaining routes (California, Australia)?

It would be great to have all the numbers as a comparison for the poster - maybe collect them in an Excel sheet, with your calculation assumptions?

dodedic commented 6 months ago

Ah I see the misunderstanding now! I can do that tomorrow, no problem!

dodedic commented 6 months ago

@michaelweinold @arebe337

So here is the Excel with the calculations I made.

One major point I have is that we didn't actually benchmark against FR24 in-depth, since they do not give us historical data and ABD-Exchange never answered my request for a data dump on our routes. The only thing I did was a quick sanity check on this issue under "4. Estimate current and future demand of air travel route between both cities", where I looked at the different types of aircraft that fly that route on FR24 and averaged those numbers to compare against our ADB-data. This was just a quick sanity check.

Since this was not possible with FR24 on a larger scale, I turned to Sabre/OAG to provide us with their seats/year data and benchmarked our ADB data on our routes against their numbers. You can find the calculations in the Excel sheet. With the USA route, the numbers match quite well, while for Nigeria and Australia we are off. We might have to put some more thought into this in the next week...
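For reference, a comparison of this kind can be sketched as a signed relative deviation per route. The route names, the seat counts, and the 10% tolerance below are hypothetical placeholders, not the values from the Excel sheet.

```python
def relative_deviation(ours, reference):
    """Signed deviation of our value from the reference, as a fraction."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (ours - reference) / reference


# Placeholder seats/year values for illustration only, NOT real data.
routes = {
    "route_a": {"adb": 950_000, "reference": 1_000_000},
    "route_b": {"adb": 600_000, "reference": 1_000_000},
}

TOLERANCE = 0.10  # assumed threshold for "matches quite well"

for name, seats in routes.items():
    dev = relative_deviation(seats["adb"], seats["reference"])
    status = "ok" if abs(dev) <= TOLERANCE else "check"
    print(f"{name}: {dev:+.1%} ({status})")
```

A signed value shows whether the ADB data undercounts or overcounts relative to the third-party source, which a plain absolute difference would hide.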

arebe337 commented 6 months ago

@dodedic I can't open the excel sheet

dodedic commented 6 months ago

Maybe with this one?

Otherwise it's just in the OneDrive folder where the airport data also is!

dodedic commented 5 months ago

@michaelweinold @arebe337

I have made some further calculations on the topic of available seats on our routes. On FR24, if you go to an airport's page, e.g. https://www.flightradar24.com/data/airports/lax, and then look up all the airports in our case study, you will find the number of flights per week to all destinations publicly listed there.

If we look at both directions for each city pair and then divide by seven we get the number of daily flights from FR24. I have added them to the Data Calculations Excel above. Here is what I found:

This suggests that the AAS matrices we made look quite okay, and the problem is in fact with the flights/day metric. In my opinion, the discrepancy may come from the fact that some or all cargo flights are included in the FR24/OAG data. I will investigate further...
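The weekly-to-daily conversion described above (both directions of a city pair, divided by seven) can be sketched as follows. The function name and the weekly counts are placeholders, not the numbers from the FR24 airport pages.

```python
def daily_flights(weekly_a_to_b, weekly_b_to_a):
    """Average daily flights on a city pair, counting both directions."""
    return (weekly_a_to_b + weekly_b_to_a) / 7


# Placeholder weekly counts, NOT real FR24 values.
print(daily_flights(70, 63))
```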

dodedic commented 5 months ago

@michaelweinold @arebe337

So to look further into the benchmarking of the ADB-data I now looked at all scheduled flights that are available on FR24 from all the case study routes to see if that would make a difference and put it in the blue "FR24 Daily" row.

I still got the same numbers as from the airport page (example MEL), and additionally, all flights listed there are PAX flights, meaning that it can't be down to FR24 counting cargo flights on those routes.

What I did now is message ADB about this, so perhaps they know why we are encountering this "issue" on only these two routes (forgot to CC you):

sustainableaviation / demandmap

Benchmarking our Data #11