xoolive / traffic

A toolbox for processing and analysing air traffic data
https://traffic-viz.github.io/
MIT License
361 stars 80 forks source link

Flight split based on condition #457

Closed xoolive closed 2 months ago

junzis commented 2 months ago

Can we get this method into LazyTraffic?

For example, the following code through error: Method 'split' not implemented on Traffic or LazyTraffic

t = (
    t.iterate_lazy(iterate_kw=dict(by="6 hours"))
    .assign_id(name="{self.callsign}_{idx:>04}")
    .filter()
    .longer_than("1 hour")
    .pipe(enough_data)
    .split("30min", condition=no_split_enroute)
    .pipe(full_flight)
)
xoolive commented 2 months ago

The proper way to add a method to LazyTraffic is to add a decorator to a method without an implementation (use the ellipsis ...) in the Traffic class, like everything below this line:

https://github.com/xoolive/traffic/blob/a83e95bfd7dcbf09b469587f7ca6b0297884a61a/src/traffic/core/traffic.py#L736

Note that the decorator will look for a method in Flight with the same name:

https://github.com/xoolive/traffic/blob/a83e95bfd7dcbf09b469587f7ca6b0297884a61a/src/traffic/core/lazy.py#L341

Lazy methods

At the moment only two main categories of Flight methods are supported:

The most important category that is not supported as is are methods returning a FlightIterator: since there can be many segments returned, this would break the linearity of the stack.

Instead of

flight:
|_ method1 -> bool
|_ method2 -> None | Flight
|_ method3 -> None | Flight

It could create a tree of executions, with possible RecursionError associated, and without the possibility to merge branches later:

flight
|_ method1 -> bool
|_ method2 -> FlightIterator
   |_ [segment1]
   |  |_ method3 -> FlightIterator
   |     |_ ...
   |_ [segment2]
   |  |_ method3 -> FlightIterator
   |     |_ ...

A common use case with FlightIterator however is to use an aggregator right behind (i.e. a function FlightIterator -> None | bool | Flight)

That pattern is supported in LazyTraffic, with many aggregators such as has, next, or all, they also support methods being passed as a string to avoid passing lambdas which cannot be serialized (and multiprocessed), that's why instead of:

flight.aligned_on_ils("EHRD").next()  # None | Flight
traffic.aligned_on_ils("EHRD").next().eval()  # INVALID code
        ^__ returns a FlightIterator
                               ^__ brings the FlightIterator back to a Flight

You would write:

traffic.next('aligned_on_ils("EHRD")').eval()
        ^__ next returns an Optional[Flight]

On your example

t = (
    t.iterate_lazy(iterate_kw=dict(by="6 hours"))
    .assign_id(name="{self.callsign}_{idx:>04}")  # you assign a flight_id
    .filter()
    .longer_than("1 hour")
    .pipe(enough_data)
    .split("30min", condition=no_split_enroute)  # this returns a FlightIterator
    .pipe(full_flight)
)

The problem here that needs clarification is that when you split the trajectories, all the pieces will have the same flight_id. I am not sure if it is what you want. You may have wanted to put the assign_id() at the end of the stack? It feels error-prone though. I am also not sure what the full_flight method does.

That's the reason I am not very sure if I want to open the Pandora box with lazy flight iterators, because of this kind of situation.

Workaround

There is however an example of a workaround in the following example, with the all() aggregator, followed by a .drop(columns=['flight_id']) or here where a new flight_id in generated in the all() aggregator.

For your situation, I am not sure:

junzis commented 2 months ago

This is what I came up with in the end. Not pretty, but it work well.

def no_split_enroute(f1: Flight, f2: Flight) -> bool:
    return f1.data.iloc[-1].altitude < 25000 or f2.data.iloc[0].altitude < 25000

def resample_filter_split(flight: Flight) -> None | Flight:
    flight_ = Traffic.from_flights(
        [f.resample("2s").filter() for f in flight.split("10min")]
    )[0]  # 🌶️  I would recommend writing it Flight(pd.concat([...])) to avoid confusion with the [0] here

    return flight_.split("30min", condition=no_split_enroute).all(
        "{self.flight_id}_{i}"
    )  # 🌶️ This would be a perfect use case for flight_i.all('split("30min", condition=no_split_en_route)', flight_id="{self.flight_id}_{i}")

def full_flight(flight: Flight) -> None | Flight:  # 🌶️  return a bool
    if (  # 🌶️ replace 'if' with 'return'
        flight.data.altitude.iloc[0] < 5000
        and flight.data.altitude.iloc[-1] < 5000
        and flight.altitude_max > 15000
    ):
        return flight  # 🌶️ remove
    return None  # 🌶️  remove

 t = (
      t.iterate_lazy(iterate_kw=dict(by="6 hours"))
      .assign_id(name="{self.callsign}_{idx:>04}")
      .filter()
      .longer_than("1 hour")
      .pipe(resample_filter_split)
      .pipe(full_flight)
      .eval(max_workers=8, desc="processing")
  )
xoolive commented 2 months ago

Commenting directly in your code above

xoolive commented 2 months ago

You can now replace

flight_ = Traffic.from_flights(
        [f.resample("2s").filter() for f in flight.split("10min")]
    )[0] 

with

flight_ = flight.split('10min').map(lambda f: f.resample('2s').filter()).all()
junzis commented 2 months ago

Yes, it works! I think it is good to be merged :+1: