ISO-TP revamp - Githubissues

gramsay0 commented 1 year ago

Introduction

These changes are aimed at improving ISO-TP in the following areas:

Add CAN-FD support
Fix existing bugs/limitations
Reduce resource usage when communicating with a large number of devices

Problem description

Currently CAN-FD is not supported (only classical CAN)
Existing bugs/limitations:
- Cannot bind and transmit to the same CAN ID: https://discord.com/channels/720317445772017664/1121554356542640240
- Cannot bind to two devices with different extended ID: https://github.com/zephyrproject-rtos/zephyr/issues/49816
- The number of ISO-TP bindings is limited to the number of CAN hardware filters
- Each concurrent send requires an additional hardware filter
Each binding requires a separate thread (or periodic/busy-wait polling) to receive messages This may cause significant resource usage, especially when communicating with a large number of devices

Proposed change

Add ISO-TP CAN-FD support
Modify the de-muxing of CAN frames to be done in software (While keeping the hardware filters to prevent tiny end-devices from having to process irrelevant CAN frames)
Decouple bindings from packet reassembly Allowing multiple context to bind to the same address
Use a fixed number of packet reassembly contexts (configured via kconfig) This specifies the maximum number of messages that can be reassembled simultaneously (Similar to config NET_IPV4_FRAGMENT_MAX_COUNT subsys/net/ip/ipv4_fragment.c)
Allow poll/pend (k_poll()) on multiple receive contexts from a single thread
Allow binding to multiple/masked CAN TX/RD IDs Reducing the required number of receive contexts, hardware filters and k_poll() events (or threads) required to communicate with multiple devices
- Add an isotp_recvfrom() API to receive and get the actual CAN TX/RD IDs (analogous to a socket recvfrom call)
Allow transmitting using an existing binding, rather than adding a new hardware filter for every TX Reducing the required number of hardware filters Remove the ability to send without binding

Detailed RFC

ISO-TP CAN-FD support

This mostly involves conditionally adding CAN_FRAME_FDF/CAN_FILTER_FDF and can_bytes_to_dlc()/can_dlc_to_bytes() where appropriate. It would be possible to enable CAN-FD at compile time (via kconfig), and add a runtime flag to bind/send calls. However using both classical CAN and CAN-FD for ISO-TP at the same time seems unlikely, and many other ISO-TP features are already configured at compile time. So ISO-TP would be compiled to operate as either classical CAN or CAN-FD only.

Bind

isotp_bind is modified to use struct isotp_binding instead of struct isotp_recv_ctx as the binding can be used for both transmit and receive.

Additional rx_addr_mask/tx_addr_mask parameters are added. The rx_addr_mask is the CAN ID mask to be used by the hardware filter. This makes is possible to bind to either a single device or multiple devices.

struct isotp_addr_mask {
    uint32_t id_mask : 29;
    uint8_t ext_addr_mask;
};

int isotp_bind(struct isotp_binding *binding, const struct device *can_dev,
           const struct isotp_msg_id *rx_addr,
           const struct isotp_addr_mask *rx_addr_mask,
           const struct isotp_msg_id *tx_addr,
           const struct isotp_addr_mask *tx_addr_mask,
           const struct isotp_fc_opts *opts,
           k_timeout_t timeout);

Send

The send function is modified to take a binding parameter, and remove the isotp_send_ctx/can_dev/tx_addr/rx_addr parameters. It wraps a new sendto function and can only be used on 1:1 bindings where the TX address does not mask any bits, otherwise it returns an error.

static inline int isotp_send(struct isotp_binding *binding,
           const uint8_t *data, size_t len,
           isotp_tx_callback_t complete_cb, void *cb_arg);

A new sendto function is added. The TX address must be a subset of the bound TX address and TX address mask. The bindings RX address should be configured with an appropriate hardware filter to receive flow control (FC) frames.

int isotp_sendto(struct isotp_binding *binding,
           const uint8_t *data, size_t len,
           const struct isotp_msg_id *tx_addr,
           isotp_tx_callback_t complete_cb, void *cb_arg);

Note that isotp_send_ctx is removed. This should result in a simpler API with better encapsulation (and symmetric between send/receive). The implementation uses a fixed number of send contexts (configured via kconfig). This would specify the maximum number of messages that can be sent simultaneously. This functionality is actually already available with config ISOTP_ENABLE_CONTEXT_BUFFERS. This would become the default/only available TX method.

The transmit function can pend on a semaphore of available send contexts (with a timeout).

Similar modifications are made to the isotp_send_net_ctx_buf function and it is renamed isotp_send_net. isotp_send_ctx_buf/isotp_send_buf functions are removed.

Receive

The receive function is modified to take a binding parameter, and remove the isotp_recv_ctxparameter. It wraps a new recvfrom function. Usually recv would only be used on 1:1 bindings where the RX address does not mask any bits, however it is ok to use this on a 1:N binding (you just loose info on who sent the message).

static inline int isotp_recv(struct isotp_binding *binding,
           uint8_t *data, size_t len,
           k_timeout_t timeout);

A new recvfrom function is added. This is useful to get the address information when bound to multiple devices.

int isotp_recvfrom(struct isotp_binding *binding,
           uint8_t *data, size_t len,
           struct isotp_msg_id *tx_addr,
           struct isotp_msg_id *rx_addr,
           k_timeout_t timeout);

Similar changes are made to the isotp_recv_net function.

Packet reassembly

Packet reassembly is decoupled from bindings. The implementation would now use a fixed number of packet reassembly contexts (configured via kconfig). This specifies the maximum number of messages that can be reassembled simultaneously.

The reassembly contexts take care of much of what is currently managed by isotp_recv_ctx:

Keeping a reference to the can device, TX/RX address of the message
Keeping track of the receive state of the message
Handling timeouts and freeing themselves for re-use

RX frame de-mux

When a frame is received, it is either appended to an exiting reassembly context (with a matching address), or placed into one of the free reassembly contexts.

Once a full packet is received, it is added to the fifo of each binding whose RX address and RX address mask is a superset of the received address. This means that two binding can have overlapping address ranges and will both receive the same message. Care must be taken with the net buf reference counts, especially if something is unbound when a received message is pending.

TX Flow Control de-mux

A new list of active transmission contexts needs to be maintained. When a flow control (FC) frame is received, the list of active TX contexts is searched for a matching address, and the TX state updated accordingly.

Optionally, the active TX list could be checked when starting a TX, to prevent accidentally sending to the same address multiple times simultaneously.

Polling API

Add a function to initialise a k_poll receive event. This allows waiting/pending on received messages from multiple bindings from a single thread.

#define ISOTP_POLL_STATE_MESSAGE_AVAILABLE K_POLL_STATE_FIFO_DATA_AVAILABLE

static inline void isotp_recv_poll_event_init(
           const struct isotp_binding *binding, struct k_poll_event *event)
{
    k_poll_event_init(event,
            K_POLL_TYPE_FIFO_DATA_AVAILABLE,
            K_POLL_MODE_NOTIFY_ONLY,
            &binding->fifo);
}

If used correctly it should be stable even if the implementation changed to use a signal/semaphore (rather than a fifo)

Another option would be to create a polling API similar to zsock_poll(). Although I find that API a little frustrating, as you can only poll on sockets and not a mix of sockets and regular k_poll events.

Optionally, a TX writability polling API could also be made available (similar to ZSOCK_POLLOUT) isotp_send_poll_event_init(). This could pend on the TX semaphore of available send contexts

Proposed change (Detailed)

There is already quite a bit of detail above, so lets just go over some use cases. (These code samples are not expected to compile, and irrelevant parameters are replaced with ...).

A simple application that talks to a single peer:

struct isotp_binding binding;
isotp_bind(&binding, ...); // no address mask
isotp_recv(&binding, ...);
isotp_send(&binding, ...);
isotp_unbind(&binding);

An application that talks to a few peers from a single thread:

struct isotp_binding binding[NUM_PEERS];
struct k_poll_event poll_event[NUM_PEERS];

for (i = 0; i < NUM_PEERS; i++) {
    isotp_bind(&binding[i], ...); // no address mask
    isotp_recv_poll_event_init(&binding[i], &poll_event[i]);
}

// Wait for an RX message from any of the peer devices
k_poll(poll_event, NUM_PEERS, K_FOREVER);

for (i = 0; i < NUM_PEERS; i++) {
    if (poll_event[i].state == ISOTP_POLL_STATE_MESSAGE_AVAILABLE) {
        // Receive the message and send a response
        isotp_recv(&binding[i], ...);
        isotp_send(&binding[i], ...);
    }
}

for (i = 0; i < NUM_PEERS; i++) {
    isotp_unbind(&binding[i]);
}

k_poll() is reasonably efficient, with lower latency than periodic polling and much lower CPU usage than busy-waiting. However, what about a gateway/coordinator like device that talks to 100s of devices over multiple CAN buses. k_poll() inserts events into a priority ordered, doubly linked list, waits for an event and then removes the events before returning. An application like this shouldn't have to create 100s of binding, hardware filters and k_poll events just to receive all the messages being sent.

const struct device *can_dev[NUM_CAN_BUSES] = { CAN0, CAN1, ... };
struct isotp_binding binding[NUM_CAN_BUSES];
struct k_poll_event poll_event[NUM_CAN_BUSES];

for (i = 0; i < NUM_CAN_BUSES; i++) {
    isotp_bind(&binding[i], can_dev[i], ...); // full address mask
    isotp_recv_poll_event_init(&binding[i], &poll_event[i]);
}

// Wait for an RX message from any device on any CAN bus
k_poll(poll_event, NUM_CAN_BUSES, K_FOREVER);

for (i = 0; i < NUM_CAN_BUSES; i++) {
    if (poll_event[i].state == ISOTP_POLL_STATE_MESSAGE_AVAILABLE) {
        // Receive the message + address info and send a response
        isotp_recvfrom(&binding[i], ..., &addr, ...);
        isotp_sendto(&binding[i], ..., &addr, ...);
    }
}

for (i = 0; i < NUM_CAN_BUSES; i++) {
    isotp_unbind(&binding[i]);
}

Dependencies

These are breaking API changes

Concerns and Unresolved Questions

I may have completely overlooked something

Alternatives

Use callbacks rather than recv/recvfrom
Move ISO-TP to the network/sockets APIs (as done in Linux): int sock = zsock_socket(AF_CAN, SOCK_DGRAM, IPPROTO_CAN_ISOTP);
Attempt to make these changes without breaking the existing API

CC: @martinjaeger @henrikbrixandersen @alexanderwachter @abferm

towen commented 1 year ago

Hi Grant, Thanks a lot for this, it's a very detailed and useful RFC. I agree with everything you say. It's an important feature for my use case, as I would like to be able to use ISO-TP over CAN-FD, and will need to be able to both receive from and transmit to the same address. We will also want to be able to bind to a lot of devices at once.

So ISO-TP would be compiled to operate as either classical CAN or CAN-FD only.

I actually have an application where we would use ISO-TP over CAN-FD on one interface, but would like to mix classic-CAN frames and CAN-FD on the other. (Although ISO-TP would not be used with Classic CAN - we would use a different protocol for communicating with classic CAN devices). I think from your comments this would still work?

martinjaeger commented 1 year ago

There is also PR #56960 which contains some useful improvements for ISO-TP.

martinjaeger commented 1 year ago

Hey @gramsay0.

Finally found the time to look at your RFC. Your suggestions look very well-thought-out. Looking at the ISO-TP code I was also thinking of combining the isotp_send_ctx and isotp_recv_ctx into one isotp_ctx. This would be your isotp_binding, if I understand correctly.

@towen and I are working on the same project and need at least the support for CAN-FD and the combined send/receive context so that we can keep listening to incoming messages while sending with the same IDs.

Have you already started with implementation? Do you have a suggestion how we could support you? Maybe the CAN-FD support would be the lowest-hanging fruit?

gramsay0 commented 1 year ago

Hi Grant, Thanks a lot for this, it's a very detailed and useful RFC. I agree with everything you say. It's an important feature for my use case, as I would like to be able to use ISO-TP over CAN-FD, and will need to be able to both receive from and transmit to the same address. We will also want to be able to bind to a lot of devices at once.

So ISO-TP would be compiled to operate as either classical CAN or CAN-FD only.

I actually have an application where we would use ISO-TP over CAN-FD on one interface, but would like to mix classic-CAN frames and CAN-FD on the other. (Although ISO-TP would not be used with Classic CAN - we would use a different protocol for communicating with classic CAN devices). I think from your comments this would still work?

@towen thanks for reviewing this. Yes, I believe the scenario you mentioned would still work. I have actually been further considering whether CAN-FD should be compile time or runtime. Configuring at runtime adds very little overhead, increases flexibility and makes testing easier. Although there isn't really a way add these parameters without breaking the existing API, and the parameters would need to be passed to both bind and send functions

gramsay0 commented 1 year ago

@martinjaeger also thanks!

Looking at the ISO-TP code I was also thinking of combining the isotp_send_ctx and isotp_recv_ctx into one isotp_ctx. This would be your isotp_binding, if I understand correctly.

Not quite, the isotp_send_ctx and isotp_recv_ctx's would be statically allocated as array/mem_slab's. When a SF/FF frame is received, it would be allocated a recv_ctx which would link to a isotp_binding (using the address + mask). When the whole packet is received, calling recv on the isotp_binding would return the data from the recv_ctx and free the buffers and recv_ctx for re-use. When a packet is sent it would be allocated a send_ctx which may or may not need to keep track of the isotp_binding. This means you could have multiple transmits in progress on the same binding.

In the RFC I mentioned allowing overlapping addresses in bindings, however I now think that is a bad idea. e.g. If a received frame matched multiple bindings, then which bindings flow control parameters would be used?

Have you already started with implementation? Do you have a suggestion how we could support you? Maybe the CAN-FD support would be the lowest-hanging fruit?

I've been mostly scoping out the work to ensure it is viable and meets existing use cases. I could try draft up an implementation for some early review/feedback? Also, I'm not quite sure how breaking API changed are handled in this project, maybe they have a different acceptance criteria to regular fixes/improvements? Yes, I'll talk to my colleague about getting a CAN-FD PR opened. It was a little bit more work than anticipated, but I think it just needs some unit tests now.

There is also PR https://github.com/zephyrproject-rtos/zephyr/pull/56960 which contains some useful improvements for ISO-TP.

Thanks, I'll take a look

gramsay0 commented 1 year ago

@martinjaeger I've opened a PR for ISO-TP CAN-FD support: https://github.com/zephyrproject-rtos/zephyr/pull/60323

martinjaeger commented 1 year ago

I've been mostly scoping out the work to ensure it is viable and meets existing use cases. I could try draft up an implementation for some early review/feedback?

That sounds like a great idea. Just start with the API parts and open a draft PR so that others can have an early look even if it's still work in progress.

Also, I'm not quite sure how breaking API changed are handled in this project, maybe they have a different acceptance criteria to regular fixes/improvements?

As the ISO-TP API is still marked EXPERIMENTAL, we are quite free to improve it even with breaking changes. We should just mention the changes in the release notes. The API stabilization process is laid out here: https://docs.zephyrproject.org/latest/develop/api/api_lifecycle.html#api-lifecycle

Thanks also for opening the CAN-FD PR! Will try to have a look asap.

gramsay0 commented 1 year ago

Some additional thoughts after further reviewing the existing implementation:

isotp_recv() may return partial packets with no indication of where a datagram starts/ends The data would need its own framing on top of ISO-TP (isotp_recv_net() however does indicate this information) I think these functions should only return completed packets Although this would make receiving very large packets unfeasible
Currently the recv() functions return an error for a failed reassembly Maybe this could instead be done via an ISO-TP "stats" API, similar to many other Zephyr drivers etc
Currently the ISRs (CAN RX and timeouts) modify state variables for TX/RX without locking This seems prone to race conditions The ISRs should probably just post events to the work queue This would likely make the code easier to understand and extend too
Maybe the flow control (FC) block size (BS) parameter should be a maximum limit e.g. if bs is 8, and you only have buffers available for 6 blocks, why not allocate and receive those 6 blocks immediately?

Other notes:

overlapping addresses in bindings should not be supported (as mentioned in a comment above) This would make it ambiguous as to which bindings flow control parameters to use
When binding to multiple/masked TX/RD IDs, obtaining the CAN ID to send a flow control (FC) reply to is difficult Unless using a fixed addressing mode (SAE J1939) For other cases this might require a callback or list of masks/shifts to convert an RX CAN ID to a TX ID
binding to multiple/masked CAN TX/RD IDs would make snooping really easy All that would be needed is an option to disable sending FC frames

gramsay0 commented 1 year ago

I've decided to use a custom ISO-TP implementation for the project I'm working on, so closing this for now.

Also, another method I was considering was splitting the ISO-TP module into two parts:

A callback based part handling everything ISO-TP
A layer on top of that that handles buffers, CAN filters, user APIs (bind/send/recv) and anything else that is not strictly ISO-TP

That may be a good approach, rather than trying to make a one-size-fits-all API

martinjaeger commented 1 year ago

@gramsay0 Sad to hear that you are going to use a custom ISO-TP implementation. I guess it will still be based on Zephyr? Is there any chance it could be open-sourced once you are done? If it fixes the above shortcomings, we could even think about replacing the current in-tree implementation.

gramsay0 commented 1 year ago

@gramsay0 Sad to hear that you are going to use a custom ISO-TP implementation. I guess it will still be based on Zephyr? Is there any chance it could be open-sourced once you are done? If it fixes the above shortcomings, we could even think about replacing the current in-tree implementation.

Our custom implementation will be reduced/optimised down to only meet our specific requirements, so not useful as a generic ISO-TP implementation

alexanderwachter commented 1 year ago

I'm in favor of the callback idea! Some lightweight callback context, a bind function and a callback with something like

void foo(const iso_tp_ctx* ctx, const uint8_t *data, size_t len, offs_t offset)
{}

martinjaeger commented 1 year ago

Re-opened because I've started to work on a few of the issues.

zephyrbot commented 9 months ago

Hi @henrikbrixandersen,

This issue, marked as an Enhancement, was opened a while ago and did not get any traction. It was just assigned to you based on the labels. If you don't consider yourself the right person to address this issue, please re-assing it to the right person.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

@gramsay0 you are also encouraged to help moving this issue forward by providing additional information and confirming this request/issue is still relevant to you.

Thanks!

zephyrproject-rtos / zephyr

ISO-TP revamp #59896

Introduction

Problem description

Proposed change

Detailed RFC

ISO-TP CAN-FD support

Bind

Send

Receive

Packet reassembly

RX frame de-mux

TX Flow Control de-mux

Polling API

Proposed change (Detailed)

Dependencies

Concerns and Unresolved Questions

Alternatives