Closed xiaoliang314 closed 3 years ago
@xiaoliang314, settings_save_one
can indeed take a long time to complete, could you add some more information about the use case (e.g. the used backend, the backend storage size, ...).
@Laczen Thank you for your reply!
Configuration
@xiaoliang314, could it be that your storage needs are approaching 4kB?
@Laczen My storage needs are less than 1KB. But the storage frequency is faster, about once every 4 seconds.
@xiaoliang314, with this configuration and requirements you should not be getting a few seconds to do a settings_save_one
. The worst case should be when a sector erase is needed. Your storage/settings area is also not polluted with settings that are no longer needed? As a test could you erase the storage area and perform the test again?
@Laczen Yes, it is usually not so fast in normal use. This is just one of my test cases. I have an application. I received data from the hardware in the interrupt handler and placed it in a high-priority thread for processing. Because the work queue was doing settings_save_one, my thread did not process the data and release the buffer in time, resulting in data loss.
@xiaoliang314, with this configuration and requirepents you should not be getting a few seconds to do a
settings_save_one
. The worst case should be when a sector erase is needed. Your storage/settings area is also not polluted with settings that are no longer needed? As a test could you erase the storage area and perform the test again?
My storage area is used after erase.
@xiaoliang314, on some devices the flash erase blocks all execution of code. It is in general not possible to make the flash erase preemtible. For code to be executable during the erase all code should be executed from ram. If you really need the high priority task to be always done you will not be able to use any flash storage that is using internal flash.
@Laczen Usually, the time for a flash erase or write is not very long, about 2-8ms. The time consumed by settings_save_one
is generally a few hundred milliseconds or a few seconds. I have browsed the code of the settings subsystem, settings_save_one
spends most of the time on the lookup flash.
I think calling "settings_save_one" in the system work queue caused this issue because it cannot be preempted.
@nvlsianpu Sorry to disturb you, maybe you understand this issue better.
@xiaoliang314, yes during settings_save_one
all your data is read to avoid rewriting the same data twice. However your storage size is so small (and reads are so fast) that I concluded the reads could not be the problem.
@Laczen I halt the CPU, I see it is doing settings_fcb_compress
most of the time.
@Laczen I halt the CPU, I see it is doing
settings_fcb_compress
most of the time.
Ok, this means it is erasing sectors. You are going through the flash life fast, be carefull. Do you have the possibility to increase the number of sectors?
@xiaoliang314, could you share exactly what you are saving (size and name), how frequent, from what you reported you are doing fcb_compress each time you are doing a save. This is not normal.
@xiaoliang314, could you share exactly what you are saving (size and name), how frequent, from what you reported you are doing fcb_compress each time you are doing a save. This is not normal.
@Laczen name: "bt/mesh/RPL/%x", (%x is 0x0001 ~ 0x7FFF) size: 4 frequent: 4 ~ 8 second
@xiaoliang314, this is writing data with a new name every time, if this data is never deleted, storage is full and each time a call to settings_save is done a trial is made (unsuccesfully) to create space for the new data.
@Laczen This is part of bluetooth mesh. https://github.com/zephyrproject-rtos/zephyr/blob/b6217e1bec999c8133a21c34701d4bf0d3536099/subsys/bluetooth/mesh/rpl.c#L251-L271
Yes, but the entry->src is not expected to have so many values.
@xiaoliang314, can this issue be closed ?
@Laczen I think that time-consuming functions should not be called in the system queue, which will reduce the real-time performance of system, what do you think?
@Laczen Another solution is to add a set of APIs that modify thread preemptibility, set it to preemptible when calling settings_save
, and restore preemptibility after the end.
settings_save
is preemptible because it can be called by general threads, which are preemptible threads.
@xiaoliang314, this IMO not so important. The real problem is: a flash erase operation can take a long time during which the cpu can be halted. If this is the case it is impossible to guarantee real time operation. This can be avoided by not using internal flash storage.
The problem you had was that this happened on every write because the flash space was exhausted.
@Laczen The erasing and writing time of my flash is not very long, about 2-8ms. If the thread can preempt, theoretically the CPU can process a higher priority task every 8 milliseconds.
@Laczen The erasing and writing time of my flash is not very long, about 2-8ms. If the thread can preempt, theoretically the CPU can process a higher priority task every 8 milliseconds.
When and how settings_save_one
is called is completely up to the user. Please time the routine in a normal use case, it should be ms.
@Laczen The Bluetooth subsystem also uses settings_save_one in the system work queue, which is the cause of this problem. However, there seems to be no better solution. Can I keep it open until the problem is solved?
@xiaoliang314, no problem with keeping it open. I will remove my assignment, for me this is closed.
@Laczen The Bluetooth subsystem also uses settings_save_one in the system work queue, which is the cause of this problem. However, there seems to be no better solution. Can I keep it open until the problem is solved?
Please open a new bug for this with the template filled out.
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
settings_save_one
is a very time-consuming function, a call is usually a few hundred milliseconds to a few seconds, I think it should be preemptible. However, the function is usually called in the system work queue. Because the system work queue is not preemptive, this will cause the entire system to stagnate for a long time.