Closed Erlkoenig90 closed 5 years ago
I'll check also this contribution (you are on fire man!) but my approach to #12 was actually in the README (I was about to close it)
Anyway, after a quick look at your PR I believe one of us have misinterpreted the request... Looking at the original message, I'm not sure any of us have nailed the point π
Ah now I see. Well, there are three cases:
You put objects directly into the buffer, like CircularBuffer<Record, 100>
. This does not work properly yet, because the 100 objects are constructed right when the CircularBuffer
instance is created, and only deleted when the buffer is deleted. This is usually not what you want; my PR fixes that and only calls c'tors/d'tors when elements are added/removed. This way it behaves similar to std::vector
, which might allocate a large buffer at its initial creation, but only call c'tors/d'tors when adding/removing/copying elements.
You put pointers into the buffer, like CircularBuffer<Record*, 100>
. This case works ok with the current implementation, as pointers don't have c'tors/d'tors; the user has to make sure to delete
the pointed-to objects correctly, as done in the Object example.
The user wants to put pointers into the buffer, but also to avoid the problems with manual memory management. Therefore they put a smart pointer into the buffer, like CircularBuffer<std::unique_ptr<Record>, 100>
. This case does not work with the current implementation, since unique_ptr
's c'tor/d'tor are not called properly (see 1.). If you apply my PR however, it will work as expected: As soon as you remove an element (unshift
, pop
) it will be automatically deleted and removed. Since the PR implements move semantics, you can also remove objects without deleting them, if desired, by doing something like:
std::unique_ptr<Record> ptr = buffer.pop ();
The same thing applies to other smart pointers. Unfortunately, AVR-GCC doesn't have unique_ptr
, but it's easy enough to implement yourself.
Therefore, without the PR, the user only really has choice 2 for putting class objects into the buffer, but manual memory management is discouraged anyways. So I'd propose to merge this PR as it allows the user to do 1. and 3. as well, without affecting behaviour or performance for other cases.
I've pulled your diff and tried against a sample sketch and I start scratching my head...
Either your sketch ends up doubling up the memory used by the buffer itself or it manages to move the data storage from .bss
to .text
, which is puzzling me.
I'm using the following test sketch:
#include "Arduino.h"
#include "CircularBuffer.h"
#define SAMPLE_PIN A0
void setup() {
pinMode(SAMPLE_PIN, INPUT);
}
CircularBuffer<byte, 500> buffer;
void loop() {
//Record* record = new Record(analogRead(SAMPLE_PIN), millis());
unsigned int record = analogRead(SAMPLE_PIN);
buffer.push(record);
delay(50);
if (buffer.isFull()) {
while (!buffer.isEmpty()) {
buffer.pop();
}
}
}
Built with the latest code in this repo for AVR328 I get:
section size addr
.data 18 8388864
.text 1856 0
.bss 1172 8388882
.comment 17 0
When using your latest contribution:
section size addr
.data 1024 8388864
.text 1828 0
.bss 166 8389888
.comment 17 0
To my admittedly limited understanding we are going to use space in heap (allocating the objects) and space in flash (managing the objects)...
Oh, this appears to be a contrived situation about initialization and constexpr... As you can see, ".text" usage (i.e.flash) in this specific case actually dropped. However, in the general case flash usage will be higher, which is bad... I removed the constexpr
stuff, which will slightly increase flash usage for this case, but will be better in the general case (specifically, when there are more than this one global object with constructors). I will write a more detailed explanation later. RAM usage was always identical, as .data and .bss are both in RAM and the sum did not change, they are just initialized differently. This is not specific to the constructor/destructor stuff, it is actually kind of an over-optimization.
PS: The ideal solution would be to have an extra variable for the buffer that is initialized separately; this would mean additional complexity for the user, which we should avoid since this is for Arduino, so sacrificing a few bytes should be ok.
I'm starting to evaluating having a separate version supporting this approach, especially after digging a bit on memory use:
βtextβ is code, vector table plus constants.
βdataβ is for initialized variables, and it counts for RAM and FLASH. The linker allocates the data in FLASH which then is copied from ROM to RAM in the startup code.
βbssβ is for the uninitialized data in RAM which is initialized with zero in the startup code.
I know it's a nice feature, but definitely I'm not willing to have such a big memory footprint on those tiny micros....
In one of my sketches where I was using a buffer of 1k pointers, switching to your latest patch makes the program not fitting on the mcu anymore.
As a Java programmer I know pointers can be a pain, but on micros you must learn to handle them...
I know how to handle pointers :wink: ... Well, there should not be any overhead. In theory, with the latest modifications to the PR, the memory/runtime performance of applications that don't put class objects directly into the buffer (i.e. no CircularBuffer<Record, 42>
) should be exactly the same as the original code; in fact, the generated binaries should be identical. If they aren't, there's still a bug; if you can provide an example, I can analyze that. The binaries are identical with your example from last night.
This is what C++ and template metaprogramming is about: You don't pay for what you don't use. The compiler automatically optimizes the CircularBuffer<T,...>
to be as efficient as possible with a given T
, so you don't need different versions of CircularBuffer
.
I know how to handle pointers :wink:
Absolutely no doubt about that, I was referring to the library users :wink:
Ah right :wink: Well if you have a tiny class, like class Point { byte x, y; };
it makes little sense to store them via pointers, i.e. CircularBuffer<Point*, 7>
, since CircularBuffer<Point,7>
will actually be more efficient, but that will only work with my PR...
switching to your latest patch makes the program not fitting on the mcu anymore.
Even with the changes from today morning? Can you show me your source code?
Haven't had the chance to pull your last changes, I was referring to what we had yesterday night. I'll probably have time to verify in the next couple of days, not sure I'll be able to do it tonight: tough times at work
On Tue, Dec 11, 2018 at 11:10 AM Niklas GΓΌrtler notifications@github.com wrote:
Ah right π Well if you have a tiny class, like class Point { byte x, y; }; it makes little sense to store them via pointers, i.e. CircularBuffer<Point*, 7>, since CircularBuffer<Point,7> will actually be more efficient, but that will only work with my PR...
switching to your latest patch makes the program not fitting on the mcu anymore.
Even with the changes from today morning? Can you show me your source code?
β You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rlogiacco/CircularBuffer/pull/16#issuecomment-446145825, or mute the thread https://github.com/notifications/unsubscribe-auth/AAMLMRnlRPlC7cccl7t83CVd8vz0_TmHks5u34SigaJpZM4ZKR08 .
Okay! Yesterday's version might be (slightly) more efficient in some corner cases, but not in general. This had actually nothing to do with the Constructor-Destructor thing, but was because I overzealously added the constexpr
stuff even before that, which was however incomplete and did not kick in; the constructor-stuff only triggered that because I (accidentally) made the constexpr
stuff work as originally intended. That however isn't good with large arrays, as your examples show.
I haven't had the time to test the changes, but I've been studying them and I'm having a hard time to understand what's going on... I feel like I'm back at school π I apologize in advance for the time it will take to merge it, but I need to feel comfortable with the approach otherwise I'll not be able to maintain the library: hope you understand. I know it's my fault...
I feel so dumb... how should the library be used to store objects? The following code outputs plenty of compiler errors which I'm not even able to understand... π’
#include "CircularBuffer.h"
#include "Record.h"
#define SAMPLE_PIN A0
void setup() {
pinMode(SAMPLE_PIN, INPUT);
}
CircularBuffer<Record, 500, byte> buffer;
void loop() {
Record* record = new Record(analogRead(SAMPLE_PIN), millis());
buffer.push(record);
delay(50);
if (buffer.isFull()) {
while (!buffer.isEmpty()) {
buffer.pop();
}
}
}
Errors look like
CircularBuffer.tpp:21:37: error: use of deleted function 'CircularBuffer<Record, 500u, unsigned char>::Container::Container()'
CircularBuffer.h:134:8: error: no matching function for call to 'Record::Record()'
CircularBuffer.tpp:21:37: error: use of deleted function 'CircularBuffer<Record, 500u, unsigned char>::Container::~Container()'
I haven't had the time to test the changes, but I've been studying them and I'm having a hard time to understand what's going on... I feel like I'm back at school disappointed
Well, that's the cool thing about C++, you can do so many crazy things that few people know π€ͺ
apologize in advance for the time it will take to merge it, but I need to feel comfortable with the approach otherwise I'll not be able to maintain the library: hope you understand.
Hehe that's alright, there's no hurry. It's good for finding bugs. If you want to learn more about C++, there's a list of good books.
The following code outputs plenty of compiler errors which I'm not even able to understand... cry
Oh no, I should not have removed the union
's constructors and destructors... I added them back. With this code, it should now work:
#include <CircularBuffer.h>
#include "Record.h"
#define SAMPLE_PIN A0
void setup() {
pinMode(SAMPLE_PIN, INPUT);
}
CircularBuffer<Record, 100, byte> buffer;
void loop() {
buffer.push(Record{analogRead(SAMPLE_PIN), millis()});
delay(50);
if (buffer.isFull()) {
while (!buffer.isEmpty()) {
buffer.pop();
}
}
}
You don't need new
, as the objects are stored right inside the CircularBuffer::buffer
array, and not on heap. The Record{analogRead(SAMPLE_PIN), millis()}
creates a new temporary object (on stack) and passes it to push
, which then creates a new object inside the buffer
array (using placement-new syntax) and passes the temporary object to its copy-constructor, which was automatically defined by the compiler. This whole procedure is optimized by the compiler to not actually store a temporary object on stack but put the integers in the buffer directly. It works very similar to storing simple integers inside the buffer - if you type buffer.push(42)
, a temporary integer with 42 is allocated, passed to push
, which then copies it into the CircularBuffer::buffer
variable.
500 Record
objects don't fit in the RAM, because that requires (500 x (sizeof(unsigned int)+sizeof(unsigned long))) = 3000 bytes.
I got 5 minutes this morning to run the tests and here is what I collected regarding memory with the latest updates:
section size addr
.data 0 8388864
.text 990 0
.bss 1014 8388864
.comment 17 0
section size addr
.data 0 8388864
.text 990 0
.bss 2014 8388864
.comment 17 0
section size addr
.data 6 8388864
.text 1646 0
.bss 1018 8388870 // no hint the buffer will not fit
.comment 17 0
section size addr
.data 6 8388864
.text 1646 0
.bss 2018 8388870 // looking at this figure, the buffer should fit in memory
.comment 17 0
section size addr
.data 0 8388864
.text 1036 0
.bss 3014 8388864 // clearly greater than the 2048 available on UNO
.comment 17 0
section size addr
.data 0 8388864
.text 1036 0
.bss 6014 8388864 // getting close to the capacity of a MEGA
.comment 17 0
I do like these new changes as it makes clearer to the user the memory footprint: using pointers there's no hint the records will not fit on an Arduino UNO heap, with smart pointers it gets much clearer.
On the other hand, the notation for creating the new objects is very new to me and probably to most of the average Arduino users.
I want to study all this and get comfortable before pushing a new release: are you willing to contribute a wiki page with some guidance and references to help the average user? I will be happy to welcome you on board for this project π
When I get something new I can't stop scratching my head until I get at least a grasp, so here is my understanding so far (excuse my lack of precise terminology, I'm not that much into C++):
Container
union acts as a wrapper for the buffer nodes, it's not clear to me why you used a union
, but I'll come back to it in a secondObj
, specified on the push()
and unshift()
methods gets somehow resolved to Container of T, the wrapper (to my understanding the Obj
parameter is left unresolved so I guess the compiler is doing something smart here, probably related to the use of union
above mentioned)Now, I've plenty of questions, which I'll first try to answer by myself, but one might be critical, so here it goes: does the use of union
prevent doubling up the heap use?
In particular, when adding an item, we are instantiating a new wrapper around the item and then pushing the wrapper into the array, so we have, in theory a pointer (to the wrapper) containing a pointer (to the item). Does the union replace the last pointer with the actual object representation?
In practice, rather than storing pointers to objects, we are storing the internal representation of those objects: if an item has 6 integers attributes, each of our array slots will have the size of 12 bytes.
How far did I land, terminology aside?
I do like these new changes as it makes clearer to the user the memory footprint: using pointers there's no hint the records will not fit on an Arduino UNO heap, with smart pointers it gets much clearer.
That's right! There are no smart pointers involved here though, it's just static allocation (like a global array).
On the other hand, the notation for creating the new objects is very new to me and probably to most of the average Arduino users.
Hehe, it's the new (C++11) initialization syntax. You could also do
Record r (analogRead(SAMPLE_PIN), millis());
buffer.push(r);
which is almost the same, except that the record is not temporary anymore.
I want to study all this and get comfortable before pushing a new release: are you willing to contribute a wiki page with some guidance and references to help the average user
Sure :smiley: Maybe I can find the time for a longer text on the weekend.
* the `Container` union acts as a wrapper for the buffer nodes, it's not clear to me why you used a `union`, but I'll come back to it in a second
The union
prevents the compiler from calling the object constructor. If the array was simply T buffer [S];
, then the constructors of all S
objects will be called at program start, but we want to call them later when we add elements. Basically, the union
is used to turn the T
into storage space for T
. This could alternatively be achieved by something like alignas (T) char buffer [sizeof(T)*S];
.
* the _additional_ template parameter `Obj`, specified on the `push()` and `unshift()` methods gets somehow resolved to _Container of T_,
It gets resolved to whatever you pass to push
; if you do push(Record{...})
, it gets resolved to Record
; if you do push(42)
, it gets resolved to int
. It basically allows you to define functions (push
and unshift
) that accept any type of parameter. This parameter is then passed to the constructor of the T
object inside the buffer. In the Record
example, the copy constructor Record::Record (const Record&)
is called. The constructor is called on the object already in the buffer; we use the placement-new syntax to use existing space for a "new" object (which however does not use new space!). This works very similar to std::vector
. Bjarne Stroustrup's "The C++ Programming Language, 4th ed." contains a good description of that.
does the use of
union
prevent doubling up the heap use?
No, it has nothing to do with heap space, it just prevents the constructor call. A union
with a single member is very similar to a struct
with a single member, it just doesn't call the constructor.
In particular, when adding an item, we are instantiating a new wrapper around the item and then pushing the wrapper into the array
The array already contains the wrapper with the object inside. We just call its copy constructor to initialize it.
Do we have, in theory a pointer (to the wrapper) containing a pointer (to the item
No, there are actually no pointers involved at all, although you could make a pointer to a wrapper or the object inside.
In practice, rather than storing pointers to objects, we are storing the internal representation of those objects: if an item has 6 integers attributes, each of our array slots will have the size of 12 bytes.
That's right, just as with ordinary arrays (without union
).
The whole container stuff gets much more "interesting" if you try to make it exception-safe (like std::vector
)... But since exceptions are rarely (if ever) used on microcontrollers, we can get around that :wink:
Between your explanations and my testing I'm starting to grasp the principles: inside the buffer, objects are treated as if they were structures, and that became much clearer when I added struct testing.
Here is the code I'm using for testing and trying to comprehend.
When pushing/unshifting we are calling the destructor on the overwritten object: is that safe? I mean, I could have pushed something, then popped it out and still need its data after it has been replaced in the buffer... Unless the destructor is actually invoked on the wrapper, which it doesn't look the case
objects are treated as if they were structures
Well, C++ does not really know structures. In C++ struct
defines a class, just like class
does. The only difference is, that in a struct
, everything is public by default, unless you add private:
or protected:
.
Here is the code I'm using for testing and trying to comprehend.
Looks ok. In C++, typedef struct {... } Name;
is redundant, you can just do struct Name { ... };
.
I mean, I could have pushed something, then popped it out and still need its data after it has been replaced in the buffer...
Of course you shouldn't do that! You have to copy the data somewhere else during pop
. It's exactly the same thing with integers: If you have a CirculaBuffer<int,10>
and do:
int* p = &buffer.tail ();
buffer.pop ();
println (*p);
Anything evil can happen. If you never take pointers/references to things inside the buffer, that can't happen, though. We can't help that, though - it's the whole point of a ring buffer. Data is overwritten all the time. Copying out data correctly would look like this:
Record obj (buffer.pop ()); // Remove the object from the buffer (calls its destructor) and make a copy
println (obj.value);
Since there is no reference inside the buffer, this is ok.
Copying out data correctly would look like this:
Record obj (buffer.pop ()); // Remove the object from the buffer (calls its destructor) and make a copy println (obj.value);
That is a concern to me. I consider myself an average Arduino user, so I would expect to be able to write, as I did in my test sketch:
Record obj = buffer.pop();
Serial.println(obj.value);
Or even:
Record obj = buffer.first();
// some time after a few wipes
Serial.println(obj.value);
Well, I know, obviously, what's the whole point of a circular buffer, but I also expect the buffer not to mangle what I've pulled off of it under my back. I understand I can't mess with the buffer internals (like obtaining a pointer to the internal array) and expect nothing bad happens, but that should go the other way around as well.
Sadly, if I can make that mistake, than probably an average user will too π .
Actually, looking at my testing code, the simple assignment seems to work as expected and both the following seem to return the right values:
Record obj = buffer.pop();
Serial.println(obj.value);
Record obj = buffer.first();
// after the first element has been replaced and buffer.first().value != obj.value
Serial.println(obj.value);
My understanding is the assignment actually copies the popped object into stack memory, allowing the user to store it for as long as he wants, which is the behaviour I was willing to have.
Can you confirm, again beside incorrect terminology?
Can you confirm, again beside incorrect terminology?
Yes, it's completely right. The PR doesn't actually influence this behaviour; without the PR, copies would be made just the same way, and overwriting data in the buffer has the same effect of invalidating pointers/references.
Remember that objects work differently in C++ than in Java - Object instances are not stored neccessarily on the heap. If you pass variables of type "Record" around, then the instances are copied, but not references. To reference an existing object, you need & or *.
Ok, so far so good, thank you for lecturing me (sincerely and in a positive way).
Can you help me with the reason for (Obj&& value)
for push and unshift? Why can't we stick with (T&& value)
or what is the additional benefit?
I'm asking because it has a minor side effect on code completion showing up a template parameter unknown to the user...
I've been reading about this rvalue reference introduced with C++11, but as you already know, I'm lacking a ton of background knowledge to be able to get it quickly.
I've found this article which contains a clear and concise description of that rvalue reference
what is the additional benefit?
It allows you to pass references (T&), constant references (const T&) and rvalue references/rvalues (T&&), i.e. temporary objects. Without that, we couldn't support both variants of
Record r (1, 2); buffer.unshift (r);
buffer.unshift (Record {1, 2});
I'm asking because it has a minor side effect on code completion showing up a template parameter unknown to the user...
Yeah unfortunately template metaprogramming isn't good with code completion.
rvalue references are a complex topic so I can't really explain them here ;-)
Actually, we could do 3 variants of unshift
/push
, like:
unshift (const T& obj)
which calls the ordinary copy constructor Record::Record(const Record&)
(allows line 1 of above example)unshift (T&& obj)
which calls the move constructor Record::Record (Record&&)
(allows line 2 of above example)template <typename... Args> unshift_emplace (Args&&... args)
which calls any constructor Record::Record(Args&&...)
. This would allow you to do:
buffer.unshift_emplace (1, 2);
which would create a Record
instance right inside the buffer and pass it 1 and 2. This would completely eliminate any temporary objects and wouldn't require Record
to be copyable or moveable.This would work similar to std::vector::push_back
and std::vector::emplace_back
. What do you think?
I think we should keep the library small and focused on microcontrollers: there are much more complete and extensively supported libraries out there already providing all the bells and whistles.
In other words, point 3 is unnecessary IMHO.
Reading the article mentioned before I already got the rvalue reference as a way to obtain the move constructor, which, to my understanding, could also be obtained via overloading unshift(T& obj)
and unshift(const T& x)
(viable in this specific case as we have only one parameter).
My question was more like "do we really need an additional template parameter Obj
for the rvalue reference? Can't we just use the already existing parameter T
and thus replace unshift (Obj&& obj)
with unshift (T&& obj)
?".
That will resolve my concern regarding this Obj
template parameter the user gets from code completion suggestions but he is totally unaware of.
I think we should keep the library small and focused on microcontrollers: there are much more complete and extensively supported libraries out there already providing all the bells and whistles.
Even for microcontrollers? The "emplace" stuff is especially advantageous for microcontrollers, as it might avoid unnecessary copies.
My question was more like "do we really need an additional template parameter
Obj
for the rvalue reference? Can't we just use the already existing parameterT
and thus replaceunshift (Obj&& obj)
withunshift (T&& obj)
?".
We'd have to provide 2 overloads: for const T&
and for T&&
. The latter is distinct from Obj&&
, they have the same syntax but different meaning...
The latter is distinct from Obj&&, they have the same syntax but different meaning...
Which is what I'm obviously missing :smile:
What will not be supported if I just replace
template<typename T, size_t S, typename IT>
template <typename Obj>
bool CircularBuffer<T,S,IT>::unshift(Obj&& value) {
if (head == buffer) {
head = buffer + capacity;
}
*--head = value;
--head;
if (count == capacity) head->obj.~T ();
new (&head->obj) T (static_cast<Obj&&> (value));
if (count == capacity) {
if (tail-- == buffer) {
tail = buffer + capacity - 1;
}
return false;
} else {
if (count++ == 0) {
tail = head;
}
return true;
}
}
with
template<typename T, size_t S, typename IT>
bool CircularBuffer<T,S,IT>::unshift(T&& value) {
if (head == buffer) {
head = buffer + capacity;
}
*--head = value;
--head;
if (count == capacity) head->obj.~T ();
new (&head->obj) T (static_cast<T&&> (value)); // is cast redundant here?
if (count == capacity) {
if (tail-- == buffer) {
tail = buffer + capacity - 1;
}
return false;
} else {
if (count++ == 0) {
tail = head;
}
return true;
}
}
What will not be supported
This:
Record r {1, 2};
buffer.unshift (r);
// is cast redundant here?
Unfortunately no, as rvalue references behave like ordinary lvalue references, so we have to make an rvalue out of it again (that's "slightly" confusing).
I'm continuing to study the topic, just to discover how little I know about C++ in general :cry: and I'm wondering why aren't we just using a const reference (we don't want to modify the data) which works (to my understanding) on both lvalues and rvalues:
template<typename T, size_t S, typename IT>
bool CircularBuffer<T,S,IT>::unshift(const T& value) {
...
}
Unless I'm again missing something, which is highly probable...
It does work on rvalues, but you cant move from a const T&
. Therefore, if T
is moveable, we wouldn't take advantage of that.
I'm studying the moveable concept right now π
I have to apologize for using your knowledge to build mine, but I guess this isn't upsetting you too much as you keep answering my questions and doubts politely rather than just with a "go get a good book or course" π
Hehe, well there's no better way to learn than by trying it out, I think...
I now changed unshift
and push
as proposed - there are now two overloads const T&
and T&&
allowing you to make copies of existing objects, and to move in temporary objects. I also added _emplace
variants that directly create the object inside the buffer without ever copying anything, which is good for unmovable & uncopyable objects. This makes CircularBuffer
conform to conventions better (e.g. standard library containers) and more flexible.
I also added const
versions for some functions and made others totally const
. This is good for const
correctness, and allows you to e.g. pass a const CircularBuffer<...>&
to a function, and use it read-only there.
I changed head
& tail
to integers (indices), which is IMO cleaner and allows for more easy copying of the whole object. Therefore, I added copy & move constructors & assignment operators, solving issue #17. Only elements that were actually filled in are copied - i.e. if you make an empty CircularBuffer
and pass it by-value to a function, nothing is ever copied.
Btw, this PR also answers #10.
This PR is getting too big, I'm still trying to understand the initial commits π
For the emplace stuff I believe it will mostly confuse users: it might help reduce boilerplate, but it's not a life changer... better to keep it separate for a while and see...
I suggest we roll this last changes back until we sort out this PR and move forward from there :wink:
This PR lets the buffer automatically call constructors/destructors of elements properly, fixing #12 (if I understood it correctly). This allows users to place class objects in the buffer which do something in these functions, e.g. allocate memory in the c'tor and free it in the d'tor; with this PR, memory is freed correctly when the element is removed. This is achieved by placing the elements inside a
union
, which does not call constructors/destructors automatically. There is however an AVR-specific problem: For calling the constructor explicitly, the placement-new syntax has to be used. For this to work, an overload foroperator new
is required, which is provided by the<new>
header. Since AVR-GCC does not ship with standard c++ headers, we have to provide our own overload. It is just one line, but that might collide with other libraries which do the same. Therefore, I added a comment to have the user comment out the line in case of a problem. This is only done for AVR, as the ARM-GCC does provide<new>
. The parameter type forunshift
andpush
is now an "universal reference" to allow passing arbitrary objects to the element constructor. The additions should not incur any overhead for types with trivial c'tors/d'tors such as integers. The code compiles, but I did not test it on actual hardware.