@MarcoGorelli I'm not sure what you think is constructive, but I've mentioned repeatedly that I have a scalable wrapper that preserves df.append semantics, while preventing everyone having to independently write the same code. Am I not being constructive?
Hey @wumpus ,
My "please be constructive" was in response to the comment "So not cool :/" and wasn't directed at you
Thanks for your input and for a link to your wrapper
@wumpus that was in reference to another conversation (not you)
your wrapper is not incorporated into pandas and likely won't be, in any event
I agree that I appear to be wasting my time, despite having a solution to the root problem. What am I doing wrong?
@wumpus what you wrote might be fine for your use, but it's not going to be possible to do this lazy type of evaluation in a reliable way in pandas itself
sure, it could be done, but it would lead to a large number of edge cases and a very brittle / complex solution
My code is fully lazy. I agree that there are probably edge cases - easy to see, because append() and concat() are wildly different.
I wish you would not deprecate Series / DataFrame.append. There are scenarios in my code that I could not have done using pd.concat. For example, I created a list of records that has missing series, and it requires a groupby for me to be able to identify them. I then iterate over each of those records and append the missing series using df.append. I cannot find a way to do this with pd.concat.
Usual response - please provide a minimal reproducible example
My two cents on concat vs append (since I use it quite extensively in my algotrading platform):
Append has been incredibly useful for me and I've used it in probably 12-15 places in my codebase. I use dataframes to load price data into memory for fast compute and at times need to append new rows (e.g. orders placed) to a dataframe. Given the size, I use DataFrames almost as a replacement for lists since they're far more nimble.
Append - until now - allowed me to quickly and easily add a dictionary to an existing dataframe. Concat now requires me to create a dataframe with 1 or more rows and then concat it with my existing dataframe, vs. simply adding a dictionary to the existing dataframe.
Concat seems to be a vertical merge of two dataframes (extends rows), vs. merge, which horizontally merges two dataframes (i.e. extends columns based on common keys). If anything, concat intuitively does not suggest appending, so here's what I propose:
As for an example, previously I used this:
self.df_balances = self.df_balances.append(trade_date_balance.to_dict(), ignore_index=True)
Now it's replaced with (a little annoying):
new_balances_row_df = pd.DataFrame(trade_date_balance.to_dict(), index=[0])
self.df_balances = pd.concat([self.df_balances, new_balances_row_df], ignore_index=True)
For context - df_balances is a dataframe I maintain to save daily balances during my backtesting engine runs which allows me to compute funds available for investing. As I loop through my backtesting dates, I keep inserting this into the dataframe at the end of the day so I can quickly access it later when needed. Eventually, I output the df into a csv so that I can manually verify there is no calculation or settlement error (from a funds perspective).
I do use .loc to make updates - however, it isn't intuitive because you need to know the index or the label - which honestly doesn't matter when you append - and to my knowledge, .loc doesn't support adding a dictionary.
- Allow concat to add dictionaries to the dataframe (along with support for arrays). Also instead of doing pd.concat - why can't we simply do df.concat([df1, df2]) which adds data from df1 and df2 to df?
I think allowing concat to add dictionaries is a fair point, since it has been mentioned multiple times in this topic. Not sure about df.concat([df1, df2]); it's just as easy to use pd.concat([df, df1, df2]).
@erfannariman - agreed, it's not hard. But merge also uses the same lingo - df.merge(df1) - and since concat is just a merger of rows from two dfs (in some sense), we might as well stick to the same writing style as merge?
Not a big one - but it was a comment for consistency.
I feel like code readability is so much better with append than concat. I understand that append is not in-place and that it is less efficient than concat. Even so, append feels more pythonic to me than concat does.
I often use it with single row dictionaries, Series or DataFrames and I feel that my code is more readable this way... Would it make sense to get new appends like:
-1 on adding even more methods to the API, and very confident that there'd be broad consensus on this among pandas devs
Examples of how to do these, though, would be good candidates for the docs Tom said he'd help write
I understand that append is not in-place and that it is less efficient than concat.
If you're just appending a single row, there shouldn't be much difference in efficiency. If you're appending multiple, then that's where append encourages inefficient code, which is why it's been deprecated. Here's an example from the awesome library ArviZ where the append deprecation "forced" them to write better code: https://github.com/arviz-devs/arviz/pull/1973/files
@MarcoGorelli so the question is: is the deprecation being reconsidered?
No, what makes you think that?
There can be docs to help the transition (which you'd be welcome to help out with, see the contributing guide if you're interested)
@MarcoGorelli clearly the community is saying this is bad. What will it take to stop this?
What will it take to stop this?
I'd suggest starting with a minimal reproducible example indicating why you think append needs to stay.
Does this meet the need for a simple sample, @MarcoGorelli?
append is simple to understand: everyone knows list().append. pd.concat is more like list.extend. Though for pushing lots of data, extend is better on a list; for one row, append is fine. The dev team is pushing everyone to the extend-like method on a list.
import pandas as pd
data = [{'simple' : 'example 1'}, {'simple' : 'example 2'}, {'simple' : 'example 3'}]
pd.DataFrame(data).append({'simple' : "example 4"}, ignore_index=True)
Now let's append with concat:
df = pd.DataFrame(data)
pd.concat([df, {'simple' : "example 4"}])
Traceback (most recent call last):
Python Shell, prompt 18, line 1
# Used internally for debug sandbox under external interpreter
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-dev\Lib\site-packages\pandas\core\reshape\concat.py", line 295, in concat
sort=sort,
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-dev\Lib\site-packages\pandas\core\reshape\concat.py", line 370, in __init__
raise TypeError(msg)
builtins.TypeError: cannot concatenate object of type '<class 'dict'>'; only Series and DataFrame objs are valid
df = pd.DataFrame(data)
df.concat([{'simple' : "example 4"}]) # method doesn't exist
df = pd.DataFrame(data)
df1 = pd.DataFrame(data=[{'simple' : 'example 4'}])
pd.concat([df, df1]) # no error finally
Output:
pd.concat([df, df1])
simple
0 example 1
1 example 2
2 example 3
0 example 4
A bit of a note on example 3: pd.concat is a function within pandas, not a method on the object, whereas append is right on the DataFrame. We have overhead for 1 row creating a dataframe. This seems like overkill. Plus now I have to reset my index with concat.
So if I were to break it down:
- append exists on the dataframe and is a common function used throughout the python ecosystem
- the pd.concat method doesn't exist on the DataFrame, which means a user has to search for the function
- pd.concat causes users to have to manage the indexes themselves, whereas append will increase the index to the next iteration

You can do
>>> pd.concat([pd.DataFrame(data), pd.DataFrame({'simple': 'example 4'}, index=[len(data)])])
simple
0 example 1
1 example 2
2 example 3
3 example 4
which doesn't seem more complicated than using append:
>>> pd.DataFrame(data).append({'simple' : "example 4"}, ignore_index=True)
<stdin>:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
simple
0 example 1
1 example 2
2 example 3
3 example 4
append is simple to understand everyone knows list().append
Yes, that's exactly the issue - to quote the original post: "They're making an analogy to list.append, but it's a poor analogy since the behavior isn't (and can't be) in place. The data for the index and values needs to be copied to create the result."
We have overhead for 1 row creating a dataframe. This seems like overkill. Plus now I have to reset my index with concat.
Use ignore_index=True for concat; there's no need to call reset_index.
I wasn't looking for solutions for my example... I knew this is what would happen...
Look, I know there are ways around this, but why not just make append do what the concat method does in the background? Keep both: you keep your functionality, and the community gets to keep a well-known function name.
It seems like a solid ask and compromise.
but why not just make append do what the concat method in the background?
It already does:
The issue isn't for when you're appending a single row, but for when you're appending many (e.g. in a loop) - in that case, having append encourages bad and inefficient code:
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(range(10_000))
...: dfs = [df] * 100
In [4]: %%timeit
...: df_result = dfs[0]
...: for df in dfs[1:]:
...: df_result = df_result.append(df)
...:
1.39 s ± 51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [5]: %%timeit
...: df_result = pd.concat(dfs)
...:
...:
3.6 ms ± 76.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Then just state the purpose of the method in the docs. You are shoehorning all use cases into one method, when two methods are fine.
I haven't seen the impact on chaining style pandas code mentioned in the discussion above (maybe it's discussed elsewhere?), so here's what I'm wondering:
Deprecating pandas.DataFrame.append()
will remove a seemingly intuitive possibility to add a row (or rows) to a data frame while writing pandas code in a chained style:
fruits = pd.DataFrame(
{
"name": ["apple", "pear", "avocado"],
"image": ["🍏", "🍐", "🥑"]
}
)
veggies = pd.DataFrame(
{
"name": ["tomato", "carrot", "avocado"],
"image": ["🍅", "🥕", "🥑"]
}
)
both_fruit_and_vegetable = (
fruits
.append({"name": "tomato", "image": "🍅"}, ignore_index=True) # Forgot the tomato is a fruit, too!
.merge(veggies)
# ... Add other chained operations
)
print(both_fruit_and_vegetable)
# OUTPUT:
#
# name image
# 0 avocado 🥑
# 1 tomato 🍅
I'm not sure how often you'd want to add rows to a data frame like this, and I understand you could achieve the same using pandas.DataFrame.merge(), e.g. in this minimal example:
both_fruit_and_vegetable = (
fruits
.merge(pd.DataFrame({"name": ["tomato"], "image": ["🍅"]}), how="outer")
.merge(veggies)
)
I'm showing the merge functionality as an example also because pandas has the instance-level method pandas.DataFrame.merge() as a wrapper for the lower-level pandas.merge(). I thought that this wrapper exists to make chained-style pandas possible for merge operations (and at least a few others think so too), but please correct me if I'm wrong.
.append() for Chaining Syntax
So I'm wondering whether there's a suggested alternative for adding a row to a data frame when writing chained style pandas code.
Is the solution to use pandas.DataFrame.merge() with appropriate parameters, or will non-SQL-wizards run into unexpected join behavior that's harder to wrap your head around than a seemingly more straightforward append/concat style concatenation?
Or could it be useful to add an instance-level pandas.DataFrame.concat() method that uses pandas.concat() internally, but opens up the opportunity to chain the operation onto other operations using a familiar syntax?
Thanks for your thoughts and work!
First of all, that's a great example, thanks!
Though can't concat fit into the chain?
In [7]: both_fruit_and_vegetable = (
...: pd.concat([fruits, pd.DataFrame({'name': ['tomato'], 'image': ["🍅"]})], ignore_index=True)
...: .merge(veggies)
...: # ... Add other chained operations
...: )
In [8]: both_fruit_and_vegetable
Out[8]:
name image
0 avocado 🥑
1 tomato 🍅
Lol, thanks 😋
Your example works in this specific case, where .append() is the first thing I do. But it doesn't work when I want to concat somewhere lower down in the chain, e.g.:
both_fruit_and_vegetable = (
fruits
.merge(veggies)
.append(pd.DataFrame({'name': ['tomato'], 'image': ["🍅"]}), ignore_index=True)
)
I can't chain pd.concat() onto a previous chain link, which is possible with df.append().
You can also use .pipe() for method chaining with arbitrary functions.
Sure, but you can still fit concat into the chain:
both_fruit_and_vegetable = pd.concat(
[fruits.merge(veggies), pd.DataFrame({"name": ["tomato"], "image": ["🍅"]})],
ignore_index=True,
)
Or indeed, as suggested above:
fruits.merge(veggies).pipe(
lambda df: pd.concat(
[df, pd.DataFrame({"name": ["tomato"], "image": ["🍅"]})], ignore_index=True
)
)
If you just need to append a single row, then such workarounds should be fine. If you need to append many rows inside a loop, then not having append will at least not encourage inefficient code.
Why is the goalpost constantly being moved here? You requested examples and they have been provided, as demonstrated here. And the answer is to use a workaround, why exactly? If append works as intended, shouldn't that be the goal? I have pretty much accepted that the powers that be are not going to listen to feedback, as you have convinced yourselves that a problem that doesn't need fixing, or arguably doesn't even exist, needs to be addressed. The solution to bad code is not to remove a tool that has been misused. I am simply trying to point out that your reasons are misguided, despite your good intentions.
pd.concat with a single row at a time is the performance problem.
And as a reminder, I have a demonstration of a high-performance append.
DataFrame.append makes an analogy to list.append, but it's a poor analogy and it encourages inefficient code. Build up your DataFrames in a list and call pd.concat on them instead, and you'll get a noticeable performance gain, especially for large-ish DataFrames.
The purpose of asking for minimal reproducible examples was to see if anyone had a use-case for which there wasn't a simple workaround.
You're all being listened to, I've read every post in this thread. The arguments for keeping append seem to be:
- it's more readable / feels more pythonic
- it mirrors list.append
- it fits method chaining

None of these strike me as strong enough reasons to keep append:
- the workarounds above are simple enough and also legible
- plenty of people do care about pandas performance
- method chaining is still possible with pipe
And as a reminder, I have a demonstration of a high-performance append.
You've already advertised your package here 3 times, please stop
I was hoping to successfully talk to "the powers that be" about this change. Looking at the repo owners I see that you are the person I wanted to talk to! Glad I was able to get my code in front of you for a review.
Phew, so many new messages in this topic.
First of all, for me there are two points: it is such a common function that it breaks A LOT of code. This is really bad even if the append pattern is a bad one. Does it really hurt so much to keep it? It costs developers a lot of time to remove all the append calls. I love backward compatibility and I think breaking it for no good reason other than "we want to force developers to do it differently" is a very bad idea.
In our code base, we finally managed to remove all append calls, usually replacing the whole function with better code. When we started with Pandas and didn't know how to work efficiently with Pandas, a common pattern was using "manual groupbys", i.e. loop over df.some_column.unique()
, apply the selection like df_group = df[df.some_column == value]
, do the calculation on the group, append to a result. Very bad indeed. My whole point is that this doesn't improve at all when only replacing the append
call with concat
. Rewriting these loops with an array to collect the DFs and calling concat
at the end gets around the deprecation but doesn't fix the whole style of the function. And sometimes it is even more difficult to fix the old code where an experienced pandas developer would only think "WTF". So fixing all these things is a lot of work for no good reason (the old WTF code was tested and working correctly).
@MarcoGorelli wrote:
Thanks @behrenhoff - maybe this is a case for concat preserving attrs then? Do you want to open a separate issue for that?
Actually, I am in favor of getting rid of as many attrs in our code base as possible. I don't like them at all; they were getting used all over the place, so that testing became difficult (when every function expects 10 different attrs to exist, you are in hell and your functions become less reusable). Therefore we got rid of a lot of attrs. And concat discourages attrs. But yeah, that was another bit of work. So my code base is now free of attrs and free of append. Work done.
Look, I understand getting rid of some old functions is sometimes a good idea but I really really don't like removing such a common function.
None of these strike me as strong enough reasons to keep append:
the workaround above are simple enough and also legible
Simple enough? That's only true if you don't fix the whole thing. If you just replace every append call with a concat call, you win absolutely nothing.
plenty of people do care about pandas performance
So? I don't understand this argument. df.groupby(col).apply is slow as well and not removed. Also: does append affect other functions? Is concat for two dfs faster than append? No? Only if you do multiple appends? But then you need to modify your algorithm (for example, collecting separate DFs in a list). Are there really cases where append is a problem? My point is: when you replace it with concat, it won't have an impact on the performance unless you change the whole logic. I DO care about performance in Pandas as well - but ONLY in the areas that affect me. Building/appending to a DF is not on the list at all. If you do care about the append aspect, use a better solution for that purpose. (A bit of whataboutism: a lot of groupby functions are slow as hell when there are many groups; that's where I care.)
I debated clicking the reply button since the deprecation is already in, so this post doesn't change anything - but I feel really strongly about keeping compatibility. I want to be able to update pandas without worrying too much.
By the way: how does concat improve this code:
total_df = pd.DataFrame()
for file in glob("*.csv"):
print(f"reading {file}")
df = pd.read_csv(file)
total_df = total_df.append(df).drop_duplicates()
Yes, it is easy to replace:
total_df = pd.DataFrame()
for file in glob("*.csv"):
print(f"reading {file}")
df = pd.read_csv(file)
total_df = pd.concat([total_df, df]).drop_duplicates()
But the performance gain is 0.
Note that this doesn't work (too much RAM usage) - so you cannot blindly rewrite all df.append to use a list and concat at the end:
dfs = []
for file in glob("*.csv"):
print(f"reading {file}")
df = pd.read_csv(file)
dfs.append(df)
total_df = pd.concat(dfs).drop_duplicates()
Note that append is orders of magnitude faster than read_csv in this example, so there is no performance impact at all - just work to remove the append calls. (And yes, our real code uses a slightly smarter algorithm.)
Having seen the examples in this thread, I would even argue that append in a loop is a strong code smell in all cases. It's a question of priorities - compatibility vs. trying to enforce a better style. Especially as a new Pandas user you want to append to your toy DF. This should - in my opinion - be an easy task. The append is only a performance problem if you do it over and over again, not in the general case where you only append one DF to another. That's a very big difference.
So at the end a TLDR:
- replacing append with a concat doesn't help with performance on its own
- you only win if you remove append and make smart changes to your code (for example filling a list of DFs in a loop and calling concat on the list at the end)
- append in a loop is a very strong code smell

There's a standard database algorithm to speed up appending single rows at a time to a database; that's what pandas-appender uses. That relieves Pandas users from having to make smart changes.
In 2010 I had a 30 petabyte homegrown NoSQL database using this algorithm at my search engine startup.
@behrenhoff
By the way: how does concat improve this code?
total_df = pd.DataFrame()
for file in glob("*.csv"):
    print(f"reading {file}")
    df = pd.read_csv(file)
    total_df = total_df.append(df).drop_duplicates()
Yes, it is easy to replace:
total_df = pd.DataFrame()
for file in glob("*.csv"):
    print(f"reading {file}")
    df = pd.read_csv(file)
    total_df = pd.concat([total_df, df]).drop_duplicates()
this is exactly the reason append is super problematic - we have an entire doc note, which I guess no one reads, that explains that you are doing an exponential copy here (no kidding you run out of RAM)
so you have proved the point of why append is a terrible idea - it's not about readability, but how easy it is to fall into traps that are non-obvious at first glance
If only there was a well-known algorithm which was not an exponential copy.
this is exactly the reason append is super problematic - we have an entire doc note, which I guess no one reads, that explains that you are doing an exponential copy here (no kidding you run out of RAM)
You did not read or did not understand what I was saying. The version with append is the one that WORKS; the one with concat at the end runs into memory issues (because there is the small drop_duplicates in the loop that fixes the problem and cannot be moved out).
And yes, you can be smarter, for example ((file1 + file2).drop_dups + (file3 + file4).drop_dups).drop_dups or similar - where + can be concat or append, it doesn't matter. I was just proving the point that the suggested way "collect all DFs in a list and concat them all at the end" does not always work.
Thanks @behrenhoff , that's a nice example - though can't you still batch the concats? Say, read 10 files at a time, concat them, drop duplicates, repeat...
This seems like a perfect summary of the issue anyway:
it's not about readability, but how easy it is to fall into traps that are non-obvious at first glance
At some point we should lock the issue; this is taking a lot of attention away from a lot of people, there have been off-topic comments, no compelling use-case for keeping DataFrame.append, and strong agreement among pandas devs (especially those who have been around the longest).
Say, read 10 files at a time, concat them, drop duplicates, repeat...
Yes, that would work. So would 1 million other solutions. In practice, I could even exploit the date ordering inside the files (all files here have a rather long overlapping history, but newer files can overwrite (fix) data in older files, so it is of course a drop_dups with a subset and keep=last). My point is: this is a non-issue because the operation is done once per 6 months or so; the daily operation just adds exactly one file. There is no point in optimizing this further as long as it works. That is the whole point I was trying to make. You force people to optimize / change code where the old code just works and there is no need to modify it. And the real gains in this example are not in append vs concat but in exploiting knowledge of the input files and reading them in a different order or in groups.
Note that I am not saying this is a use case that can only be done with append. I am saying that removing a common feature imposes unnecessary work on many people, and that you don't get performance gains for free by only replacing append with concat (you need to do more).
Anyway, end of discussion for me. I already did the work and got rid of all my appends.
I just fear that many people will not upgrade if their code breaks. You are also making it harder for new users. append is a good and common English word; concat is not - at least I can't find it in a dictionary (there is concatenate, but it is a word that far fewer people know; this might not be a problem for native English speakers though). I would always search for "append", not for "concat", if I didn't know the proper function name.
Hi, minimal reproducer that was totally broken:
Before:
a = pd.DataFrame({"A": 1, "B": 2}, index=[0])
b = pd.DataFrame({"A": 3}, index=[0])
for rowIndex, row in b.iterrows():
print(a.append(row))
# Output:
# A B
#0 1 2.0
#0 3 NaN
After:
a = pd.DataFrame({"A": 1, "B": 2}, index=[0])
b = pd.DataFrame({"A": 3}, index=[0])
for rowIndex, row in b.iterrows():
print(pd.concat([a, row]))
# Output:
# A B 0
#0 1.0 2.0 NaN
#A NaN NaN 3.0
Also, please note that if you add a deprecation warning to such a popular method, one that is used widely and called many times per second, the message will be spammed a lot, leading to much bigger overhead than the allocations and memory copying themselves. So it is beneficial to print such a message only on the first call.
What are you trying to do? It would be way more efficient to call
pd.concat([a, b], ignore_index=True)
Edit: or was it on purpose to put A into the Index instead of as a column?
I know, this is just an illustration. I was iterating over rows and, if a row was OK, adding it to another table. I believe there are much better ways via masking and concatenation that take such masks into account, but I wanted to keep the code as simple as possible.
Thanks for your response. It is important for us to see use cases that cannot be done more efficiently in another way. You are right, checking data can be done way more efficiently via masking and then concatenating the result.
How can I concat such a row to another table a (with a superset of the row's column names) in such a case?
with
pd.concat([a, row.to_frame().T], ignore_index=True)
You can simply do:
a = pd.DataFrame({"A": 1, "B": 2}, index=[0])
b = pd.DataFrame({"A": [3, 4]})
result = pd.concat([a, b.loc[b["A"] > 3]], ignore_index=True)
Just change the greater 3 to a condition that suits your needs. This avoids the iterating over the rows step. If you have to iterate for some reason, you can use the example from @MarcoGorelli
Not every condition and not all logic can be expressed readably in such a single-line expression.
For people who, like me, just want to get rid of the warnings:
import pandas as pd

def pandas_append(df, row, ignore_index=False):
    # emulate the old df.append(row) for the row types it used to accept
    if isinstance(row, pd.DataFrame):
        result = pd.concat([df, row], ignore_index=ignore_index)
    elif isinstance(row, pd.Series):
        # a Series is a single row: transpose it into a one-row frame
        result = pd.concat([df, row.to_frame().T], ignore_index=ignore_index)
    elif isinstance(row, dict):
        result = pd.concat(
            [df, pd.DataFrame(row, index=[0], columns=df.columns)],
            ignore_index=ignore_index,
        )
    else:
        raise TypeError("pandas_append: unsupported row type - {}".format(type(row)))
    return result
Here is a use case for DataFrame.append that I think makes sense, and for which it took me way too long to figure out how to replace it with pandas.concat. (Do note that I am not a seasoned pandas user.)
I have a data frame with numeric values, such as
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
and I append a single row with all the column sums
totals = df.sum()
totals.name = 'totals'
df_append = df.append(totals)
Simple enough.
Here are the values of df
, totals
, and df_append
>>> df
A B
0 1 2
1 3 4
>>> totals
A 4
B 6
Name: totals, dtype: int64
>>> df_append
A B
0 1 2
1 3 4
totals 4 6
Now, using pd.concat
naively:
df_concat_bad = pd.concat([df, totals])
which produces
>>> df_concat_bad
A B 0
0 1.0 2.0 NaN
1 3.0 4.0 NaN
A NaN NaN 4.0
B NaN NaN 6.0
Apparently, with df.append the Series object got interpreted as a row, but with pd.concat it got interpreted as a column.
You cannot fix this with something like axis=1, because that would add the totals as a column.
Fortunately, in a comment above, the implementation of DataFrame.append is quoted, and from this one can glean the solution:
df_concat_good = pd.concat([df, totals.to_frame().T])
which yields the desired
>>> df_concat_good
A B
0 1 2
1 3 4
totals 4 6
I think users need to be aware of such subtleties. I also posted this on StackOverflow.
This was brought up in https://github.com/pandas-dev/pandas/issues/35407#issuecomment-1092892819 , and some other comments in this thread, and would/should be part of the transition docs (see https://github.com/pandas-dev/pandas/issues/46825)
Worst idea I've seen - why complicate something so easy? I think it's better to have more options/ways to do something than just one strict way. DataFrame.append() was very easy for newbies to add data to a dataframe.
I think that we should deprecate Series.append and DataFrame.append. They're making an analogy to list.append, but it's a poor analogy since the behavior isn't (and can't be) in place. The data for the index and values needs to be copied to create the result.
These are also apparently popular methods. DataFrame.append is around the 10th most visited page in our API docs.
Unless I'm mistaken, users are always better off building up a list of values and passing them to the constructor, or building up a list of NDFrames followed by a single concat.