n8n-io / n8n

Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
https://n8n.io

[Node - Google Sheets V2]: Appending rows loads unnecessary data #11234

Open soerenuhrbach opened 2 hours ago

soerenuhrbach commented 2 hours ago

Bug Description

I'm using the Google Sheets node v2 to append rows to a single Google Sheet that contains more than 10,000 rows (and 35 columns).

Running the "update", "append" or "appendOrUpdate" operation leads to high memory usage when modifying sheets with a larger amount of data. This results in poor performance and, in the worst case, out-of-memory exceptions. In my case the n8n application crashes.

It seems that all of these operations load the whole spreadsheet before inserting. I understand that it's necessary to load all rows for update operations.
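
For context, this is roughly the Sheets API behaviour behind that observation: when the requested range is only the sheet name, the API returns every populated row. The snippet below is a minimal sketch using the googleapis client, not the node's actual code, and the sheet name is a placeholder.

```ts
// Sketch only: illustrates why a range of just the sheet name pulls the
// entire sheet. Assumes the googleapis package and an already-authorized client.
import { google } from 'googleapis';

async function readWholeSheet(auth: any, spreadsheetId: string) {
  const sheets = google.sheets({ version: 'v4', auth });

  // With the bare sheet name as range, the response contains every
  // populated row and column of that sheet - 10,000+ rows in this case.
  const res = await sheets.spreadsheets.values.get({
    spreadsheetId,
    range: 'Sheet1', // hypothetical sheet name
  });

  return res.data.values ?? [];
}
```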

To Reproduce

  1. Create a big set of data in Google Sheets with more than 10k rows and 35 columns (a seeding sketch follows this list).
  2. Create a workflow that inserts a single row into that spreadsheet using the "append" operation of the Google Sheets v2 node.
  3. Run the workflow.
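
One way to seed such a test sheet, sketched with the googleapis client (spreadsheet ID, sheet name, and column naming are assumptions for illustration):

```ts
// Sketch only: fills a test sheet with a header plus 10,000 rows x 35 columns.
// Assumes the googleapis package and an already-authorized client.
import { google } from 'googleapis';

async function seedTestSheet(auth: any, spreadsheetId: string) {
  const sheets = google.sheets({ version: 'v4', auth });

  const header = Array.from({ length: 35 }, (_, c) => `col_${c + 1}`);
  const rows = Array.from({ length: 10_000 }, (_, r) =>
    Array.from({ length: 35 }, (_, c) => `r${r + 1}c${c + 1}`)
  );

  await sheets.spreadsheets.values.update({
    spreadsheetId,
    range: 'Sheet1!A1', // hypothetical sheet name
    valueInputOption: 'RAW',
    requestBody: { values: [header, ...rows] },
  });
}
```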

Expected behavior

My expectation for the "append" operation is ...

Operating System

Docker

n8n Version

1.63.1

Node.js Version

20.10.0

Database

PostgreSQL

Execution mode

main (default)

Joffcom commented 2 hours ago

Hey @soerenuhrbach,

We have created an internal ticket to look into this which we will be tracking as "GHC-310"

soerenuhrbach commented 2 hours ago

I'll submit a pull request to fix these performance issues in a few days.

The code (here) shows pretty clearly that the getData method is called without any restriction on the rows, since only the sheet name (here) is used as the range.

When appending a row, we can restrict the range to the header row.
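
A minimal sketch of that idea, using the googleapis client rather than the node's actual implementation (sheet name and field mapping are assumptions): read only row 1 to learn the column order, then let the API's append endpoint place the new row.

```ts
// Sketch only: append a single row without loading the whole sheet.
// Assumes the googleapis package and an already-authorized client.
import { google } from 'googleapis';

async function appendRow(auth: any, spreadsheetId: string, item: Record<string, string>) {
  const sheets = google.sheets({ version: 'v4', auth });

  // Fetch only the header row instead of the whole sheet.
  const headerRes = await sheets.spreadsheets.values.get({
    spreadsheetId,
    range: 'Sheet1!1:1', // hypothetical sheet name
  });
  const header = headerRes.data.values?.[0] ?? [];

  // Map the item's fields onto the header order and append one row.
  const values = [header.map((column: string) => item[column] ?? '')];
  await sheets.spreadsheets.values.append({
    spreadsheetId,
    range: 'Sheet1!A1',
    valueInputOption: 'USER_ENTERED',
    requestBody: { values },
  });
}
```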

Joffcom commented 2 hours ago

Hey @soerenuhrbach

Does the operation have the option to make fewer API calls under the additional settings? That could help with the amount of data being loaded, as we then change how we add the data.

If you are working with 10,000 rows and 35 columns, I would recommend splitting the data into smaller chunks and using a sub-workflow where possible, which will help with the memory issues.
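
To illustrate the chunking suggestion in general terms (batch size and item shape are assumptions, not n8n specifics):

```ts
// Sketch only: split a large set of rows into smaller batches so each
// sub-workflow execution only holds a fraction of the data in memory.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example: 10,000 rows in batches of 500 -> 20 sub-workflow calls.
const allRows = Array.from({ length: 10_000 }, (_, i) => ({ id: i }));
const batches = chunk(allRows, 500);
console.log(batches.length); // 20
```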

If you do decide to look into it and open a pull request, don't forget to update existing tests and include any new ones.

soerenuhrbach commented 2 hours ago

@Joffcom I tried this option as well, but it didn't have any effect. I've already debugged the code.

The operations always load all the data of a spreadsheet.

And by the way - I don't want all the data to be loaded. The issue is that I only want to insert a single row, which doesn't work well because the node always loads the whole spreadsheet data, even though I don't need it.

soerenuhrbach commented 1 hour ago

The debugger shows pretty clearly that the whole dataset (12,466 rows) is being loaded, even though my workflow has only a single input item.

Debugger: (screenshot)

Workflow: (screenshot)