maybe-finance / maybe

The OS for your personal finances
https://maybe.co
GNU Affero General Public License v3.0
29.38k stars 2.23k forks source link

Add server side validation for required CSV import headers on load #999

Closed orelvis15 closed 1 month ago

orelvis15 commented 1 month ago

When you try to import a csv either from a file or by pasting the text it always ignores the first row and does not import it as a transaction. This means that when you import many transactions from several files you always lose 1 transaction at a time. If you paste or upload a file with a single row, it ignores it and does not show it.

To Reproduce Go to transactions, click import, select CSV and upload a file with, for example, 10 transactions and it only detects 9.

tonyvince commented 1 month ago

I was unable to reproduce this. Could you maybe give a sample of the CSV file that you have problem with

tonyvince commented 1 month ago

~I was able to reproduce. Will fix~

tonyvince commented 1 month ago

@orelvis15 sorry I thought I reproduced it but I was wrong. Can you provide a sample CSV file that is not working?

Here is a CSV file I tested and the imported transactions Screenshot 2024-07-19 at 12 51 14.

orelvis15 commented 1 month ago

This is the format I'm using

2023-06-29,-120.0,*,,PURCHASE 1
2023-06-28,-9.82,*,,PURCHASE 2
2023-06-27,-42.34,*,,PURCHASE 3

When I try to import by pasting, it only recognizes the last two, never the first image

tonyvince commented 1 month ago

At the moment, the importer expects the CSV file to have headers. The same data with headers works fine for me. For example,

date_header,amount_header,category_header,tags_header,name_header
2023-06-29,-120.0,*,,PURCHASE 1
2023-06-28,-9.82,*,,PURCHASE 2
2023-06-27,-42.34,*,,PURCHASE 3
Screenshot 2024-07-19 at 13 45 51
zachgoll commented 1 month ago

@tonyvince thanks for digging into this!

@orelvis15 I think to keep things simple, we should continue to enforce mandatory header rows. That said, this could definitely be prevented through a simple validation to check for the required headers when loading the CSV, so I'm going to rename the issue title to reflect this.

One potential solution would be to update this validation:

https://github.com/maybe-finance/maybe/blob/fa08f027c7d80cf7f557613d3015ed9e6c6ad28e/app/models/import.rb#L177-L183

To be something closer to:

def raw_csv_must_be_parsable_with_required_headers
  begin
    csv = CSV.parse(raw_csv_str || "")

    # check for headers
  rescue CSV::MalformedCSVError
    errors.add(:raw_csv_str, "is not a valid CSV format")
  end
end
orelvis15 commented 1 month ago

@zachgoll There is a face in the import where you select which part of the csv is going to be the title, total, category, etc., in that section you could inject the headers to the csv that is being imported in case you do not have them or change the What do you have if they are not correct?

When you download the csv from the banks, in my experience they have all come without headers, so we would be adding an extra step for the user in having to add the specific header so that it is imported correctly.

zachgoll commented 1 month ago

@orelvis15 I'd be fine with that strategy, although I think that logic should happen just prior to saving the raw_csv_str. If we went down that path, we should probably keep the logic simple to start:

if no_headers_present?
  # generate a generic column name for each column in CSV (i.e. `col1`, `col2`, `col3`, etc.) and _prepend_ to raw_csv_str
end
zachgoll commented 1 month ago

Given our conclusion in #1003 I'm going to close this out. The import is still 100% functional when headers are provided.

We'll prioritize more flexible imports in the future, but given the huge backlog of critical features that we still need to build, I think it makes sense to hold off on these optimizations for now.