pbogre / jetlog

Personal flight tracker and viewer
https://github.com/pbogre/jetlog
GNU General Public License v2.0

Custom CSV import error #34

Closed jeffrey1681 closed 1 month ago

jeffrey1681 commented 1 month ago

I just tried to import data using a custom csv and got the following 2 errors:

Unidentifiable column name 'arrival_date', skipping column...
Unidentifiable column name 'flight_number', skipping column...

Both of those column names look correct. Is there something I'm doing wrong? The other 6 columns I'm using seem to work fine. The full first row of my CSV is:

date,origin,destination,departure_time,arrival_time,arrival_date,distance,flight_number

I'm running via docker-compose, using pbogre/jetlog:latest

pbogre commented 1 month ago

The :latest tag may not have the arrival_date and flight_number fields, as those were added recently. Could you try with the :experimental tag?

jeffrey1681 commented 1 month ago

:experimental got rid of the arrival_date error, but flight_number still throws an unidentifiable column name error

pbogre commented 1 month ago

That's odd, since arrival_date was implemented after flight_number.

I just tried with the latest :experimental image, and the following CSV was imported fine:

date,origin,destination,arrival_time,departure_time,notes,flight_number
2024-03-14,lime,eheh,11:20,10:00,hello this is a nice flight,FR3461
2025-03-19,eheh,lime,18:40,16:30,whatever man..,FR3460

Would you be comfortable with sharing at least the first few rows of your CSV, so I can check if it works for me?

jeffrey1681 commented 1 month ago

date,origin,destination,departure_time,arrival_time,arrival_date,distance,flight_number
2024-06-28,KDTW,KATL,07:05:00,09:05:00,2024-06-28,957.6,DL 1611
2024-06-28,KATL,MMMD,11:20:00,12:32:00,2024-06-28,1501.5,DL 7962

pbogre commented 1 month ago

Using the latest image, I get the correct behavior. Note that the rows will not import successfully because the values in the *_time columns are in the wrong format (HH:MM:SS, should be HH:MM).

Here are the logs for me, with your csv:

jetlog-1  | Parsing CSV into flights...
jetlog-1  | Detected columns: {'date': 0, 'origin': 1, 'destination': 2, 'departure_time': 3, 'arrival_time': 4, 'arrival_date': 5, 'distance': 6, 'flight_number': 7}
jetlog-1  | [1] Failed to parse: '2 validation errors for FlightModel
jetlog-1  | departure_time
jetlog-1  |   Value error, must be in HH:MM format, got '07:05:00' [type=value_error, input_value='07:05:00', input_type=str]
jetlog-1  |     For further information visit https://errors.pydantic.dev/2.3/v/value_error
jetlog-1  | arrival_time
jetlog-1  |   Value error, must be in HH:MM format, got '09:05:00' [type=value_error, input_value='09:05:00', input_type=str]
jetlog-1  |     For further information visit https://errors.pydantic.dev/2.3/v/value_error'
jetlog-1  | [2] Failed to parse: '2 validation errors for FlightModel
jetlog-1  | departure_time
jetlog-1  |   Value error, must be in HH:MM format, got '11:20:00' [type=value_error, input_value='11:20:00', input_type=str]
jetlog-1  |     For further information visit https://errors.pydantic.dev/2.3/v/value_error
jetlog-1  | arrival_time
jetlog-1  |   Value error, must be in HH:MM format, got '12:32:00' [type=value_error, input_value='12:32:00', input_type=str]
jetlog-1  |     For further information visit https://errors.pydantic.dev/2.3/v/value_error'
jetlog-1  | Parsing process complete with 2 failures
jetlog-1  | Importing 0 flights...
jetlog-1  | Importing process complete with 2 total failures

As you can see, flight_number was properly identified.

The distance values are also invalid, as they should be integers and not floats. This, however, was not detected by the import process, so I'll fix that.

Also note that the arrival_date isn't really useful when it's the same as date, unless you prefer to show it anyway (it will be ignored for calculation of duration)
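For anyone preparing a CSV with the same problems, the two format fixes described above can be applied before importing. The following is only a sketch: the helper names normalize_time and normalize_distance are hypothetical and not part of jetlog, and the choice to zero-pad hours and to round (rather than truncate) distances is an assumption.

```python
from datetime import datetime


def normalize_time(value: str) -> str:
    """Return a time string as HH:MM, accepting HH:MM:SS or HH:MM input."""
    for fmt in ("%H:%M:%S", "%H:%M"):
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%H:%M")
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized time format: {value!r}")


def normalize_distance(value: str) -> int:
    """Convert a decimal distance such as '957.6' to an integer by rounding."""
    return round(float(value))


# Values taken from the CSV rows posted earlier in this thread.
fixed_time = normalize_time("07:05:00")
fixed_distance = normalize_distance("957.6")
```

Truncating with int(float(value)) would be an equally reasonable choice; which one is preferable depends on how precise the distance needs to be.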

jeffrey1681 commented 1 month ago

I fixed my time & distance formats, pulled :experimental again and I'm still getting this error: Unidentifiable column name 'flight_number', skipping column...

date,origin,destination,departure_time,arrival_time,arrival_date,distance,flight_number
2024-06-28,KDTW,KATL,7:05,9:05,2024-06-28,957,DL 1611
2024-06-28,KATL,MMMD,11:20,12:32,2024-06-28,1501,DL 7962

jeffrey1681 commented 1 month ago

I think I may have figured out this bug. After I ran dos2unix on the file, the import found all of the columns. It still failed (because somehow the date format was invalid), but the Windows line endings appear to be what broke the parsing of the column names.

Edit: Confirmed, once I removed the Windows line endings and fixed the data formats in the resulting file, it imported correctly. Not sure if you want to add a check for line endings or just make sure to handle Windows endings as well as Unix ones.

pbogre commented 1 month ago

Ahh that makes sense, as currently I'm deleting the newline by running .replace('\n', '') which admittedly is not the best way to handle this...

I'll make a fix for this soon, thanks for finding this
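To illustrate why .replace('\n', '') is not enough for a file with Windows (CRLF) line endings: the carriage return stays attached to the last column name, which would be consistent with only flight_number (the last column) being rejected. The snippet below is a standalone sketch and not jetlog's actual code; the sample header and row are shortened versions of the ones in this thread.

```python
import csv
import io

header = "date,origin,flight_number\r\n"  # a Windows-style (CRLF) header line

# Removing only '\n' leaves the carriage return on the last column name.
naive = header.replace("\n", "").split(",")

# Stripping both characters from the end handles CRLF and LF alike.
robust = header.rstrip("\r\n").split(",")

# Alternatively, the standard csv module handles either line ending itself,
# provided the text is passed through with newline="" (no translation).
text = "date,origin,flight_number\r\n2024-06-28,KDTW,DL 1611\r\n"
rows = list(csv.reader(io.StringIO(text, newline="")))

print(repr(naive[-1]), repr(robust[-1]))
```

Printing with repr() makes the stray '\r' visible, which is otherwise easy to miss in log output.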

pbogre commented 1 month ago

Let me know if the latest :experimental image fixes this problem

jeffrey1681 commented 1 month ago

I'm away from my machine for the weekend, I'll try next week.

sketchdigital commented 1 month ago

I'm having a similar issue, both with the :latest and :experimental images. If I attempt to import the CSV data shown below (just showing the first two lines here):

date,origin,destination,departure_time,arrival_time,arrival_date,duration,flight_number,notes
2017-04-27,KLAX,KMSY,6:55,12:15,2017-04-27,200,DL737,Some notes

The following error is logged:

Unidentifiable column name 'date', skipping column...
Detected columns: {'origin': 1, 'destination': 2, 'departure_time': 3, 'arrival_time': 4, 'arrival_date': 5, 'duration': 6, 'flight_number': 7, 'notes': 8}
[1] Failed to parse: 'Expected 8 entries, got 9'
[2] Failed to parse: 'Expected 8 entries, got 9'
[3] Failed to parse: 'Expected 8 entries, got 9'
Parsing process complete with 3 failures

If I put commas in front of each line of the CSV data (to create an empty column), date is now identified but a similar error is logged:

Unidentifiable column name '', skipping column...
Detected columns: {'date': 1, 'origin': 2, 'destination': 3, 'departure_time': 4, 'arrival_time': 5, 'arrival_date': 6, 'duration': 7, 'flight_number': 8, 'notes': 9}
[1] Failed to parse: 'Expected 9 entries, got 10'
[2] Failed to parse: 'Expected 9 entries, got 10'
[3] Failed to parse: 'Expected 9 entries, got 10'
Parsing process complete with 3 failures

I've tried adjusting the newline characters with no change in outcome.

pbogre commented 1 month ago

First of all, when importing a CSV with an invalid column, the next rows should have len(columns) - failed_columns columns, which means ignoring the failed ones. Since that's a bug, I opened an issue for that (#39).
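One way to read this is that a row's length should be checked against the full header, while only the recognized columns are picked out by index. The sketch below shows that idea under stated assumptions: KNOWN_COLUMNS, detect_columns and parse_row are hypothetical names, the column set is a small subset chosen for illustration, and this is not necessarily how issue #39 is resolved.

```python
KNOWN_COLUMNS = {"date", "origin", "destination", "flight_number"}  # hypothetical subset


def detect_columns(header: list[str]) -> dict[str, int]:
    """Map each recognized column name to its index; unknown names are skipped."""
    return {name: i for i, name in enumerate(header) if name in KNOWN_COLUMNS}


def parse_row(row: list[str], header: list[str], detected: dict[str, int]) -> dict[str, str]:
    """Check the row against the full header width, then keep recognized fields only."""
    if len(row) != len(header):
        raise ValueError(f"Expected {len(header)} entries, got {len(row)}")
    return {name: row[i] for name, i in detected.items()}


header = ["mystery", "origin", "destination", "flight_number"]  # first name is unknown
detected = detect_columns(header)
flight = parse_row(["x", "KLAX", "KMSY", "DL737"], header, detected)
```

With this approach a skipped column no longer causes every row to fail the length check; its values are simply never read.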

As for your actual problem, could you check for all unicode characters in the file through some website? What happens if you move the date column to another position? Could it have to do with some character at the very start of the line?

sketchdigital commented 1 month ago

I figured out the issue. The CSV file was saved from Excel as a CSV UTF-8 (Comma delimited) file. While trying countless other things, I noticed in VSCode (where I had the CSV file open) that it showed the file encoding as UTF-8 with BOM. I then explicitly saved the CSV file with UTF-8 encoding (no BOM) and the import worked as expected.

Even though it's not in the description, the CSV UTF-8 (Comma delimited) file type must have added that sequence of bytes at the start of the text stream and that was the cause of the import failure. Thanks for your help with this.
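For reference, the effect of the byte order mark can be reproduced in a few lines of Python. With plain UTF-8 decoding, the BOM survives as the invisible character U+FEFF at the start of the first column name, so that name no longer compares equal to 'date'. Python's built-in 'utf-8-sig' codec removes a leading BOM when one is present. This is a standalone sketch with made-up sample bytes, not a description of how jetlog reads uploads.

```python
import codecs

# Sample file contents as Excel's "CSV UTF-8" export would start them: BOM first.
raw = codecs.BOM_UTF8 + b"date,origin,destination\n2017-04-27,KLAX,KMSY\n"

# Plain UTF-8 decoding keeps the BOM as U+FEFF in front of the first name.
plain_header = raw.decode("utf-8").splitlines()[0].split(",")

# 'utf-8-sig' drops a leading BOM if present and is harmless otherwise.
clean_header = raw.decode("utf-8-sig").splitlines()[0].split(",")

print(repr(plain_header[0]), repr(clean_header[0]))
```

Because 'utf-8-sig' also decodes files without a BOM unchanged, it can be used for both kinds of input.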

pbogre commented 1 month ago

That's nice to hear. Since this issue no longer seems to be a problem, I'll close it. If anyone is still experiencing this, feel free to reopen.