Closed jeffrey1681 closed 1 month ago
The :latest tag may not have the arrival_date and flight_number fields, as those were added recently. Could you try with the :experimental tag?
:experimental got rid of the arrival_date error, but flight_number still throws an unidentifiable column name error.
That's odd, since arrival_date was implemented after flight_number.
I just tried with the latest :experimental image, and the following CSV was imported fine:
date,origin,destination,arrival_time,departure_time,notes,flight_number
2024-03-14,lime,eheh,11:20,10:00,hello this is a nice flight,FR3461
2025-03-19,eheh,lime,18:40,16:30,whatever man..,FR3460
Would you be comfortable with sharing at least the first few rows of your CSV, so I can check if it works for me?
date,origin,destination,departure_time,arrival_time,arrival_date,distance,flight_number
2024-06-28,KDTW,KATL,07:05:00,09:05:00,2024-06-28,957.6,DL 1611
2024-06-28,KATL,MMMD,11:20:00,12:32:00,2024-06-28,1501.5,DL 7962
Using the latest image, I get the correct behavior. Note that your rows will not import successfully because the *_time columns are in the wrong format (HH:MM:SS, should be HH:MM).
Here are the logs for me, with your csv:
jetlog-1 | Parsing CSV into flights...
jetlog-1 | Detected columns: {'date': 0, 'origin': 1, 'destination': 2, 'departure_time': 3, 'arrival_time': 4, 'arrival_date': 5, 'distance': 6, 'flight_number': 7}
jetlog-1 | [1] Failed to parse: '2 validation errors for FlightModel
jetlog-1 | departure_time
jetlog-1 | Value error, must be in HH:MM format, got '07:05:00' [type=value_error, input_value='07:05:00', input_type=str]
jetlog-1 | For further information visit https://errors.pydantic.dev/2.3/v/value_error
jetlog-1 | arrival_time
jetlog-1 | Value error, must be in HH:MM format, got '09:05:00' [type=value_error, input_value='09:05:00', input_type=str]
jetlog-1 | For further information visit https://errors.pydantic.dev/2.3/v/value_error'
jetlog-1 | [2] Failed to parse: '2 validation errors for FlightModel
jetlog-1 | departure_time
jetlog-1 | Value error, must be in HH:MM format, got '11:20:00' [type=value_error, input_value='11:20:00', input_type=str]
jetlog-1 | For further information visit https://errors.pydantic.dev/2.3/v/value_error
jetlog-1 | arrival_time
jetlog-1 | Value error, must be in HH:MM format, got '12:32:00' [type=value_error, input_value='12:32:00', input_type=str]
jetlog-1 | For further information visit https://errors.pydantic.dev/2.3/v/value_error'
jetlog-1 | Parsing process complete with 2 failures
jetlog-1 | Importing 0 flights...
jetlog-1 | Importing process complete with 2 total failures
As you can see, flight_number was properly identified.
The distance values are also invalid, as they should be integers and not floats. However, the importing process did not detect this properly, so I'll fix that.
Also note that arrival_date isn't really useful when it's the same as date, unless you prefer to show it anyway (it will be ignored when calculating the duration).
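For anyone cleaning up a file by hand before importing, the two format fixes described above (HH:MM times, integer distances) can be sketched roughly as follows. This is not part of jetlog: normalize_row and normalize_csv are hypothetical helper names, and zero-padding the hour and rounding the distance (rather than truncating it) are my own assumptions.

```python
import csv
import io


def normalize_row(row: dict) -> dict:
    """Return a copy of row with HH:MM times and a whole-number distance."""
    fixed = dict(row)
    for key in ("departure_time", "arrival_time"):
        value = fixed.get(key)
        if value:
            # Keep only hours and minutes: '07:05:00' -> '07:05'.
            hours, minutes = value.split(":")[:2]
            # Zero-padding the hour is an assumption, not a documented requirement.
            fixed[key] = f"{int(hours):02d}:{minutes}"
    if fixed.get("distance"):
        # Round to a whole number: '957.6' -> '958' (truncating would also work).
        fixed["distance"] = str(round(float(fixed["distance"])))
    return fixed


def normalize_csv(text: str) -> str:
    """Apply normalize_row to every data row of a CSV given as a string."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(normalize_row(row))
    return out.getvalue()
```

Columns other than the two time columns and distance are passed through unchanged, so the header order of the original file is preserved.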
I fixed my time & distance formats, pulled :experimental again and I'm still getting this error:
Unidentifiable column name 'flight_number', skipping column...
date,origin,destination,departure_time,arrival_time,arrival_date,distance,flight_number
2024-06-28,KDTW,KATL,7:05,9:05,2024-06-28,957,DL 1611
2024-06-28,KATL,MMMD,11:20,12:32,2024-06-28,1501,DL 7962
I think I may have figured out this bug. After I ran dos2unix on the file, the import finds all of the columns. It still fails (because somehow the date format was invalid), but the Windows line endings appear to be the issue with parsing the column names.
Edit: Confirmed. Once I removed the Windows line endings and fixed the data formats in the resulting file, it imported correctly. Not sure if you want to add a check for line endings or just make sure to handle Windows endings as well as Unix ones.
Ahh, that makes sense, as currently I'm deleting the newline by running .replace('\n', ''), which admittedly is not the best way to handle this...
I'll make a fix for this soon, thanks for finding this
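To illustrate why the last column name goes unrecognized: with Windows line endings, removing only '\n' leaves a trailing '\r' attached to the final header field. Below is a minimal sketch of the problem and one possible way around it; split_header is a hypothetical helper, not the actual fix that went into jetlog.

```python
def split_header(line: str) -> list[str]:
    # rstrip("\r\n") strips any trailing run of CR and LF characters,
    # so both Unix (\n) and Windows (\r\n) endings are handled.
    return line.rstrip("\r\n").split(",")


header = "date,origin,flight_number\r\n"

# Removing only "\n" leaves the carriage return on the last name.
naive = header.replace("\n", "").split(",")
print(repr(naive[-1]))           # 'flight_number\r'

print(split_header(header)[-1])  # flight_number
```

Reading the whole file with str.splitlines() would be another option, since it treats '\n', '\r\n' and '\r' as line boundaries.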
Let me know if the latest :experimental image fixes this problem.
I'm away from my machine for the weekend, I'll try next week.
I'm having a similar issue, both with the :latest and :experimental images. If I attempt to import the CSV data shown below (just showing the first two lines here):
date,origin,destination,departure_time,arrival_time,arrival_date,duration,flight_number,notes
2017-04-27,KLAX,KMSY,6:55,12:15,2017-04-27,200,DL737,Some notes
The following error is logged:
Unidentifiable column name 'date', skipping column...
Detected columns: {'origin': 1, 'destination': 2, 'departure_time': 3, 'arrival_time': 4, 'arrival_date': 5, 'duration': 6, 'flight_number': 7, 'notes': 8}
[1] Failed to parse: 'Expected 8 entries, got 9'
[2] Failed to parse: 'Expected 8 entries, got 9'
[3] Failed to parse: 'Expected 8 entries, got 9'
Parsing process complete with 3 failures
If I put commas in front of each line of the CSV data (to create an empty column), date is now identified but a similar error is logged:
Unidentifiable column name '', skipping column...
Detected columns: {'date': 1, 'origin': 2, 'destination': 3, 'departure_time': 4, 'arrival_time': 5, 'arrival_date': 6, 'duration': 7, 'flight_number': 8, 'notes': 9}
[1] Failed to parse: 'Expected 9 entries, got 10'
[2] Failed to parse: 'Expected 9 entries, got 10'
[3] Failed to parse: 'Expected 9 entries, got 10'
Parsing process complete with 3 failures
I've tried adjusting the newline characters with no change in outcome.
First of all, when importing a CSV with an invalid column, the following rows should have len(columns) - failed_columns columns, which means ignoring the failed ones. Since that's a bug, I opened an issue for that (#39).
As for your actual problem, could you check the file for any unexpected Unicode characters using some website? What happens if you move the date column to another position? Could it have to do with some character at the very start of the line?
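One possible reading of the intended behavior (skip the unidentified column, but still accept rows of the full width) is sketched below. It is only an illustration under assumptions: KNOWN_COLUMNS, detect_columns and parse_row are hypothetical names, and none of this is taken from jetlog's code or from the fix tracked in #39.

```python
# Hypothetical set of accepted column names, based on the headers in this thread.
KNOWN_COLUMNS = {
    "date", "origin", "destination", "departure_time", "arrival_time",
    "arrival_date", "duration", "distance", "flight_number", "notes",
}


def detect_columns(header: list[str]) -> dict[str, int]:
    """Map each recognized column name to its position in the header."""
    columns = {}
    for index, name in enumerate(header):
        if name in KNOWN_COLUMNS:
            columns[name] = index
        # Unrecognized names are skipped but still occupy a position.
    return columns


def parse_row(values: list[str], header_len: int, columns: dict[str, int]) -> dict[str, str]:
    """Pick the recognized fields out of a row by index."""
    # Compare against the full header width, not the number of recognized columns.
    if len(values) != header_len:
        raise ValueError(f"Expected {header_len} entries, got {len(values)}")
    return {name: values[index] for name, index in columns.items()}


header = ["\ufeffdate", "origin", "destination"]  # first name is not recognized
columns = detect_columns(header)
print(parse_row(["2017-04-27", "KLAX", "KMSY"], len(header), columns))
# {'origin': 'KLAX', 'destination': 'KMSY'}
```

With this approach a row is rejected only when its width differs from the header's, so an unidentified column no longer causes every row to fail.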
I figured out the issue. The CSV file was saved from Excel as a CSV UTF-8 (Comma delimited) file. While trying countless other things, I noticed in VSCode (where I had the CSV file open) that it showed the file encoding as UTF-8 with BOM. I then explicitly saved the CSV file with UTF-8 encoding (no BOM) and the import worked as expected.
Even though it's not in the description, the CSV UTF-8 (Comma delimited) file type must have added that sequence of bytes at the start of the text stream and that was the cause of the import failure. Thanks for your help with this.
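For reference, the effect of the byte order mark can be reproduced in a few lines of Python. The sketch below uses made-up sample bytes; it shows that decoding as plain UTF-8 keeps the BOM as a U+FEFF character glued to the first column name, while the standard utf-8-sig codec drops it. Whether jetlog should decode uploads this way is an assumption on my part.

```python
import codecs

# Sample file contents as Excel's "CSV UTF-8" export would write them:
# the three BOM bytes EF BB BF, followed by the actual text.
raw = codecs.BOM_UTF8 + b"date,origin\n2017-04-27,KLAX\n"

plain = raw.decode("utf-8")         # BOM survives as '\ufeff'
with_sig = raw.decode("utf-8-sig")  # BOM is stripped if present

print(repr(plain.split(",")[0]))     # '\ufeffdate'
print(repr(with_sig.split(",")[0]))  # 'date'
```

The utf-8-sig codec also decodes files that have no BOM, so it can be used for both kinds of input.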
That's nice to hear. Since this issue seems to no longer be a problem, I'll close it. If anyone is still experiencing this, feel free to reopen.
I just tried to import data using a custom csv and got the following 2 errors:
Both of those column names look correct, is there something I'm doing wrong? The other 6 columns I'm using seem to work fine. The full first row of my csv is:
date,origin,destination,departure_time,arrival_time,arrival_date,distance,flight_number
I'm running via docker-compose, using pbogre/jetlog:latest