Closed henninghe closed 9 months ago
I have no issues opening the CSV in a typical file reader, nor reading this specific file into pandas.
import pandas as pd
injuries = pd.read_csv('https://github.com/nflverse/nflverse-data/releases/download/injuries/injuries_2023.csv')
injuries.iloc[24]
#> season                                       2023
#> game_type                                     REG
#> team                                          CHI
#> week                                            1
#> gsis_id                                00-0033757
#> position                                       TE
#> full_name                           Robert Tonyan
#> first_name                                 Robert
#> last_name                                  Tonyan
#> report_primary_injury                        Back
#> report_secondary_injury                       NaN
#> report_status                        Questionable
#> practice_primary_injury                       NaN
#> practice_secondary_injury                     NaN
#> practice_status                                \n
#> date_modified                2023-09-09T21:52:37Z
#> Name: 24, dtype: object
I see the newline character, but I do not see why it would cause an issue loading the CSV into Python. Can you please give me a reproducible example illustrating your issue with loading this file into Python? You can use the reprexpy package in Python to create reprexes.
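For anyone wondering why the line feed is harmless: a line feed inside a quoted field is valid CSV, and standard parsers keep it as part of the value instead of starting a new record. A minimal sketch using only Python's built-in csv module, assuming the field is quoted in the file (which the pandas output above suggests). The sample text and the `parse_rows` helper are made up for illustration and are not the real file contents:

```python
import csv
import io

# Hypothetical three-column excerpt (not the real file). The middle record
# stores a bare line feed in a quoted practice_status field, like row 24 above.
SAMPLE = (
    'full_name,report_status,practice_status\n'
    'Player A,Questionable,Limited\n'
    'Robert Tonyan,Questionable,"\n"\n'
    'Player B,Out,Full\n'
)


def parse_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    # newline="" leaves line endings untouched, so the csv module itself
    # decides which line feeds separate records and which sit inside quotes.
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = parse_rows(SAMPLE)
print(len(rows))                          # 3
print(repr(rows[1]["practice_status"]))   # '\n'
```

The quoted line feed ends up as the cell value and the record count stays at three, so the parser is not thrown off by it.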
Hi John, thanks for the fast reply. I made a mistake on my end. I used the wrong file path 'https://github.com/nflverse/nflverse-data/releases/tag/injuries/injuries_2023.csv' instead of 'https://github.com/nflverse/nflverse-data/releases/download/injuries/injuries_2023.csv', which led to a "ParserError: Error tokenizing data." when trying to read the file with pandas.
I then downloaded the file manually, checked it for potential issues, and stumbled across the extra line feeds. After removing them, I was able to load the file and assumed they were the root cause. It would have been smarter to try reading the local file without any fixes first.
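The two paths differ only in the `tag` versus `download` segment, and the `tag` form presumably points at a release web page rather than the raw file, which would explain the tokenizing error. A small sketch of a guard that checks the URL shape before handing it to a CSV reader. The helper name `is_release_download_url` is hypothetical, and the expected path layout is an assumption based on the working URL in this thread:

```python
from urllib.parse import urlsplit


def is_release_download_url(url):
    """Return True if url looks like a GitHub release asset link.

    Assumed shape: https://github.com/<owner>/<repo>/releases/download/<tag>/<file>
    """
    parts = urlsplit(url)
    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    return (
        parts.netloc == "github.com"
        and len(segments) == 6
        and segments[2:4] == ["releases", "download"]
    )


wrong = "https://github.com/nflverse/nflverse-data/releases/tag/injuries/injuries_2023.csv"
right = "https://github.com/nflverse/nflverse-data/releases/download/injuries/injuries_2023.csv"

print(is_release_download_url(wrong))  # False
print(is_release_download_url(right))  # True
```

This only inspects the string and makes no network request, so it catches the tag/download mix-up but not a missing or renamed asset.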
Sorry for taking up your time on this, and thanks for the response. Your example helped me figure out the real issue on my end.
I love your great efforts on this project! Regards, Henning
Is there an existing issue for this?
Have you installed the latest development version of the package(s) in question?
What version of the package do you have?
not relevant, I just want to use the downloaded csv
Describe the bug
Within the CSV files for injuries, there are unwanted line feeds in the "practice_status" column for rows where this field is empty. This leads to issues when loading the file, e.g. with pandas.read_csv.
Example file: https://github.com/nflverse/nflverse-data/releases/tag/injuries/injuries_2023.csv Line Item 26 "Robert Tonyan"
Reprex
Expected Behavior
I would expect that there are no line feeds within "cells" of a CSV file whatsoever.