qlu-lab / SUPERGNOVA

MIT License

Problems with error: "pandas.errors.ParserError: Error tokenizing data" #11

Open JKonzok opened 1 year ago

JKonzok commented 1 year ago

Dear all,

I am an absolute beginner in Python and in running analyses with SUPERGNOVA, and I am struggling to resolve the following error.

SUPERGNOVA % python3 supergnova.py ./data/sumstats/Inter.txt ./data/sumstats/AccPA.txt \
    --N1 102837 \
    --N2 91084 \
    --bfile data/bfiles/eur_chr@_SNPmaf5 \
    --partition data/partition/eur_chr@.bed \
    --out results.txt

Preparing files for analysis...
Traceback (most recent call last):
  File "/Users//SUPERGNOVA/supergnova.py", line 93, in <module>
    pipeline(parser.parse_args())
  File "/Users//SUPERGNOVA/supergnova.py", line 54, in pipeline
    gwas_snps, bed, N1, N2 = prep(args.bfile, args.partition, args.sumstats1, args.sumstats2, args.N1, args.N2)
  File "/Users//SUPERGNOVA/prep.py", line 65, in prep
    dfs = [pd.read_csv(file, delim_whitespace=True)
  File "/Users//SUPERGNOVA/prep.py", line 65, in <listcomp>
    dfs = [pd.read_csv(file, delim_whitespace=True)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    return parser.read(nrows)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1778, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 21 fields in line 2771, saw 28

Does anyone have an idea how to solve this error? Thanks in advance for your help.

theopoliss commented 8 months ago

It looks like the number of columns in the data of one of your GWAS summary statistics files does not match the number of columns in its header. According to the error, pandas expected 21 whitespace-separated fields but found 28 on line 2771 of one of the files. That line probably contains extra columns or stray separators (for example, spaces inside a value) that do not match the header line.
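
To find the offending rows, a small script along these lines may help. It is only a sketch, not part of SUPERGNOVA: the function name `find_bad_lines` and its `max_report` parameter are made up for illustration. It assumes the first line of the file is the header and that fields are separated by whitespace, as `delim_whitespace=True` in the traceback suggests. It does not handle quoted fields the way pandas does, so the line numbers it reports should be close to, but are not guaranteed to match, the one in the pandas error.

```python
import gzip


def find_bad_lines(path, max_report=10):
    """Compare each row's field count with the header's field count.

    Hypothetical helper for illustration; not part of SUPERGNOVA or pandas.
    Returns (expected, bad), where `expected` is the number of
    whitespace-separated fields in the header and `bad` is a list of
    (line_number, n_fields) tuples for rows that disagree with it.
    """
    # Summary statistics are often gzipped, so open those transparently.
    opener = gzip.open if str(path).endswith(".gz") else open
    bad = []
    with opener(path, "rt") as handle:
        expected = len(handle.readline().split())
        # The header is line 1, so the data rows start at line 2.
        for line_number, line in enumerate(handle, start=2):
            n_fields = len(line.split())
            if n_fields == 0:
                continue  # blank lines are skipped, as pandas does by default
            if n_fields != expected:
                bad.append((line_number, n_fields))
                if len(bad) >= max_report:
                    break
    return expected, bad


if __name__ == "__main__":
    # File paths taken from the command in the original post.
    for sumstats in ("./data/sumstats/Inter.txt", "./data/sumstats/AccPA.txt"):
        expected, bad = find_bad_lines(sumstats)
        print(sumstats, "header fields:", expected, "mismatched rows:", bad)
```

Once you know which file and which rows are affected, open the file in a text editor at those line numbers and check for values that contain spaces or for extra trailing columns, then fix or remove those rows before rerunning SUPERGNOVA.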