kordk / torch-ecpg

(GPU accelerated) eCpG mapper
BSD 3-Clause "New" or "Revised" License

mlr --cis error: KeyError: "None of ['name'] are in the columns" #33

Closed rituroy closed 1 year ago

rituroy commented 1 year ago

pwd
/songlab/proj/cbi/torch_ecpg/simulation

head -n6 annot/G.bed6
chrom,chromStart,chromEnd,name,score,strand
7,20180663,20180712,ILMN_1762337,0,-
10,52566587,52566636,ILMN_2383229,0,-
10,52566496,52566545,ILMN_1806310,0,-
10,52610470,52610519,ILMN_1779670,0,-
22,16256562,16256611,ILMN_1717783,0,-

head -n6 annot/M.bed6
chrom,chromStart,chromEnd,name,score,strand
16,53468112,53468112,cg00000029,0,+
1,91194674,91194674,cg00000165,0,-
8,42263294,42263294,cg00000236,0,-
14,69341139,69341139,cg00000289,0,+
16,28890100,28890100,cg00000292,0,+

tecpg run mlr --cis
[INFO] CUDA GPU detected. This device supports CUDA.
[INFO] Reading 3 dataframes...
[INFOTIMER] Reading 1/3: C.csv
[INFO] Reading csv file /data/songlab/proj/cbi/torch_ecpg/simulation/data/C.csv with separator ,
[INFOTIMER] Read 1/3 in 0.0031 seconds
[INFOTIMER] Reading 2/3: M.csv
[INFO] Reading csv file /data/songlab/proj/cbi/torch_ecpg/simulation/data/M.csv with separator ,
[INFOTIMER] Read 2/3 in 18.831 seconds
[INFOTIMER] Reading 3/3: G.csv
[INFO] Reading csv file /data/songlab/proj/cbi/torch_ecpg/simulation/data/G.csv with separator ,
[INFOTIMER] Read 3/3 in 1.5278 seconds
[INFOTIMER] Finished reading 3 dataframes in 20.3622 seconds.
Traceback (most recent call last):
  File "/home/ritu/anaconda3/envs/py310/bin/tecpg", line 33, in <module>
    sys.exit(load_entry_point('tecpg', 'console_scripts', 'tecpg')())
  File "/home/ritu/anaconda3/envs/py310/bin/tecpg", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/data/songlab/proj/cbi/torch_ecpg/torch-ecpg/tecpg/main.py", line 9, in <module>
    main()
  File "/data/songlab/proj/cbi/torch_ecpg/torch-ecpg/tecpg/main.py", line 6, in main
    start()
  File "/data/songlab/proj/cbi/torch_ecpg/torch-ecpg/tecpg/cli.py", line 752, in start
    cli(obj={})
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/data/songlab/proj/cbi/torch_ecpg/torch-ecpg/tecpg/cli.py", line 276, in mlr
    ).set_index('name')
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/pandas/core/frame.py", line 6001, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['name'] are in the columns"

liamgd commented 1 year ago

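MLR reads the annotation files as tab-separated values, which I believe is the standard. Both https://genome.ucsc.edu/FAQ/FAQformat.html#format1 and https://en.wikipedia.org/wiki/BED_(file_format) describe tabs and spaces as the field separators for BED files; commas are reserved for a different purpose, the list values inside the blockSizes and blockStarts fields. Your annotation files are comma-separated, so the whole header line was read as a single column and there was no 'name' column to index on.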
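A minimal sketch of how the reported KeyError can arise, assuming the annotation file is read with `pd.read_csv(..., sep="\t")` followed by `set_index('name')`. The traceback confirms the `set_index('name')` call, but the exact read call in tecpg is an assumption here, and the sample rows are the first two from G.bed6 above.

```python
import io

import pandas as pd

# Two rows copied from the comma-separated G.bed6 shown in the report.
BED6_COMMA = (
    "chrom,chromStart,chromEnd,name,score,strand\n"
    "7,20180663,20180712,ILMN_1762337,0,-\n"
    "10,52566587,52566636,ILMN_2383229,0,-\n"
)

# Assumption: the file is parsed as tab-separated. There are no tabs in the
# text, so each line becomes one field and the header becomes one column name.
df = pd.read_csv(io.StringIO(BED6_COMMA), sep="\t")
columns = list(df.columns)

# Indexing on 'name' then fails, because no column has exactly that name.
try:
    df.set_index("name")
    error_message = None
except KeyError as exc:
    error_message = str(exc)

print(columns)
print(error_message)
```

Converting the annotation files to tab-separated text avoids the problem on versions that do not infer the separator.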

As of 33ca242, pandas will infer the separator that is used in the bed6 file, so this should be fixed.
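For reference, one way pandas can infer a separator is `sep=None` with `engine="python"`, which makes `read_csv` detect the delimiter with Python's `csv.Sniffer`. The sketch below is an illustration under that assumption, not necessarily what the commit does; `read_bed6` is a hypothetical helper name, not a tecpg function.

```python
import io

import pandas as pd


def read_bed6(source):
    """Hypothetical helper: read a bed6 annotation with an inferred separator."""
    # sep=None requires the python engine; pandas then sniffs the delimiter.
    return pd.read_csv(source, sep=None, engine="python").set_index("name")


# The same two G.bed6 rows, once comma-separated and once tab-separated.
comma_text = (
    "chrom,chromStart,chromEnd,name,score,strand\n"
    "7,20180663,20180712,ILMN_1762337,0,-\n"
    "10,52566587,52566636,ILMN_2383229,0,-\n"
)
tab_text = comma_text.replace(",", "\t")

df_comma = read_bed6(io.StringIO(comma_text))
df_tab = read_bed6(io.StringIO(tab_text))

print(df_comma)
print(df_comma.equals(df_tab))
```

With inference, both variants of the file should yield the same frame indexed by `name`, so comma-separated annotations no longer trip `set_index('name')`.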