ziyewang / COMEBin

GNU General Public License v3.0

Error with gen_cov.py #20

Open agavriilidou opened 4 months ago

agavriilidou commented 4 months ago

Hi,

I tried to run comebin like this:

    run_comebin.sh -a /projects/0/prjs0956/assembly/megahit/binned/SB_DM_F5B_0-2_II_assembly/final.contigs.fasta \
      -p /projects/0/prjs0956/mapping/ \
      -o /projects/0/prjs0956/binning/comebin_out/ \
      -t 64

but I got this error:

`Traceback (most recent call last):
  File "/gpfs/work4/0/prjs0956/tools/miniconda3/envs/comebin_env/bin/COMEBin/data_aug/gen_cov.py", line 24, in __call__
    result = self.__callable(*args, **kwargs)
  File "/gpfs/work4/0/prjs0956/tools/miniconda3/envs/comebin_env/bin/COMEBin/data_aug/gen_cov.py", line 129, in calculate_coverage_samplebyindex
    start = aug_seq_info_dict[contig_name][0]
KeyError: 'NODE_1_length_355648_cov_13.388902'

Traceback (most recent call last):
  File "main.py", line 369, in <module>
    main()
  File "main.py", line 353, in main
    run_gen_cov(logger, args)
  File "/gpfs/work4/0/prjs0956/tools/miniconda3/envs/comebin_env/bin/COMEBin/data_aug/gen_cov.py", line 302, in run_gen_cov
    gen_cov_from_bedout(logger, args.out_augdata_path, out, num_aug=args.n_views-1, contig_len=args.contig_len,num_process=args.num_threads)
  File "/gpfs/work4/0/prjs0956/tools/miniconda3/envs/comebin_env/bin/COMEBin/data_aug/gen_cov.py", line 270, in gen_cov_from_bedout
    res_mat = pd.read_csv(cov_file, sep='\t', header=0, index_col=0)
  File "/projects/0/prjs0956/tools/miniconda3/envs/comebin_env/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/projects/0/prjs0956/tools/miniconda3/envs/comebin_env/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/projects/0/prjs0956/tools/miniconda3/envs/comebin_env/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/projects/0/prjs0956/tools/miniconda3/envs/comebin_env/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/projects/0/prjs0956/tools/miniconda3/envs/comebin_env/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/projects/0/prjs0956/tools/miniconda3/envs/comebin_env/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
    self._open_handles(src, kwds)
  File "/projects/0/prjs0956/tools/miniconda3/envs/comebin_env/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 229, in _open_handles
    errors=kwds.get("encoding_errors", "strict"),
  File "/projects/0/prjs0956/tools/miniconda3/envs/comebin_env/lib/python3.7/site-packages/pandas/io/common.py", line 707, in get_handle
    newline="",
FileNotFoundError: [Errno 2] No such file or directory: '/gpfs/work4/0/prjs0956/binning/comebin_out/data_augmentation/depth/SB_DM_F5B_0-2_II.bam_0_depth.txt_aug1_data_cov.csv'
`

Any ideas why?

Thanks! Menia

ziyewang commented 4 months ago

Hi, Menia,

Apologies for the delayed response. Could you confirm if there are any files in the following folder: '/gpfs/work4/0/prjs0956/binning/comebin_out/data_augmentation/depth'? The files in question should have names ending with '_depth.txt' and should contain four columns, with the first column being contig names.
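
The check described above could be scripted roughly as follows. This is only a sketch, not part of COMEBin: the function name `check_depth_files` is hypothetical, and it assumes the `_depth.txt` files are tab-separated, which may not match every setup.

```python
from pathlib import Path


def check_depth_files(depth_dir, expected_columns=4):
    """Return {file name: (ok, message)} for every *_depth.txt file in depth_dir.

    Assumption: columns are tab-separated. Adjust the split below if not.
    """
    report = {}
    for path in sorted(Path(depth_dir).glob("*_depth.txt")):
        n_rows = 0
        bad_line = None
        with open(path) as handle:
            for line_no, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # ignore blank lines
                n_rows += 1
                n_fields = len(line.rstrip("\n").split("\t"))
                if n_fields != expected_columns:
                    bad_line = (line_no, n_fields)
                    break  # the first malformed line is enough to flag the file
        if bad_line is not None:
            report[path.name] = (False, f"line {bad_line[0]} has {bad_line[1]} columns")
        elif n_rows == 0:
            report[path.name] = (False, "file is empty")
        else:
            report[path.name] = (True, f"{n_rows} rows with {expected_columns} columns")
    return report
```

Calling `check_depth_files` on the depth folder and printing the returned dictionary shows which files, if any, are empty or have an unexpected column count. An empty dictionary means no file ending in `_depth.txt` was found.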

Best, Ziye

agavriilidou commented 4 months ago

Hi Ziye,

There are two files in the '/depth' dir. A '_depth.txt' with 4 columns (1st is contig names) and a '_aug0_data_cov.csv'. Here is the log file if it helps:

    2024-04-18 16:15:59,926 - generate_aug_data: fastafile
    2024-04-18 16:20:22,081 - aug: 1
    2024-04-18 16:21:07,702 - aug: 2
    2024-04-18 16:21:53,198 - aug: 3
    2024-04-18 16:22:38,755 - aug: 4
    2024-04-18 16:23:24,945 - aug: 5
    2024-04-18 16:24:10,399 - Generate coverage files from bam files.
    2024-04-18 16:24:11,682 - Processing/gpfs/work4/0/prjs0956/mapping/SB_DM_F5B_0-2_II.bam
    2024-04-18 16:29:42,292 - Processed:/gpfs/work4/0/prjs0956/mapping/SB_DM_F5B_0-2_II.bam
    2024-04-18 16:31:24,154 - Processed:/gpfs/work4/0/prjs0956/binning/comebin_out/data_augmentation/depth/SB_DM_F5B_0-2_II.bam_0_depth.txt

Thanks, Menia

ziyewang commented 4 months ago

Hi,

This issue may have occurred because some keys, such as "NODE_1_length_355648_cov_13.388902", in the BAM files are not present in the FASTA files. The "_aug0_data_cov.csv" file provides the coverage profile obtained from the BAM files. Could you please verify the consistency between the sequence names in the "_aug0_data_cov.csv" file and the FASTA file?
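
One way to run that comparison is sketched below. It is not COMEBin code: the helper names are hypothetical, and it assumes the "_aug0_data_cov.csv" file is tab-separated with a header row and contig names in the first column (the layout implied by the `pd.read_csv(cov_file, sep='\t', header=0, index_col=0)` call in the traceback). It also assumes the aligner kept only the part of each FASTA header before the first whitespace.

```python
import csv


def fasta_names(fasta_path):
    """Collect sequence IDs: the text after '>' up to the first whitespace."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    names.add(header.split()[0])
    return names


def table_names(cov_path, sep="\t"):
    """Collect the first-column values of a delimited file, skipping the header row."""
    names = set()
    with open(cov_path, newline="") as handle:
        reader = csv.reader(handle, delimiter=sep)
        next(reader, None)  # skip the header row (assumed present)
        for row in reader:
            if row:
                names.add(row[0])
    return names


def compare_names(fasta_path, cov_path):
    """Return (names only in the coverage file, names only in the FASTA), both sorted."""
    in_fasta = fasta_names(fasta_path)
    in_cov = table_names(cov_path)
    return sorted(in_cov - in_fasta), sorted(in_fasta - in_cov)
```

The first list returned by `compare_names` is the one that matters for the `KeyError`: any name there appears in the coverage profile but has no matching sequence in the FASTA file.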

Best, Ziye