metaDMG-dev / metaDMG-cpp


MismatchFileError with get_df_mismatches #8

Closed MinLuke closed 8 months ago

MinLuke commented 1 year ago

Hello everyone, I am currently getting an error after computing LCA and Bayesian, when computing the mismatches. The error that I get is the following: `MismatchFileError with get_df_mismatches. See log-file for more information.
Traceback (most recent call last):
  File "/opt/miniconda3/envs/metaDMG/lib/python3.9/site-packages/metaDMG/fit/serial.py", line 555, in run_single_c
    df_mismatches = get_df_mismatches(config, force=force)
  File "/opt/miniconda3/envs/metaDMG/lib/python3.9/site-packages/metaDMG/fit/serial.py", line 393, in get_df_misma
    df_mismatches = mismatches.compute(config)
  File "/opt/miniconda3/envs/metaDMG/lib/python3.9/site-packages/metaDMG/fit/mismatches.py", line 223, in compute
    raise MismatchFileError(f"{filename} only contains a header, no data.")
MismatchFileError: data/lca/Rip527_2.mismatches.txt.gz only contains a header, no data.
`

The log file is reported here: `[2023-02-13 12:47:02] | root:200 | DEBUG | _____New log started__
[2023-02-13 12:47:02] | root:201 | DEBUG | Log config file: /opt/miniconda3/envs/metaDMG/lib/python3.9/site-packages/metaDMG/loggers/log_config.yaml
[2023-02-13 12:47:02] | root:161 | DEBUG | Logging server started!
[2023-02-13 12:47:02] | metaDMG.loggers.loggers:40 | DEBUG | Running metaDMG version 0.38.0.
[2023-02-13 12:47:02] | metaDMG.loggers.loggers:41 | DEBUG | Using port 51601 for logging.
[2023-02-13 12:47:02] | metaDMG.utils:160 | INFO | Using config.yaml as config file.
[2023-02-13 12:47:02] | metaDMG.fit.workflow:27 | INFO | Running metaDMG on 1 files in total.
[2023-02-13 12:47:02] | metaDMG.fit.workflow:34 | INFO | Running the samples in serial (sequentially), each using 1 core(s).
[2023-02-13 12:47:02] | metaDMG.fit.serial:290 | INFO | Rip527_2 | Getting LCA.
[2023-02-13 12:47:02] | metaDMG.fit.serial:310 | INFO | Rip527_2 | LCA has to be computed. This can take a while, please wait.
[2023-02-13 12:47:02] | metaDMG.fit.serial:317 | DEBUG | Rip527_2 | ../../metaDMG-cpp/metaDMG-cpp lca -bam Rip527_2.sorted.bam -outnames data/tmp/Rip527_2/Rip527_2 -names ../reference/names.dmp -nodes ../reference/nodes.dmp -acc2tax ../reference/combined_taxid_accssionNO_20223103 -simscorelow 0.95 -simscorehigh 1.0 -minmapq 0 -howmany 15 -weighttype 1 -fix_ncbi 0 -tempfolder data/tmp/Rip527_2/
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> metaDMG version: 0.2-46-gedaf069 (htslib: 1.16) build(Jan 25 2023 15:55:35)
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | #../../metaDMG-cpp/metaDMG-cpp lca -bam Rip527_2.sorted.bam -outnames data/tmp/Rip527_2/Rip527_2 -names ../reference/names.dmp -nodes ../reference/nodes.dmp -acc2tax ../reference/combined_taxid_accssionNO_20223103 -simscorelow 0.95 -simscorehigh 1.0 -minmapq 0 -howmany 15 -weighttype 1 -fix_ncbi 0 -tempfolder data/tmp/Rip527_2/
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Will output lca results in file: 'data/tmp/Rip527_2/Rip527_2.lca.gz'
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> [thread1] Will read header
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Will output lca distribution in file: 'data/tmp/Rip527_2/Rip527_2.stat'
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Will output lca weight in file: 'data/tmp/Rip527_2/Rip527_2.wlca'
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Will output log info (problems) in file: 'data/tmp/Rip527_2/Rip527_2.log'
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> [thread1] Done reading header: 0.00 sec, header contains: 494760
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -bam Rip527_2.sorted.bam
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -names ../reference/names.dmp
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -nodes ../reference/nodes.dmp
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -acc2tax ../reference/combined_taxid_accssionNO_20223103
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -simscoreLow 0.950000
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -simscoreHigh 1.000000
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -editdistMin 0
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -editdistMax 10
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -outnames data/tmp/Rip527_2/Rip527_2
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -minmapq 0
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -lca_rank species
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -norank2species 0
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -howmany 15
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -fix_ncbi 0
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -weighttype 1
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -tempfolder -215122080
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> -stopIfErrors 1
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Starting to extract (acc->taxid) from binary file: '../reference/combined_taxid_accssionNO_20223103'
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Checking if exits: 'data/tmp/Rip527_2/combined_taxid_accssionNO_20223103Rip527_2.sorted.bam.bin'
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Checking if bimnary file exists. dodump=1
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> opening file: 'data/tmp/Rip527_2/combined_taxid_accssionNO_20223103Rip527_2.sorted.bam.bin' mode: 'wb'
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Setting threads to: 4
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> opening file: '../reference/combined_taxid_accssionNO_20223103' mode: 'rb'
[2023-02-13 12:47:02] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Setting threads to: 2
[2023-02-13 12:50:44] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Number of entries to use from accesion to taxid: 342949, time taken: 222.00 sec
[2023-02-13 12:50:48] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> [../reference/names.dmp] Number of unique names (column1): 2293592 with third column 'scientific name'
[2023-02-13 12:50:53] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Number of unique names (column1): 2293592 from file: ../reference/nodes.dmp parent.size():2293592 child.size():0
[2023-02-13 12:50:53] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Number of entries with level information: 48
[2023-02-13 12:50:53] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Problem finding level for rank: serotype
[2023-02-13 12:50:53] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Problem finding level for rank: serotype
[2023-02-13 12:50:53] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Problem finding level for rank: serotype
[2023-02-13 12:50:53] | metaDMG.fit.serial:278 | DEBUG | Rip527_2 | ...
[2023-02-13 12:50:53] | metaDMG.fit.serial:279 | DEBUG | Rip527_2 | ...
[2023-02-13 12:50:53] | metaDMG.fit.serial:280 | DEBUG | Rip527_2 | ...
[2023-02-13 12:50:55] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | [hts] -> editMin:0 editmMax:10 scoreLow:0.950000 scoreHigh:1.000000 minlength:-1 discard: 516 prefix: data/tmp/Rip527_2/Rip527_2 howmany: 15 skipnorank: 1 weighttype: 1
[2023-02-13 12:50:55] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Will dump: 'data/tmp/Rip527_2/Rip527_2.bdamage.gz' this contains damage patterns for: 0 items
[2023-02-13 12:50:55] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Setting threads to: 4
[2023-02-13 12:50:55] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Number of species with reads that map uniquely: 0
[2023-02-13 12:50:55] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> [ALL done] walltime used = 233.00 sec
[2023-02-13 12:50:56] | metaDMG.fit.serial:248 | DEBUG | Rip527_2 | Hid the following lines: {'\t-> Problem finding level for rank: serotype': 1216}.
[2023-02-13 12:50:56] | metaDMG.fit.serial:249 | DEBUG | Rip527_2 | Got return code 0 from ../../metaDMG-cpp/metaDMG-cpp lca -bam Rip527_2.sorted.bam -outnames data/tmp/Rip527_2/Rip527_2 -names ../reference/names.dmp -nodes ../reference/nodes.dmp -acc2tax ../reference/combined_taxid_accssionNO_20223103 -simscorelow 0.95 -simscorehigh 1.0 -minmapq 0 -howmany 15 -weighttype 1 -fix_ncbi 0 -tempfolder data/tmp/Rip527_2/.
[2023-02-13 12:50:56] | metaDMG.fit.serial:320 | DEBUG | Rip527_2 | ../../metaDMG-cpp/metaDMG-cpp print_ugly data/tmp/Rip527_2/Rip527_2.bdamage.gz -names ../reference/names.dmp -nodes ../reference/nodes.dmp -lcastat data/tmp/Rip527_2/Rip527_2.stat
[2023-02-13 12:50:56] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> metaDMG version: 0.2-46-gedaf069 (htslib: 1.16) build(Jan 25 2023 15:55:35)
[2023-02-13 12:50:56] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | #../../metaDMG-cpp/metaDMG-cpp print_ugly data/tmp/Rip527_2/Rip527_2.bdamage.gz -names ../reference/names.dmp -nodes ../reference/nodes.dmp -lcastat data/tmp/Rip527_2/Rip527_2.stat
[2023-02-13 12:50:56] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | ./metaDMG-cpp print_ugly file.bdamage.gz -names file.gz -nodes trestructure.gz -lcastat fil.gz
[2023-02-13 12:50:56] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | infile_names: ../reference/names.dmp infile_bdamage: data/tmp/Rip527_2/Rip527_2.bdamage.gz nodes: ../reference/nodes.dmp lca_stat: data/tmp/Rip527_2/Rip527_2.stat infile_bam: (null)#VERSION:0.2-46-gedaf069
[2023-02-13 12:50:56] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Dumping file: 'data/tmp/Rip527_2/Rip527_2.bdamage.gz.uglyprint.mismatch.txt.gz'
[2023-02-13 12:51:01] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Number of unique names (column1): 2293592 from file: ../reference/nodes.dmp parent.size():2293592 child.size():199230
[2023-02-13 12:51:01] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Done loading binary bdamage.gz file. It contains: 0
[2023-02-13 12:51:01] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Number of entries in damage pattern file: 0 printlength(howmany):15
[2023-02-13 12:51:04] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> [../reference/names.dmp] Number of unique names (column1): 2293592 with third column 'scientific name'
[2023-02-13 12:51:16] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> pre: 0.000000 post:2293592.000000 grownbyfactor: inf
[2023-02-13 12:51:16] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Dumping file: 'data/tmp/Rip527_2/Rip527_2.bdamage.gz.uglyprint.stat.txt.gz'
[2023-02-13 12:51:16] | metaDMG.fit.serial:273 | DEBUG | Rip527_2 | -> Done loading lcastat file It contains: 0
[2023-02-13 12:51:25] | metaDMG.fit.serial:249 | DEBUG | Rip527_2 | Got return code 0 from ../../metaDMG-cpp/metaDMG-cpp print_ugly data/tmp/Rip527_2/Rip527_2.bdamage.gz -names ../reference/names.dmp -nodes ../reference/nodes.dmp -lcastat data/tmp/Rip527_2/Rip527_2.stat.
[2023-02-13 12:51:25] | metaDMG.fit.serial:176 | DEBUG | Rip527_2 | Moving data/tmp/Rip527_2/Rip527_2.bdamage.gz.uglyprint.mismatch.txt.gz to data/lca/Rip527_2.mismatches.txt.gz.
[2023-02-13 12:51:25] | metaDMG.fit.serial:176 | DEBUG | Rip527_2 | Moving data/tmp/Rip527_2/Rip527_2.bdamage.gz.uglyprint.stat.txt.gz to data/lca/Rip527_2.mismatches.stat.txt.gz.
[2023-02-13 12:51:25] | metaDMG.fit.serial:176 | DEBUG | Rip527_2 | Moving data/tmp/Rip527_2/Rip527_2.lca.gz to data/lca/Rip527_2.lca.txt.gz.
[2023-02-13 12:51:25] | metaDMG.fit.serial:176 | DEBUG | Rip527_2 | Moving data/tmp/Rip527_2/Rip527_2.log to data/lca/Rip527_2.log.txt.
[2023-02-13 12:51:25] | metaDMG.fit.serial:392 | INFO | Rip527_2 | Computing mismatch matrix dataframes.
[2023-02-13 12:51:25] | root:237 | ERROR | Rip527_2 | MismatchFileError with get_df_mismatches. See log-file for more information.
Traceback (most recent call last):
  File "/opt/miniconda3/envs/metaDMG/lib/python3.9/site-packages/metaDMG/fit/serial.py", line 555, in run_single_config
    df_mismatches = get_df_mismatches(config, force=force)
  File "/opt/miniconda3/envs/metaDMG/lib/python3.9/site-packages/metaDMG/fit/serial.py", line 393, in get_df_mismatches
    df_mismatches = mismatches.compute(config)
  File "/opt/miniconda3/envs/metaDMG/lib/python3.9/site-packages/metaDMG/fit/mismatches.py", line 223, in compute
    raise MismatchFileError(f"{filename} only contains a header, no data.")
MismatchFileError: data/lca/Rip527_2.mismatches.txt.gz only contains a header, no data.
[2023-02-13 12:51:25] | metaDMG.fit.workflow:50 | ERROR | 1 error(s) occurred during the computation. `

What does this mean? How can I resolve it? Thank you.

ANGSD commented 8 months ago

Recent versions of the program are implemented in C/C++, which should resolve the issues observed here.
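
For anyone who hits the same error on an older version: the log above reports `damage patterns for: 0 items` and `Number of entries in damage pattern file: 0`, which is consistent with the mismatch file ending up with a header and nothing else. Below is a minimal sketch for checking a mismatch file by hand before rerunning the fit. It is not metaDMG's own code; the helper names `count_data_rows` and `has_only_header` are made up for illustration, and it assumes the file is gzip-compressed text with a single header line.

```python
import gzip


def count_data_rows(path):
    """Count the non-empty lines that follow the first (header) line.

    Assumes `path` is gzip-compressed text with exactly one header line.
    """
    with gzip.open(path, "rt") as handle:
        next(handle, None)  # skip the header line, if there is one
        return sum(1 for line in handle if line.strip())


def has_only_header(path):
    """Return True when the file holds a header (or nothing) but no data rows."""
    return count_data_rows(path) == 0
```

With the path from the traceback, `has_only_header("data/lca/Rip527_2.mismatches.txt.gz")` would be expected to return `True`. In that case the thing to look at is probably the LCA step (its filters and reference files) and not the mismatch computation itself.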