shahab-sarmashghi / RESPECT

Estimating repeat spectra and genome length from low-coverage genome skims
Other

Some interesting warnings, but looks like it's working? #5

Closed: AntonioBaeza closed this 3 years ago

AntonioBaeza commented 3 years ago

I tried the example you provided. It looks like it's working, but there are some interesting warnings:

(RESPECT) [ant@hillary RESPECT]$ respect -d data/ -m data/name_mapping.txt -I data/hist_info.txt -N 10 --debug
2021-03-19 00:32:55,108 WARNING:data/name_mapping.txt does not have valid extension; it's skipped
2021-03-19 00:32:55,109 WARNING:data/hist_info.txt does not have valid extension; it's skipped
2021-03-19 00:32:55,116 INFO:Processing mp_fq...
2021-03-19 00:32:55,502 INFO:compute_kmer_histogram finished in 0.17826151847839355 seconds
2021-03-19 00:32:55,502 ERROR:Error occurred when processing /home/ant/anaconda3/envs/RESPECT/RESPECT/data/Micromonas_pusilla_cov_0.5_err_0.01.fq.gz; it's skipped
Traceback (most recent call last):
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/respect_functions.py", line 246, in run_respect
    parameter_estimator.set_kmer_histogram(args.threads, args.decomp)
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/paramter_estimator.py", line 216, in set_kmer_histogram
    self.compute_kmer_histogram(n_threads, decomp_util)
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/timer.py", line 68, in wrapper_timer
    return func(*args, **kwargs)
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/paramter_estimator.py", line 173, in compute_kmer_histogram
    profiler_output = kmer_profiler(self.input_file, self.sequence_type, self.output_name, self.tmp_dir,
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/profiling.py", line 112, in kmer_profiler
    call(["jellyfish", "count", "-m", str(kmer_length), "-s", "100M", "-t", str(n_threads), "-C", "-o", mercnt,
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 349, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 1823, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'jellyfish'
2021-03-19 00:32:55,506 INFO:Processing mp_hq...
2021-03-19 00:32:55,508 INFO:Processing mp_ha...
2021-03-19 00:32:55,510 INFO:Processing mp_fa...
2021-03-19 00:32:55,754 INFO:compute_kmer_histogram finished in 0.008790969848632812 seconds
2021-03-19 00:32:55,754 ERROR:Error occurred when processing /home/ant/anaconda3/envs/RESPECT/RESPECT/data/GCF_000151265.2_Micromonas_pusilla_CCMP1545_v2.0_genomic.fna; it's skipped
Traceback (most recent call last):
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/respect_functions.py", line 246, in run_respect
    parameter_estimator.set_kmer_histogram(args.threads, args.decomp)
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/paramter_estimator.py", line 216, in set_kmer_histogram
    self.compute_kmer_histogram(n_threads, decomp_util)
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/timer.py", line 68, in wrapper_timer
    return func(*args, **kwargs)
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/paramter_estimator.py", line 173, in compute_kmer_histogram
    profiler_output = kmer_profiler(self.input_file, self.sequence_type, self.output_name, self.tmp_dir,
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/profiling.py", line 109, in kmer_profiler
    call(["jellyfish", "count", "-m", str(kmer_length), "-s", "100M", "-t", str(n_threads), "-o", mercnt,
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 349, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 1823, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'jellyfish'
2021-03-19 00:32:56,231 INFO:Starting iterations to estimate parameters of mp_hq
Restricted license - for non-production use only - expires 2022-01-13
2021-03-19 00:32:56,670 INFO:Restricted license - for non-production use only - expires 2022-01-13
2021-03-19 00:33:04,609 INFO:estimate_genome_skim_parameters finished in 8.831506490707397 seconds
2021-03-19 00:33:04,642 INFO:Writing the results to the output files...
(RESPECT) [ant@hillary RESPECT]$ ll
total 56
drwxrwxr-x. 4 ant ant 4096 Mar 19 00:20 build
drwxrwxr-x. 2 ant ant 4096 Mar 19 00:32 data
drwxrwxr-x. 2 ant ant 4096 Mar 19 00:20 dist
-rw-rw-r--. 1 ant ant  210 Mar 19 00:33 estimated-parameters.txt
-rw-rw-r--. 1 ant ant  102 Mar 19 00:33 estimated-spectra.txt
-rw-rw-r--. 1 ant ant 1462 Mar 19 00:19 LICENSE
-rw-rw-r--. 1 ant ant   42 Mar 19 00:19 MANIFEST.in
-rw-rw-r--. 1 ant ant 8671 Mar 19 00:19 README.md
drwxrwxr-x. 4 ant ant 4096 Mar 19 00:20 respect
drwxrwxr-x. 2 ant ant 4096 Mar 19 00:20 respect.egg-info
-rw-rw-r--. 1 ant ant 1539 Mar 19 00:19 setup.py
drwxrwxr-x. 6 ant ant 4096 Mar 19 00:32 tmp
(RESPECT) [ant@hillary RESPECT]$

It looks like it did work: it created two new .txt files. Checking whether they contain any info:

FILE: estimated-parameters.txt

sample  input_type  sequence_type  coverage  genome_length  uniqueness_ratio  HCRM   sequencing_error_rate
mp_hq   histogram   genome-skim    0.58      18823131       1.00              70.52  0.0126
mp_ha   histogram   assembly       NA        21690409       0.94              73.60  NA

FILE: estimated-spectra.txt

sample  r1        r2      r3     r4     r5
mp_hq   18964990  201428  23901  10067  31786
mp_ha   20427099  216746  54799  19206  32470
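To check whether these output files actually contain results, and which samples made it in, a small Python sketch can read the whitespace-separated tables. It assumes the layout shown above (one header row, then one row per sample, first column is the sample name); `parse_respect_table` and `read_respect_table` are hypothetical helpers, not part of RESPECT.

```python
from pathlib import Path


def parse_respect_table(text):
    """Parse a whitespace-separated table: a header row, then one row per sample.

    Returns {sample: {column: value}}. Values are kept as strings, so
    "NA" entries stay "NA" rather than being converted.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    table = {}
    for row in body:
        # Every data row should have exactly as many fields as the header.
        if len(row) != len(header):
            raise ValueError(
                f"row {row[0]!r} has {len(row)} fields, expected {len(header)}"
            )
        table[row[0]] = dict(zip(header[1:], row[1:]))
    return table


def read_respect_table(path):
    """Read one of the output files (e.g. estimated-parameters.txt) from disk."""
    return parse_respect_table(Path(path).read_text())
```

For example, `read_respect_table("estimated-parameters.txt")` on the file above would return entries only for mp_hq and mp_ha, both with input_type "histogram", which is consistent with the log: mp_fq and mp_fa (the sequence inputs) were skipped after the errors.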

Werner0 commented 3 years ago

(RESPECT) [ant@hillary RESPECT]$ respect -d data/ -m data/name_mapping.txt -I data/hist_info.txt -N 10 --debug

Looks like British code.

shahab-sarmashghi commented 3 years ago

Hi Antonio, sorry, I was very busy and missed this issue. It seems that only the histogram inputs were processed, and the sequence input files were skipped from the output. The reason seems to be that jellyfish is not properly installed. You need to install jellyfish first and add its path to the system path (so you can run, e.g., jellyfish --version in the terminal without any problem). Please let me know if you encounter any problems doing that or get other errors.
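The failure in the log comes from `subprocess.call` being unable to find a `jellyfish` executable on the PATH, so Python raises `FileNotFoundError` before Jellyfish ever runs. A minimal Python sketch of a pre-flight check using the standard-library `shutil.which`; the function names `check_jellyfish` and `jellyfish_version` are hypothetical and not part of RESPECT.

```python
import shutil
import subprocess


def check_jellyfish(executable="jellyfish"):
    """Return the resolved path of `executable`, or raise a clear error.

    subprocess.call(["jellyfish", ...]) raises FileNotFoundError when the
    program is not on PATH; checking first gives a more helpful message.
    """
    path = shutil.which(executable)
    if path is None:
        raise RuntimeError(
            f"{executable!r} was not found on PATH; install it and make sure "
            f"'{executable} --version' runs in a terminal"
        )
    return path


def jellyfish_version(executable="jellyfish"):
    """Run '<executable> --version' and return its output (the manual check, scripted)."""
    path = check_jellyfish(executable)
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Running `jellyfish_version()` inside the RESPECT environment before calling respect would either print the installed version or fail immediately with the PATH message, instead of each sequence input failing and being skipped.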

AntonioBaeza commented 3 years ago

Thanks, I will give it a try.

On Mon, Mar 29, 2021 at 2:39 PM Shahab Sarmashghi @.***> wrote:

-- J. Antonio Baeza

Associate Professor Department of Biological Sciences, Clemson University South Carolina, USA & Research Associate Smithsonian Marine Station at Fort Pierce, Florida, USA & Adjunct Faculty Universidad Catolica del Norte, Coquimbo, Chile

Website: http://baezaantonio.wix.com/baezalabclemson
Website (CI-team): https://baezaantonio.wixsite.com/clemsonmitogenomics
ResearchGate: https://www.researchgate.net/profile/J_Baeza/
SemanticScholar: https://www.semanticscholar.org/author/Juan-Antonio-Baeza/144723920

shahab-sarmashghi commented 3 years ago

I have also found some time to work on a conda version of it. If this doesn't work for you, you will soon be able to install it easily that way. I will let you know once it is uploaded to bioconda.

jwasmuth commented 3 years ago

I get the same warnings as @AntonioBaeza:
2021-05-25 16:29:08,106 WARNING:data/hist_info.txt does not have valid extension; it's skipped
2021-05-25 16:29:08,106 WARNING:data/name_mapping.txt does not have valid extension; it's skipped

shahab-sarmashghi commented 3 years ago

This is not an error. It just warns the user that some of the files under the input directory (provided using the -d option) are not sequence files, so they will be skipped. You can safely ignore these warnings.
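The skipping behaviour described here amounts to filtering the files in the -d directory by extension and warning about everything else. The Python sketch below illustrates the idea; the accepted extensions are an assumption based on the example inputs in the log (.fq.gz, .fna), not RESPECT's actual list, and the function names are hypothetical. The warning text simply mimics the wording in the log.

```python
import logging

# Assumed extensions, for illustration only; RESPECT's real list may differ.
SEQUENCE_EXTENSIONS = (".fa", ".fna", ".fasta", ".fq", ".fastq")
COMPRESSION_SUFFIXES = (".gz",)


def has_sequence_extension(name):
    """True if `name` ends in a sequence extension, optionally followed by .gz."""
    lower = name.lower()
    # Strip one compression suffix, so "reads.fq.gz" is judged by ".fq".
    for suffix in COMPRESSION_SUFFIXES:
        if lower.endswith(suffix):
            lower = lower[: -len(suffix)]
            break
    return lower.endswith(SEQUENCE_EXTENSIONS)


def select_sequence_files(names):
    """Keep sequence files and log a warning for every other file."""
    selected = []
    for name in names:
        if has_sequence_extension(name):
            selected.append(name)
        else:
            logging.warning("%s does not have valid extension; it's skipped", name)
    return selected
```

Under these assumptions, name_mapping.txt and hist_info.txt are warned about and dropped, while the .fq.gz and .fna files are kept. The histogram files are still used, because they are passed separately through -m and -I rather than discovered through -d.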