iqbal-lab-org / clockwork

CRyPTIC data processing pipelines
MIT License

Issue while reference_prepare #84

Closed carolynzy closed 3 years ago

carolynzy commented 3 years ago

Hi, I'm trying to prepare the reference but ran into the following issue:

$ singularity exec ../singularity/clockwork_container.img clockwork reference_prepare --contam_tsv reference_info.tsv --outdir OUT references.fasta
[2021-03-30T08:38:23 - clockwork reference_prepare - INFO] Run command: seqtk seq -C -l 60 references.fasta > /home/carolyn/clockwork/ref_data/OUT/ref.fa
[2021-03-30T08:38:33 - clockwork reference_prepare - INFO] Return code: 0
[2021-03-30T08:38:33 - clockwork reference_prepare - INFO] stdout:
[2021-03-30T08:38:33 - clockwork reference_prepare - INFO] stderr:
[2021-03-30T08:38:33 - clockwork reference_prepare - INFO] Run command: samtools faidx /home/carolyn/clockwork/ref_data/OUT/ref.fa
[2021-03-30T08:38:50 - clockwork reference_prepare - INFO] Return code: 0
[2021-03-30T08:38:50 - clockwork reference_prepare - INFO] stdout:
[2021-03-30T08:38:50 - clockwork reference_prepare - INFO] stderr:
[2021-03-30T08:38:50 - clockwork reference_prepare - INFO] Run command: bwa index -a bwtsw /home/carolyn/clockwork/ref_data/OUT/ref.fa
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] Return code: 0
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] stdout: [bwt_gen] Finished constructing BWT in 882 iterations.
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] stderr: [bwa_index] Pack FASTA... 30.88 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=8140015626, availableWord=584761636
[BWTIncConstructFromPacked] 10 iterations done. 99999994 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999994 characters processed.
[BWTIncConstructFromPacked] 30 iterations done. 299999994 characters processed.
[BWTIncConstructFromPacked] 40 iterations done. 399999994 characters processed.
[BWTIncConstructFromPacked] 50 iterations done. 499999994 characters processed.
[BWTIncConstructFromPacked] 60 iterations done. 599999994 characters processed.
[BWTIncConstructFromPacked] 70 iterations done. 699999994 characters processed.
[BWTIncConstructFromPacked] 80 iterations done. 799999994 characters processed.
[BWTIncConstructFromPacked] 90 iterations done. 899999994 characters processed.
[BWTIncConstructFromPacked] 100 iterations done. 999999994 characters processed.
[BWTIncConstructFromPacked] 110 iterations done. 1099999994 characters processed.
[BWTIncConstructFromPacked] 120 iterations done. 1199999994 characters processed.
[BWTIncConstructFromPacked] 130 iterations done. 1299999994 characters processed.
[BWTIncConstructFromPacked] 140 iterations done. 1399999994 characters processed.
[BWTIncConstructFromPacked] 150 iterations done. 1499999994 characters processed.
[BWTIncConstructFromPacked] 160 iterations done. 1599999994 characters processed.
[BWTIncConstructFromPacked] 170 iterations done. 1699999994 characters processed.
[BWTIncConstructFromPacked] 180 iterations done. 1799999994 characters processed.
[BWTIncConstructFromPacked] 190 iterations done. 1899999994 characters processed.
[BWTIncConstructFromPacked] 200 iterations done. 1999999994 characters processed.
[BWTIncConstructFromPacked] 210 iterations done. 2099999994 characters processed.
[BWTIncConstructFromPacked] 220 iterations done. 2199999994 characters processed.
[BWTIncConstructFromPacked] 230 iterations done. 2299999994 characters processed.
[BWTIncConstructFromPacked] 240 iterations done. 2399999994 characters processed.
[BWTIncConstructFromPacked] 250 iterations done. 2499999994 characters processed.
[BWTIncConstructFromPacked] 260 iterations done. 2599999994 characters processed.
[BWTIncConstructFromPacked] 270 iterations done. 2699999994 characters processed.
[BWTIncConstructFromPacked] 280 iterations done. 2799999994 characters processed.
[BWTIncConstructFromPacked] 290 iterations done. 2899999994 characters processed.
[BWTIncConstructFromPacked] 300 iterations done. 2999999994 characters processed.
[BWTIncConstructFromPacked] 310 iterations done. 3099999994 characters processed.
[BWTIncConstructFromPacked] 320 iterations done. 3199999994 characters processed.
[BWTIncConstructFromPacked] 330 iterations done. 3299999994 characters processed.
[BWTIncConstructFromPacked] 340 iterations done. 3399999994 characters processed.
[BWTIncConstructFromPacked] 350 iterations done. 3499999994 characters processed.
[BWTIncConstructFromPacked] 360 iterations done. 3599999994 characters processed.
[BWTIncConstructFromPacked] 370 iterations done. 3699999994 characters processed.
[BWTIncConstructFromPacked] 380 iterations done. 3799999994 characters processed.
[BWTIncConstructFromPacked] 390 iterations done. 3899999994 characters processed.
[BWTIncConstructFromPacked] 400 iterations done. 3999999994 characters processed.
[BWTIncConstructFromPacked] 410 iterations done. 4099999994 characters processed.
[BWTIncConstructFromPacked] 420 iterations done. 4199999994 characters processed.
[BWTIncConstructFromPacked] 430 iterations done. 4299999994 characters processed.
[BWTIncConstructFromPacked] 440 iterations done. 4399999994 characters processed.
[BWTIncConstructFromPacked] 450 iterations done. 4499999994 characters processed.
[BWTIncConstructFromPacked] 460 iterations done. 4599999994 characters processed.
[BWTIncConstructFromPacked] 470 iterations done. 4699999994 characters processed.
[BWTIncConstructFromPacked] 480 iterations done. 4799999994 characters processed.
[BWTIncConstructFromPacked] 490 iterations done. 4899999994 characters processed.
[BWTIncConstructFromPacked] 500 iterations done. 4999999994 characters processed.
[BWTIncConstructFromPacked] 510 iterations done. 5099999994 characters processed.
[BWTIncConstructFromPacked] 520 iterations done. 5199999994 characters processed.
[BWTIncConstructFromPacked] 530 iterations done. 5299999994 characters processed.
[BWTIncConstructFromPacked] 540 iterations done. 5399999994 characters processed.
[BWTIncConstructFromPacked] 550 iterations done. 5499999994 characters processed.
[BWTIncConstructFromPacked] 560 iterations done. 5599999994 characters processed.
[BWTIncConstructFromPacked] 570 iterations done. 5699999994 characters processed.
[BWTIncConstructFromPacked] 580 iterations done. 5799999994 characters processed.
[BWTIncConstructFromPacked] 590 iterations done. 5899999994 characters processed.
[BWTIncConstructFromPacked] 600 iterations done. 5999999994 characters processed.
[BWTIncConstructFromPacked] 610 iterations done. 6099999994 characters processed.
[BWTIncConstructFromPacked] 620 iterations done. 6199999994 characters processed.
[BWTIncConstructFromPacked] 630 iterations done. 6299999994 characters processed.
[BWTIncConstructFromPacked] 640 iterations done. 6399999994 characters processed.
[BWTIncConstructFromPacked] 650 iterations done. 6499999994 characters processed.
[BWTIncConstructFromPacked] 660 iterations done. 6599999994 characters processed.
[BWTIncConstructFromPacked] 670 iterations done. 6699999994 characters processed.
[BWTIncConstructFromPacked] 680 iterations done. 6799999994 characters processed.
[BWTIncConstructFromPacked] 690 iterations done. 6899999994 characters processed.
[BWTIncConstructFromPacked] 700 iterations done. 6999999994 characters processed.
[BWTIncConstructFromPacked] 710 iterations done. 7099999994 characters processed.
[BWTIncConstructFromPacked] 720 iterations done. 7199999994 characters processed.
[BWTIncConstructFromPacked] 730 iterations done. 7299999994 characters processed.
[BWTIncConstructFromPacked] 740 iterations done. 7399999994 characters processed.
[BWTIncConstructFromPacked] 750 iterations done. 7499321674 characters processed.
[BWTIncConstructFromPacked] 760 iterations done. 7589896394 characters processed.
[BWTIncConstructFromPacked] 770 iterations done. 7670395434 characters processed.
[BWTIncConstructFromPacked] 780 iterations done. 7741939178 characters processed.
[BWTIncConstructFromPacked] 790 iterations done. 7805523434 characters processed.
[BWTIncConstructFromPacked] 800 iterations done. 7862033258 characters processed.
[BWTIncConstructFromPacked] 810 iterations done. 7912255322 characters processed.
[BWTIncConstructFromPacked] 820 iterations done. 7956888826 characters processed.
[BWTIncConstructFromPacked] 830 iterations done. 7996555162 characters processed.
[BWTIncConstructFromPacked] 840 iterations done. 8031806682 characters processed.
[BWTIncConstructFromPacked] 850 iterations done. 8063134298 characters processed.
[BWTIncConstructFromPacked] 860 iterations done. 8090974314 characters processed.
[BWTIncConstructFromPacked] 870 iterations done. 8115714554 characters processed.
[BWTIncConstructFromPacked] 880 iterations done. 8137699706 characters processed.
[bwa_index] 3524.45 seconds elapse.
[bwa_index] Update BWT... 23.89 sec
[bwa_index] Pack forward-only FASTA... 22.18 sec
[bwa_index] Construct SA from BWT and Occ... 1384.14 sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa index -a bwtsw /home/carolyn/clockwork/ref_data/OUT/ref.fa
[main] Real time: 5030.994 sec; CPU: 4985.536 sec
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] Run command: rsync reference_info.tsv /home/carolyn/clockwork/ref_data/OUT/remove_contam_metadata.tsv
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] Return code: 0
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] stdout:
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] stderr:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/contam_remover.py", line 71, in _load_metadata_file
ValueError: not enough values to unpack (expected at least 2, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/clockwork", line 4, in <module>
    __import__('pkg_resources').run_script('clockwork==0.9.0', 'clockwork')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/EGG-INFO/scripts/clockwork", line 960, in <module>
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/tasks/reference_prepare.py", line 47, in run
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/reference_dir.py", line 73, in add_remove_contam_metadata_tsv
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/contam_remover.py", line 73, in _load_metadata_file
clockwork.contam_remover.Error: Error parsing line: Virus 1 NC_001802.1

I couldn't upload the references.fasta file because it's too large (3.9G), but it can be downloaded here: https://1drv.ms/u/s!Ahu3aHGoa85BhpgJ54IOPupzotYjFA?e=MyObb2 (md5sum 0c3d0d79c6d5fa163423cfbea91917ad)

And this is the TSV file:

reference_info.tsv.tar.gz

I used the latest versions of the genome sequences that I could find and made the TSV file following the instructions at this link: https://github.com/iqbal-lab-org/clockwork/wiki/Preparing-remove-contamination-reference-data. Did I do something wrong? Thank you in advance!

martinghunt commented 3 years ago

The line that throws the error has spaces not tabs:

$ awk -F"\t" 'NF!=3' reference_info.tsv | cat -A
Virus 1 NC_001802.1$

Should work ok if you change the spaces to tabs.
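
For anyone who would rather check and repair the file from Python, here is a minimal sketch of the same idea as the awk one-liner above. It assumes every line should have exactly three tab-separated fields (which is what the `NF!=3` check implies) and that no field legitimately contains a space. The function names and the first sample line are hypothetical and are not part of clockwork.

```python
EXPECTED_FIELDS = 3  # assumption: three tab-separated columns per line


def find_bad_lines(lines, expected=EXPECTED_FIELDS):
    """Return (line_number, line) pairs whose tab-separated field count is wrong."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if len(stripped.split("\t")) != expected:
            bad.append((number, stripped))
    return bad


def repair_line(line, expected=EXPECTED_FIELDS):
    """Rejoin a line with tabs if it splits into exactly `expected` whitespace tokens.

    Returns None when the token count does not match (for example, when a
    field itself contains a space), so that line can be fixed by hand.
    """
    tokens = line.split()
    if len(tokens) != expected:
        return None
    return "\t".join(tokens)


if __name__ == "__main__":
    sample = [
        "Virus\t1\tNC_000000.0\n",  # hypothetical well-formed line
        "Virus 1 NC_001802.1\n",    # the line reported in the error message
    ]
    for number, line in find_bad_lines(sample):
        print(number, repr(line), "->", repr(repair_line(line)))
```

Lines for which `repair_line` returns `None` are left alone on purpose: if a field might contain a space, blindly replacing every space with a tab would split it into extra columns.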

carolynzy commented 3 years ago

Thank you, Martin! That was indeed the problem. I have no idea how it got there, since the other lines are fine.