Open PovilasMat opened 1 year ago
Hi, I have the same issue, please help!!
ARIBA version: 2.14.6
External dependencies:
bowtie2 2.3.4.1 /usr/bin/bowtie2
cdhit 4.7 /usr/bin/cd-hit-est
nucmer 3.1 /usr/bin/nucmer
spades 3.13.0 /home/inei/SPAdes-3.13.0-Linux/bin/spades.py
External dependencies OK: True
Python version: 3.6.9 (default, Mar 10 2023, 16:46:00) [GCC 8.4.0]
Python packages:
ariba 2.14.6 /usr/local/lib/python3.6/dist-packages/ariba/__init__.py
bs4 4.9.2 /home/inei/.local/lib/python3.6/site-packages/bs4/__init__.py
dendropy 4.4.0 /home/inei/.local/lib/python3.6/site-packages/dendropy/__init__.py
pyfastaq 3.17.0 /home/inei/.local/lib/python3.6/site-packages/pyfastaq/__init__.py
pymummer 0.10.3 /home/inei/.local/lib/python3.6/site-packages/pymummer/__init__.py
pysam 0.16.0.1 /home/inei/.local/lib/python3.6/site-packages/pysam/__init__.py
Python packages OK: True
Everything looks OK: True
Thanks in advance !!!
It doesn't seem like ariba will receive any future changes. I have asked the DB maintainers to fix it on their end, but that is still in progress.
Hi,
ariba was running into a weird issue while running on the vf (virulencefinder) database:
[E::hts_idx_push] Unsorted positions on sequence # 1: 109 followed by 11
OSError: building of index for /scratch/shadow/tmpr7wt7j_c/ariba_virulencefinder/ariba_virulencefinder/read_store.gz failed
I figured out that this happens because read_store.gz is sorted incorrectly when one of the genes doesn't have cluster information. I changed read_store.py to sort correctly even with cluster information missing, but then it failed in a later step:
_init_and_run_clusters
reference_names=self.cluster_ids[cluster_name],
KeyError: ''
Obviously, because the cluster name was missing. :)
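The two failures described above can be reproduced in a toy form. This is only a sketch under stated assumptions: the record layout `(cluster_name, read_number)`, the helper `partition_records`, and the sample dictionary are hypothetical and are not taken from ARIBA's `read_store.py`. It just shows that records with an empty cluster name can be set aside before sorting, and that looking up an empty name in a dictionary of cluster IDs raises `KeyError: ''`.

```python
# Toy illustration only: the names and record layout are hypothetical,
# not ARIBA's actual code.
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # assumed layout: (cluster_name, read_number)


def partition_records(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into those with a cluster name and those without one."""
    named: List[Record] = []
    unnamed: List[Record] = []
    for record in records:
        (named if record[0] else unnamed).append(record)
    return named, unnamed


# Made-up data: one record has an empty cluster name.
cluster_ids: Dict[str, List[str]] = {"cluster_1": ["geneA", "geneB"]}
records: List[Record] = [("cluster_1", 109), ("", 11), ("cluster_1", 3)]

named, unnamed = partition_records(records)
named.sort()  # sorts by cluster name, then by read number

# Looking up the empty name fails the same way as
# self.cluster_ids[cluster_name] did in the traceback above.
lookup_failed = False
try:
    cluster_ids[""]
except KeyError:
    lookup_failed = True
```

Setting the unnamed records aside only hides the symptom. The reference sequences that belong to no cluster still need to be fixed in the database.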
Then I started digging around and made this small test:
mkdir vftest
cd vftest
ariba getref virulencefinder out.virulencefinder
ariba prepareref -f out.virulencefinder.fa -m out.virulencefinder.tsv ./test
cd test
# names of all sequences that were assigned to a cluster
cat 02.cdhit.clusters.tsv | awk '{$1="";print}' | tr " " "\n" | sort | uniq > cluster_file
# names of all sequences in the prepared reference FASTA
grep ">" 02.cdhit.all.fa | sed 's/>//g' | sort > all_file
wc -l all_file
wc -l cluster_file
diff cluster_file all_file
Output of the last three lines:
So the issue is that one or more of those 5 genes (in my case stx2h_O102_STEC299_122_CP022279_122) can be found in my sequencing reads, but they are not part of any cluster. When read_store is built, the entries for those genes have no cluster name, which makes the script fail.
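The same comparison as the shell test above can be sketched in Python. It assumes, as the awk command does, that each line of 02.cdhit.clusters.tsv is whitespace-separated with the cluster name first and the member sequence names after it. The function names and the sample data are made up for illustration.

```python
from typing import Iterable, List, Set


def clustered_names(cluster_lines: Iterable[str]) -> Set[str]:
    """Collect every member name listed after the cluster name on each line."""
    names: Set[str] = set()
    for line in cluster_lines:
        fields = line.split()
        names.update(fields[1:])  # fields[0] is assumed to be the cluster name
    return names


def fasta_names(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from FASTA header lines (the text after '>')."""
    return {line[1:].strip() for line in fasta_lines if line.startswith(">")}


def unclustered(cluster_lines: Iterable[str], fasta_lines: Iterable[str]) -> List[str]:
    """Return the sequences that are in the FASTA but in no cluster."""
    return sorted(fasta_names(fasta_lines) - clustered_names(cluster_lines))


# Made-up example: geneD is in the FASTA but belongs to no cluster.
cluster_lines = ["cluster_1\tgeneA\tgeneB\n", "cluster_2\tgeneC\n"]
fasta_lines = [
    ">geneA\n", "ACGT\n",
    ">geneB\n", "ACGT\n",
    ">geneC\n", "ACGT\n",
    ">geneD\n", "ACGT\n",
]
missing = unclustered(cluster_lines, fasta_lines)
print(missing)
```

To run it on real data, pass the open file handles for 02.cdhit.clusters.tsv and 02.cdhit.all.fa to `unclustered` in place of the sample lists.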