sinamomken / intemap-installer

A collection of scripts and packages to install the InteMAP metagenome assembler.
GNU General Public License v3.0

Increase variable 'max_qual' in Read.H #1

Open mcintyrejosh opened 7 years ago

mcintyrejosh commented 7 years ago

I have gotten the example data to run with InteMAP just fine. However, I am now trying to run two of my own files before scaling up to my full data set. I have metagenomics samples from an Illumina HiSeq 2000 that I have trimmed with the FASTX-Toolkit. When I try to run InteMAP with just these two files, it loads all of the parafiles, gets through Jellyfish count and Jellyfish dump, and appears to get through correct as well. Then, just after displaying the AT% and the number of trusted k-mers, it says:

Quality value 4294967267 larger than maximum allowed quality value 60. Increase the variable 'max_qual' in Read.h.

As far as I can figure out, Read.h belongs to Quake; however, there is no option to change this value in the QuakeParaFile. Additionally, the size of this number suggests that I am doing something very wrong in how I am setting up the input, but I have been unable to fix the issue after being at it for a couple of days. Any help would be greatly appreciated.
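One way to read that very large number, offered as a hypothesis rather than a confirmed diagnosis: 4294967267 (the value printed in the log) equals 2^32 - 29, which is what -29 looks like if it ends up in a 32-bit unsigned integer. That storage type is an assumption about Quake's internals, not something the thread confirms. With the `-q 64` offset shown in the `correct` command, a quality of -29 would correspond to the character `#` (ASCII 35), which is below anything a 64-based encoding can contain. The arithmetic can be checked in a few lines of Python:

```python
# Hypothesis check: is the reported quality value a wrapped negative number?
UINT32_MODULUS = 2 ** 32     # assumption: the value is held in a 32-bit unsigned int

REPORTED_VALUE = 4294967267  # number printed in the log
QUALITY_OFFSET = 64          # the "-q 64" argument passed to `correct`


def wrapped_to_signed(value, modulus=UINT32_MODULUS):
    """Reinterpret an unsigned value as the signed number it may have wrapped from."""
    return value - modulus if value >= modulus // 2 else value


signed_quality = wrapped_to_signed(REPORTED_VALUE)
quality_char = chr(signed_quality + QUALITY_OFFSET)

print(signed_quality)      # -29
print(repr(quality_char))  # '#'
```

If this reading is right, the input files may be using a 33-based quality encoding while the run assumes a 64-based one, so the quality offset would be the first thing to check.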

I have posted more of the command-line output below for context.

scrptdir: /apps/software/InteMAP/1.0
workdir: /home/mcintyj1/ILLUMINA/Metagenome_CRC_test
@IntegMAP: OriReadFile: metagenome-files ['trimmed.P2E2-2_S1_L001_R1_001.fastq', 'trimmed.P2E2-2_S1_L001_R2_001.fastq']
@IntegMAP: LibraryInfoFile CRC_libraryinfofile -libraryname metagenome_CRC -insertsize 250 75 -type illumina off

@IntegMAP: Parameters:
@IntegMAP: minHighCovLength 0
@IntegMAP: cabogspecfile /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/cabogspecfile
@IntegMAP: f-Filter-idba-low /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/op-Filter-idba-low
@IntegMAP: abyssparafile /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/abyssparafile
@IntegMAP: idbahighparafile /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/idbaparafile
@IntegMAP: idbalowparafile /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/idbaparafile
@IntegMAP: f-Filter-idba-high /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/op-Filter-idba-high
@IntegMAP: quakeparafile /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/QuakeParaFile
@IntegMAP: bowtie2parafile /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/bowtie2parafile
@IntegMAP: f-Filter-abyss /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/op-Filter-abyss
@IntegMAP: output assembled_metagenome
@IntegMAP: clearance 0
@IntegMAP: python /apps/software/InteMAP/1.0/runerrcor.py metagenome-files /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/QuakeParaFile
run quake
python /apps/software/InteMAP/1.0/runquake.py /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/QuakeParaFile trimmed.P2E2-2_S1_L001_R1_001.fastq trimmed.P2E2-2_S1_L001_R2_001.fastq /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/QuakeParaFile
jellyfish count -c 8 -o Read.jf -m 17 -t 8 -s 2G trimmed.P2E2-2_S1_L001_R1_001.fastq trimmed.P2E2-2_S1_L001_R2_001.fastq
jellyfish dump -c -o Read-files.qcts Read.jf
correct -c 1 -u --log -f Read-files -m Read-files.qcts -k 17 -q 64 -p 8
282425314 trusted kmers
AT% = 0.495885
trimmed.P2E2-2_S1_L001_R1_001.fastq
Quality value 4294967267larger than maximum allowed quality value 60. Increase the variable 'max_qual' in Read.h.
rm -f Read-files.qcts Read.jf
error no trimmed.P2E2-2_S1_L001_R1_001.cor.fastq
@IntegMAP: python /apps/software/InteMAP/1.0/runidba.py /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/idbaparafile tot_read_file IDBA_UD_low.d
idba_ud --mink 64 --pre_correction
not enough parameters
IDBA-UD - Iterative de Bruijn Graph Assembler for sequencing data with highly uneven depth.
Usage: idba_ud -r read.fa -o output_dir
Allowed Options:
  -o, --out arg (=out)          output directory
  -r, --read arg                fasta read file (<=128)
      --read_level_2 arg        paired-end reads fasta for second level scaffolds
      --read_level_3 arg        paired-end reads fasta for third level scaffolds
      --read_level_4 arg        paired-end reads fasta for fourth level scaffolds
      --read_level_5 arg        paired-end reads fasta for fifth level scaffolds
  -l, --long_read arg           fasta long read file (>128)
      --mink arg (=20)          minimum k value (<=124)
      --maxk arg (=100)         maximum k value (<=124)
      --step arg (=20)          increment of k-mer of each iteration
      --inner_mink arg (=10)    inner minimum k value
      --inner_step arg (=5)     inner increment of k-mer
      --prefix arg (=3)         prefix length used to build sub k-mer table
      --min_count arg (=2)      minimum multiplicity for filtering k-mer when building the graph
      --min_support arg (=1)    minimum supoort in each iteration
      --num_threads arg (=0)    number of threads
      --seed_kmer arg (=30)     seed kmer size for alignment
      --min_contig arg (=200)   minimum size of contig
      --similar arg (=0.95)     similarity for alignment
      --max_mismatch arg (=3)   max mismatch of error correction
      --min_pairs arg (=3)      minimum number of pairs
      --no_bubble               do not merge bubble
      --no_local                do not use local assembly
      --no_coverage             do not iterate on coverage
      --no_correct              do not do correction
      --pre_correction          perform pre-correction before assembly

Traceback (most recent call last):
  File "/apps/software/InteMAP/1.0/runidba.py", line 103, in <module>
    os.rename( 'out/contig.fa', 'idba.ctg.fa' )
OSError: [Errno 2] No such file or directory
@IntegMAP: bash /apps/software/InteMAP/1.0/gomapandfilter.sh idba /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/tot_read_file /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/bowtie2parafile 1 /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/op-Filter-idba-low
/apps/software/InteMAP/1.0/gomapandfilter.sh: line 37: /apps/software/bowtie2/2.2.4/bowtie2-build: Permission denied
/apps/software/InteMAP/1.0/FilterCtgCov -c idba.ctg.fa -b ./*.sam -o idba -t 1 -u 50 -r 1 -L 50 -C 0 -d 0
could not open file idba.ctg.fa
@IntegMAP: python /apps/software/InteMAP/1.0/runcabog.py /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/cabogspecfile tot_read_file CRC_libraryinfofile /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/Para/cabogspecfile
error no find /home/mcintyj1/ILLUMINA/Metagenome_CRC_test/IDBA_UD_low.d/idba.sread
Traceback (most recent call last):
  File "/apps/software/InteMAP/1.0/runInteMAP.py", line 349, in <module>
    os.chdir( workdir+os.path.sep+cabogdir)
OSError: [Errno 2] No such file or directory: '/home/mcintyj1/ILLUMINA/Metagenome_CRC_test/CABOG.d'

sinamomken commented 7 years ago

Hello

You are correct: the problem seems to lie somewhere within the Quake parameters, since the "correct" program, run in the line correct -c 1 -u --log -f Read-files -m Read-files.qcts -k 17 -q 64 -p 8, is a binary that is part of the Quake software.

I don't know the exact answer to your problem, but I will try to help you debug it. Unfortunately, InteMAP is a very complicated assembly pipeline built by combining many other Linux-based assemblers, and an error in any one of them propagates up to you.

First, I suggest running as many successful assemblies as possible using the example data from the paper. The original paper uses more than one example as a benchmark; by reassembling the more complicated ones you will have more working configurations to compare against. One of the big examples is "sim113", which consists of a simulated metagenome of 113 bacteria.

After successfully assembling more example data, it may help to experiment with the parameters in the file "Para/QuakeParaFile". I am not sure which parameters should be changed, but try "quality-start" and "hash_size". If your input files are very big, it may be useful to increase "hash_size" as much as possible. For multi-gigabyte FASTQ input files, InteMAP consumes tens of gigabytes of disk storage for intermediate files and uses as much as 15 GB of RAM while running, so make sure your machine has enough RAM and free disk space. Running InteMAP on the sim113 example data is a good way to see how heavy the process can be (it took days to complete for me).
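Before changing "quality-start", it may help to look at which quality encoding the FASTQ files actually use. The sketch below is not part of InteMAP or Quake; the function names are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines). It also assumes that "quality-start" is what ends up as the `-q` value passed to `correct`, which this thread does not confirm. The rule it applies is that characters with ASCII codes below 59 cannot occur in a 64-based encoding, so seeing any of them points to a 33-based file.

```python
import itertools


def quality_code_range(fastq_path, max_reads=10000):
    """Return (lowest, highest) ASCII code seen on the quality lines, or None."""
    lowest, highest = None, None
    with open(fastq_path, "rb") as handle:
        # Assumes 4-line FASTQ records: the quality string is every 4th line.
        for line in itertools.islice(handle, 3, max_reads * 4, 4):
            codes = line.rstrip(b"\r\n")
            if not codes:
                continue
            lowest = min(codes) if lowest is None else min(lowest, min(codes))
            highest = max(codes) if highest is None else max(highest, max(codes))
    if lowest is None:
        return None
    return lowest, highest


def guess_quality_offset(code_range):
    """Guess 33 or 64 from the lowest code; None if no quality data was seen.

    Codes below 59 (';') are impossible in 64-based encodings. Returning 64
    otherwise is only a guess: a 33-based file with uniformly high qualities
    would be misclassified.
    """
    if code_range is None:
        return None
    lowest, _ = code_range
    return 33 if lowest < 59 else 64
```

Calling `guess_quality_offset(quality_code_range("reads.fastq"))` on one of the trimmed input files (the file name here is a placeholder) gives a value to compare against the offset used in the run.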

At the same time, you should also email the authors of the InteMAP paper. As you know, I did not write this software, and even running it as a test was hard for me, to the point that I had to write a script just for its installation. The authors will probably be able to help you more with that exact error.

P.S. I wrote the installation scripts so that the binaries of all the programs used are placed in "/opt/bioinformatics/", to keep them separate from other Linux software, and then added that location to the PATH environment variable so that they can be accessed globally.
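A quick way to confirm that the PATH setup works is to ask which executables the shell would actually find. The sketch below is only an illustration: the tool names are taken from the log earlier in this thread, and the helper names are hypothetical. `shutil.which` only returns files that are executable by the current user, so a binary that exists but lacks execute permission (as the "Permission denied" line for bowtie2-build suggests) shows up as missing.

```python
import shutil

# Tool names as they appear in the log above; adjust to the local install.
TOOLS = ["jellyfish", "correct", "idba_ud", "bowtie2-build"]


def locate_tools(names, search_path=None):
    """Map each tool name to the executable found for it, or None.

    search_path=None means "use the PATH environment variable".
    """
    return {name: shutil.which(name, path=search_path) for name in names}


def format_report(found):
    """Turn the mapping from locate_tools into printable lines."""
    return [
        f"{name}: {location or 'NOT FOUND or not executable'}"
        for name, location in sorted(found.items())
    ]


for line in format_report(locate_tools(TOOLS)):
    print(line)
```

Passing `search_path="/opt/bioinformatics"` restricts the lookup to that directory, which separates "not installed there" from "installed but not on PATH".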

mcintyrejosh commented 7 years ago

Hey, thanks sinamomken. I only recently saw your comment. I have been trying to run the sim113 dataset, but I'm getting the same error, so I think there is some issue with how the program has been installed. Thank you, though.