sequencing / isaac_aligner

Isaac Genome Alignment Software
Other
37 stars 8 forks source link

the use of isaac #22

Closed xiexiaowei closed 6 years ago

xiexiaowei commented 6 years ago

Hi, I'm a new learner about isaac. So, can you help me check whether I use this software correctly? 1) sort for fasta ../bin/isaac-sort-reference -g ../test.fasta -j 24 -o ../fasta_sort ../bin/isaac-pack-reference -r ../fasta_sort/sorted-reference.xml -j 24 ../bin/isaac-unpack-reference -i ../packed-reference.tar.gz

2) isaac-align ../bin/isaac-align -r ../sorted-reference.xml -b ../test_input/ --base-calls-format fastq -m 50 ...

Q1: I got a folder containing 'sorted.bam'. When I used samtools to see the align info in 'sorted.bam', it's different from what I saw before, no info about chromosome and so on, so is my result correct? @HD VN:1.0 SO:coordinate @PG ID:iSAAC PN:iSAAC CL:/data3/xxw/software/isaac/isaac_01140312/bin/isaac-align -r /data3/xxw/software/isaac/test/sorted-reference.xml -b /data3/xxw/software/isaac/test_input/ --base-calls-format fastq -m 50 --base-quality-cutoff 15 --keep-duplicates yes --variable-read-length yes --realign-gaps no --default-adapters AGATCGGAAGAGC,GCTCTTCCGATCT -o /data3/xxw/software/isaac/test_input/out_dir VN:iSAAC-01.14.03.12 @RG ID:0 PL:ILLUMINA SM:default PU:000000000-A8HE7:1:none @SQ SN:chrM LN:16571 UR:/data3/xxw/software/isaac/test/genome.fa M5:d2ed829b8a1628d16cbeee88e88e39eb @SQ SN:chr1 LN:483300 UR:/data3/xxw/software/isaac/test/genome.fa M5:0572953de0e8f244b7ef44f0b058d24c 000000000-A8HE7:1:1:0:0 4 0 0 * 0 0 ACGCTGTTGGCCTGGATCTGAGCCCTGGTGTTCTGCCATTGCTGCTGTGTGGAAGTTCACAGAGCCTCCACCACCCCGAGATCACATTT AAA?11DFF11@1FEGGC0FDFHFF0B1D11>AAACB1B@133BA33AB1BEHHAFFHHG>1>1>CF1?1ADGGGF0EEEGGHHGCEHF RG:Z:0 NM:i:0 BC:Z:none

Q2. Has the "sorted.bam" file been sorted by coordinate based on isaac? If so, there is no need for me to sort again by samtools or picard.

I'm looking forward to your reply! Thanks very much from my heart!

rpetrovski commented 6 years ago

Welcome to Isaac users community!

Q1: In this particular example it seems that Isaac failed to align the read. This is quite possible given that the blast query https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=5HJWE76Z014 indicates that the read would only align to chr1 if two 20+ base deletions are introduced.

A side note here is that your bam header indicates that you only have two chromosomes in your reference with chr1 truncated to 483300 bases. This is generally not the correct way to run the alignment (with any aligner) as it will fail to align reads belonging to missing chromosomes and will be overconfident about the reads it places on the chromosomes included.

Q2: yes, the sorted.bam has all reads sorted and ready to use for variant calling.

Please be aware that there is a much more recent version of Isaac aligner available at https://github.com/Illumina/Isaac4 Having said that, the read you are trying to align does not get placed by Isaac4 either.

Please let me know if you have other questions.

Roman.

xiexiaowei commented 6 years ago

Dear Roman, So nice of you!

I'm not so familiar with this software. Firstly I just wanted to test it so that I could see test-result quickly. This time, I tested the chrM and I thought I got the correct .sam file. @HD VN:1.0 SO:coordinate @PG ID:iSAAC PN:iSAAC CL:/data3/xxw/software/isaac/isaac_01140312/bin/isaac-align -r /data3/xxw/software/isaac/test_index/sorted-reference.xml -b /data3/xxw/software/isaac/test_input/ --base-calls-format fastq -m 50 -o /data3/xxw/software/isaac/test_input/out_dir VN:iSAAC-01.14.03.12 @RG ID:0 PL:ILLUMINA SM:default PU:000000000-A8HE7:1:none @SQ SN:chrM LN:16571 UR:/data3/xxw/software/isaac/test_index/genome.fa M5:d2ed829b8a1628d16cbeee88e88e39eb 000000000-A8HE7:1:1:0:0 0 chrM 1 60 49M 0 0 GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCA AAA?11DFF11@1FEGGC0FDFHFF0B1D11>AAACB1B@133BA33AB SM:i:248 RG:Z:0 NM:i:0 BC:Z:none 000000000-A8HE7:1:1:1:0 0 chrM 101 60 49M 0 0 GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCAT AAA?11DFF11@1FEGGC0FDFHFF0B1D11>AAACB1B@133BA33AB SM:i:248 RG:Z:0 NM:i:0 BC:Z:none 000000000-A8HE7:1:1:2:0 0 chrM 201 60 49M * 0 0 AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAA AAA?11DFF11@1FEGGC0FDFHFF0B1D11>AAACB1B@133BA33AB SM:i:248 RG:Z:0 NM:i:0 BC:Z:none

Q1: But I still have a question. When I tried to sort reference for the whole hg19 twice, the process would be killed and then failed. But if I sorted for a certain chromosome, it would succeeded. What's wrong with that?

Thanks very much again from my heart! Xiaowei,

xiexiaowei commented 6 years ago

The detailed error of sorting reference for hg19 is as follows: [sorted-reference.xml] 2018-01-15 11:06:20 [2b266f2df4c0] reserving memory for 5629358360 kmers [sorted-reference.xml] terminate called after throwing an instance of 'std::bad_alloc' [sorted-reference.xml] what(): std::bad_alloc [sorted-reference.xml] /bin/bash: line 2: 611 Aborted (core dumped) /data3/xxw/software/isaac/isaac_01140312/bin/../share/iSAAC-01.14.03.12/makefiles/reference/../../../../share/iSAAC-01.14.03.12/makefiles/common/../../../../libexec/iSAAC-01.14.03.12/findNeighbors -i Temp/sorted-reference.xml -t Temp/neighbors.dat --seed-length 32 --output-directory /data3/xxw/software/isaac/isaac_index/fasta_sort -o sorted-reference.xml.tmp make: *** [sorted-reference.xml] Error 134 make: Leaving directory `/data3/xxw/software/isaac/isaac_index/fasta_sort'

rpetrovski commented 6 years ago

This failure indicates that you don't have enough RAM on the box. With this genome you will need to have at least 50 gigabytes of RAM available to findNeighbors.

Again, I'm strongly encouraging you to consider https://github.com/Illumina/Isaac4. It is GPL licensed and is much more accurate and sensitive in terms of data processing. Besides, it does not require pre-sorting the reference :).

Roman.

xiexiaowei commented 6 years ago

Dear Roman,

It's me again! I followed your advice and successfully installed the Isaac4. I have successfully finished the alignment for a WGS sample guickly.

However, I met new problem as follows: 2018-01-20 22:46:49 [7f26cf773700] ERROR: Thread: 1 caught an exception first: 2018-Jan-20 22:46:49: Invalid argument: /public1/users/xxw/software/isaa4/Isaac4-Isaac-04.17.06.15/src/c++/include/io/InflateGzipDecompressor.hh(283): Throw in function static std::streamsize isaac::io::InflateGzipDecompressor::processPendingBytes(z_stream&, std::streamsize, ContainterT&, bool) [with ContainterT = std::vector; std::streamsize = long int; z_stream = z_stream_s] Dynamic exception type: boost::exception_detail::clone_impl std::exception::what: incorrect data checkz_stream( next_in:0x26022ec0 avail_in:21264 total_in:1223920 next_out:0x7f1ef81bf924 avail_out:472958700 total_out:4575982 msg:incorrect data check state:0x2600dd5e zalloc:1 zfree:1 opaque:0x25fedcd0 data_type:96 adler:1739564425 reserved:0) [isaac::io::tag_errmsg*] = While reading from /public1/users/xxw/PPdigenome/EMX1/lane1_read2.fastq.gz : incorrect data checkz_stream( next_in:0x26022ec0 avail_in:21264 total_in:1223920 next_out:0x7f1ef81bf924 avail_out:472958700 total_out:4575982 msg:incorrect data check state:0x2600dd5e zalloc:1 zfree:1 opaque:0x25fedcd0 data_type:96 adler:1739564425 reserved:0) 2018-01-20 22:46:51 [7f1d758f4700] WARNING: rethrowing a thread exception terminate called after throwing an instance of 'boost::exception_detail::clone_impl' what(): incorrect data checkz_stream( next_in:0x26022ec0 avail_in:21264 total_in:1223920 next_out:0x7f1ef81bf924 avail_out:472958700 total_out:4575982 msg:incorrect data check state:0x2600dd5e zalloc:1 zfree:1 opaque:0x25fedcd0 data_type:96 adler:1739564425 reserved:0) Aborted (core dumped)

Thanks again from my heart! xiaowei,

rpetrovski commented 6 years ago

There seems to be a problem with the contents of /public1/users/xxw/PPdigenome/EMX1/lane1_read2.fastq.gz. Are you able to unpack the same file with regular gzip? Since you are reporting this against Isaac4, I am going to close the ticker here. If the regular gzip works, please open a new ticket in Isaac4 along with location of the data file causing the failure.

Cheers. Roman.