Open zjassaf opened 9 years ago
Same problem here. After running
gkno tangram-bam --in bams/93-968.bam --mobile-element-fasta repeats/test_me.fa --out 93-968.tangram.bam --region Chr19
I get a segmentation fault error:
sh-4.2$ /home/shared/app/gkno_launcher/tools/Tangram/bin/tangram_scan -in /home/ipedroso/ANALYSES/MEI/Populus/file_list.text -dir tangram_out
Violación de segmento [Segmentation fault]
From the BAM file header:
@PG ID:bwa PN:bwa VN:0.5.9-r16
@PG ID:tangram_bam CL:/home/shared/app/gkno_launcher/tools/Tangram/bin/tangram_bam --ref repeats/test_me.fa --input bams/93-968.bam --target-ref-name Chr19 --output /home/ipedroso/ANALYSES/MEI/Populus/93-968_ZA.bam
I have not tried re-aligning this data using MOSAIK.
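For anyone comparing setups, the aligner and its version can be read from the @PG lines of the header (for example from the text printed by `samtools view -H`). The following is a minimal Python sketch, not part of Tangram or gkno; the function name `parse_pg_lines` is hypothetical, and it assumes a token that starts with two uppercase letters and a colon begins a new header field.

```python
import re

# Assumption: a token such as "ID:bwa" or "CL:/path/tool" starts a new field;
# any other token is a continuation of the previous field (e.g. CL arguments).
FIELD_START = re.compile(r"^[A-Z]{2}:")


def parse_pg_lines(header_text):
    """Return one dict per @PG header line, keyed by the two-letter field tag."""
    records = []
    for line in header_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "@PG":
            continue
        record = {}
        key = None
        for token in tokens[1:]:
            if FIELD_START.match(token):
                key = token[:2]
                record[key] = token[3:]
            elif key is not None:
                # Continuation of a value that contains spaces (command lines)
                record[key] += " " + token
        records.append(record)
    return records


# Header lines copied from the comment above
header = (
    "@PG ID:bwa PN:bwa VN:0.5.9-r16\n"
    "@PG ID:tangram_bam CL:/home/shared/app/gkno_launcher/tools/Tangram/bin/tangram_bam "
    "--ref repeats/test_me.fa --input bams/93-968.bam --target-ref-name Chr19 "
    "--output /home/ipedroso/ANALYSES/MEI/Populus/93-968_ZA.bam\n"
)

for rec in parse_pg_lines(header):
    print(rec.get("ID"), rec.get("VN", "(no version recorded)"))
```

Quoting the program ID and version this way makes it easier to see whether the reports in this thread share the same aligner.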
I have also observed seg faults when running on bwa data, and I am not sure what causes them. If you don't have massive amounts of data, I would recommend aligning with Mosaik, since this is what Tangram was designed to work with. If you need any assistance, please let me know (AlistairNWard@gmail.com) and I can help with getting the Mosaik alignments and Tangram running. In particular, we have a pipeline system (gkno) that helps with running larger pipelines and also makes it possible to build your own pipelines for repeated or similar analyses.
(In reply to Inti Pedroso's comment of Wed, Sep 16, 2015 at 1:36 PM: https://github.com/jiantao/Tangram/issues/5#issuecomment-140862354)
Hi,
I would like to use Tangram to identify the locations of transposable elements in Drosophila; however, when I run tangram_scan I get a segmentation fault. I suspect that tangram_bam is not working, as it looks like the ZA tags are empty (I think?). However, I know that my strains should be heterozygous for a number of different transposable elements, and in fact there are already estimates of their locations. I'd rather not run Mosaik, so if there is a way to get tangram_bam to work, that would be nice.
I've put details of what I'm doing below. Thanks! Zoe
As a positive control, I know from previous work that there should be at least 78 heterozygous copies of INE_1 in my strain. E.g., I have this data:
te              presence  ch  Upstream_estimate  Downstream_estimate
INE-1:TIR:DNA   yes       2R  2496555            2497124
INE-1:TIR:DNA   yes       4   286454             287918
INE-1:TIR:DNA   yes       3L  17862241           17862700
I can get a copy of the INE_1 sequence from FlyBase (transposon_sequence_set.embl.txt), so I make my moblist file, which contains only:
I generate my BAM file with bwa, using the -a option to keep reads where only one mate of the pair maps to the genome (since this appears to be necessary for Tangram?). These are the command-line options I use:
bwa mem -M -a -R
Then I remove duplicates, sort, and index using Picard Tools. I also merge several BAMs together, because a single sample was used to generate several libraries. Then I run tangram_bam on the merged BAM:
mySoftwarePath/Tangram/bin/tangram_bam -i myDataPath/MA_6.merged.dedup.bam -r myDataPath/moblist_ine_only.fasta -o myDataPath/MA_6.merged.dedup.tangram.bam
Then I sort the result:
mySoftwarePath/java -Xmx2g -jar mySoftwarePath/picard-tools-1.105/SortSam.jar INPUT=myDataPath/MA_6.merged.dedup.tangram.bam OUTPUT=myDataPath/MA_6.merged.dedup.sorted.tangram.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE
Next I generate my file list, tangramBamList.txt, which contains only:
myDataPath/MA_6.merged.dedup.sorted.tangram.bam
Then I run tangram_scan:
mySoftwarePath/Tangram/bin/tangram_scan -in myDataPath/tangramBamList.txt -dir myDataPath/tangramOut
And I get the error:
Segmentation fault (core dumped)
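Before digging into the crash itself, it can help to rule out trivial input problems with the file list. The sketch below is a hypothetical pre-flight check in Python, not something tangram_scan provides; it only assumes that the list file holds one BAM path per line, as described above. It is not known whether a bad path is related to the segmentation fault reported here.

```python
import os
import sys


def check_bam_list(list_path):
    """Return (path, problem) tuples for anything suspicious in a BAM list file."""
    if not os.path.isfile(list_path):
        return [(list_path, "list file not found")]

    with open(list_path) as handle:
        # One path per line (assumption); blank lines are ignored
        entries = [line.strip() for line in handle if line.strip()]

    problems = []
    if not entries:
        problems.append((list_path, "list file is empty"))
    for bam in entries:
        if not os.path.isfile(bam):
            problems.append((bam, "file not found"))
        elif os.path.getsize(bam) == 0:
            problems.append((bam, "file is empty"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_list.py myDataPath/tangramBamList.txt
    for path, problem in check_bam_list(sys.argv[1]):
        print(f"{path}: {problem}")
```

An empty result means every listed path exists and is non-empty; it says nothing about whether the BAM contents are what tangram_scan expects.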
This is a sample of what my BAM file looks like:
D4LHBFN1:293:C3L3LACXX:2:2213:20193:18303 107 YHet 1 60 16S48M2S = 1 38 CTACGGTTGTCTCAGCAGGGTCACGTAATGCTGATCCAGTCTTGTTTTTATTTTCATTCATGTTGT BHGHIIIIG@HGG GDGIIGI:BDFHDFEGGG<FGHGIIIBHHFHCDHIIGHIFEHFHFFEDE?CCE PG:Z:MarkDuplicates RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2 NM:i:0 AS:i:48 XS:i:0 ZA:Z:<@;60;;;1;;><&;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2213:20193:18303 151 YHet 1 60 28S38M = 1 -38 ATATGGTGTTTCCTACGGTTGTCTCCGCAGGGTCACGTAATGCTGATCCAGTCTTGTTTTTATTTT CDCDDDCADDDBDDDDFFHEH HB;-'GHGGDHDB2HBIGGHCGCEGIJJJJJJIJIJJJJJIHEBA PG:Z:MarkDuplicates RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2 NM:i:0 AS:i:38 XS:i:0 ZA:Z:&;60;;;1;;><@;60;;;1;;
D4LHBFN1:293:C3L3LACXX:2:2313:15215:43919 147 YHet 10 60 83M = 21 -72 TAATGCTGATCCAGTCTTGTTTTTATTTTCATTCATGTTGTTGCTCTTGCTTTGATTCCGACTTCTAACGTTTAACCTGTGAT DDDDD DDDDDDCCDDEDDDFFFFFFGHHHHHJJJJJJJJJJJIJJJJIJHJIIIIJJJJHHJIJIIJJJJJJJHHJJIJJJII PG:Z:MarkDuplicates.3 RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2.3 NM:i:3 AS:i:68 XS:i:20 ZA:Z:&;60;;;1;;><@;60;;;1;;
D4LHBFN1:293:C3L3LACXX:2:2314:3464:7166 99 YHet 17 60 82M = 56 122 GATCCAGTCTTGTTTTTATTTTCATTCATGTTGTTGCTCTTGCTTTGATTCCGACTTCTAACGTTTAACCTGTGATCAGACG AEDHGGIEFHHHH HIICAGGIIFE>DFDHHHGEHHIIIG@FGGGGIIIIIG@HIHIIIGHFHGEFFFF@@EECEEA;>CCCC PG:Z:MarkDuplicates.1 RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2.1 NM:i:3 AS:i:67 XS:i:20 ZA:Z:<@;60;;;1;;><&;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2313:15215:43919 99 YHet 21 60 81M = 10 72 CAGTCTTGTTTTTATTTTCATTCATGTTGTTGCTCTTGCTTTGATTCCGACTTCTAACGTTTAACCTGTGATCAGACGTTT JIJHH
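To make the "ZA tags look empty" observation concrete, here is a small Python sketch that splits a ZA:Z value into its angle-bracketed blocks and reports which semicolon-separated positions are empty. The function names are hypothetical, and the sketch deliberately makes no assumption about what each position means in Tangram's ZA format; it only counts what is filled in.

```python
import re


def za_from_sam_line(line):
    """Return the value of the ZA:Z tag on a SAM text line, or None if absent."""
    for token in line.split():
        if token.startswith("ZA:Z:"):
            return token[len("ZA:Z:"):]
    return None


def parse_za(tag_value):
    """Split a ZA value into blocks (one per <...>) of semicolon-separated fields."""
    blocks = re.findall(r"<([^<>]*)>", tag_value)
    return [block.split(";") for block in blocks]


def empty_positions(tag_value):
    """For each block, list the zero-based positions whose field is empty."""
    return [
        [i for i, field in enumerate(fields) if field == ""]
        for fields in parse_za(tag_value)
    ]


# Tag value copied from the first record in the sample above
za = "<@;60;;;1;;><&;60;;;1;;>"
for fields, empties in zip(parse_za(za), empty_positions(za)):
    print(fields, "empty positions:", empties)
```

Running this over records from a BAM known to work with tangram_scan and over the records above would show whether the same positions are populated in both, which is one way to test the suspicion that tangram_bam is not filling in the tag.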