Closed raimiredwan closed 9 years ago
Hi,
Thanks for reporting, and sorry for my late reply. I just pushed an attempt to fix this bug. Please let me know if it solves your problem, otherwise I'll have a look at it again.
Hi, I have two issues:
First:
Traceback (most recent call last):
File "../../softwares/BESST_RNA/src/Main.py", line 247, in
I just commented out the print lines, but afterwards I got this error:
Traceback (most recent call last):
File "../../softwares/BESST_RNA/src/Main.py", line 247, in
The command line is: python ../../softwares/BESST_RNA/src/Main.py 1 -c ../montagem/scaffolds.fasta -f ../alinhamentos/alinhamentos.sorted.bam -o scaffold -e 3 -T 20000 -k 500 -d 1 -z 1000
I ran the alignments with bwa:
bwa mem -t 20 ../index/scaffolds.fasta reads1.fastq reads2.fastq > alinhamentos.bam
I saw that the above guy had the same issue. Do you know what could be happening?
Hi,
Thanks for your report! I just pushed code that should fix the first bug (and possibly the second one as well). Please download the latest version here on git. Let me know if it solves your problem, otherwise I'll have a look at it again.
Best, Kristoffer
By the way, as you now have changed BESST_RNA locally (the commented lines), you might want to overwrite these changes when you pull the new version, see http://stackoverflow.com/a/8888015.
Or revert all changes to exactly the original state (including whitespace) before pulling.
Hi,
Thank you for your quick reply. I ran the latest version and now I get this error:
python ../../softwares/BESST_RNA/src/Main.py 1 -c ../montagem/scaffolds.fasta -f ../alinhamentos/alinhamentos.sorted.bam -o scaffold -e 3 -T 20000 -k 500 -d 1 -z 1000
../../softwares/BESST_RNA/src/Main.py:224: UserWarning: parameter -g (treating haplotypic regions) inactivated, parameters -a and -b will not have any effect if specified.
warnings.warn('parameter -g (treating haplotypic regions) inactivated, parameters -a and -b will not have any effect if specified. ')
Starting scaffolding with library: ../alinhamentos/alinhamentos.sorted.bam
Parsing BAM file...
Computing parameters not set by user...
Mean of library set to: No mean calc since RNA reads
Standard deviation of library set to: No std calc since RNA reads
-T (library insert size threshold) set to: 20000
-k set to (Scaffolding with contigs larger than): 500
Number of links required to create an edge: 3
Read length set to: 99.5488375037
Relative weight of dominating link set to (default=3): 3
LG50: 9147 NG50: 1825
Initial contig assembly length: 104738606
Nr of contigs/scaffolds included in scaffolding: 25401
Total time elapsed: 0.384147882462
USEFUL READS (reads mapping to different contigs): 380691
Reads with too large insert size from "USEFUL READS" (filtered out): 91757
Number of duplicated reads indicated and removed: 30835
Mean coverage before filtering out extreme observations = 195.757730545
Std dev of coverage before filtering out extreme observations= 477.337760937
Quantile for repeat detector chosen to: 3.88445706788
Quantile for repeat detector chosen to: 3.88250556384
Quantile for repeat detector chosen to: 3.8783028409
Quantile for repeat detector chosen to: 3.87554197881
Quantile for repeat detector chosen to: 3.87274763561
Quantile for repeat detector chosen to: 3.87146619564
Quantile for repeat detector chosen to: 3.8706938998
Mean coverage after filtering = 123.658055592
Std coverage after filtering = 118.930367846
Length of longest contig in calc of coverage: 215056
Length of shortest contig in calc of coverage: 15693
Perform inference on scaffold graph...
Remove isolated nodes.
Remove edges from node if more than two edges
Remove isolated nodes.
Nr of new scaffolds created: 188
Writing out scaffolding results for step 1 ...
Traceback (most recent call last):
File "../../softwares/BESST_RNA/src/Main.py", line 247, in
KeyError: 'NODE_10231_length_1695_cov_59.1884_ID_20461'
Do you have any idea? Could it be a problem with the alignments made with bwa?
Best, Osvaldo
Have you made sure that the contig names in the fasta file match the contig names in the bam file? That is the most common reason for this happening. More specifically, the error often shows up when one or more contigs present in the bam file references are not found in the contig/scaffold fasta file. The error log you sent me contains the name of the particular scaffold, so I would start by searching for the scaffold "NODE_10231_length_1695_cov_59.1884_ID_20461" and making sure it is present in the fasta input file.
Let me know if it gets sorted out. Best, Kristoffer
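A minimal sketch of the name check described above, in case it helps. It assumes you have the fasta contents as text and the BAM header as text (for example the output of `samtools view -H alinhamentos.sorted.bam`). The function names are hypothetical and not part of BESST_RNA. It also assumes the contig name is the first whitespace-delimited token after ">" in the fasta header, which is how bwa names its references.

```python
def fasta_names(fasta_text):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def sam_header_names(header_text):
    """Collect reference names from the SN: field of each @SQ header line."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def missing_from_fasta(fasta_text, header_text):
    """Return references named in the BAM header but absent from the fasta."""
    return sorted(sam_header_names(header_text) - fasta_names(fasta_text))


# Made-up example data, only to show the call.
example_fasta = ">NODE_1_length_10 extra description\nACGTACGTAC\n>NODE_2_length_4\nACGT\n"
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:NODE_1_length_10\tLN:10\n"
    "@SQ\tSN:NODE_3_length_7\tLN:7\n"
)
print(missing_from_fasta(example_fasta, example_header))
```

Any name this prints is a candidate for the KeyError, since the BAM refers to it but the fasta does not define it.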
Hi,
This contig is present in my input fasta file:
NODE_10231_length_1695_cov_59.1884_ID_20461
ATAATATTCCCACTATAAGTAGTTAAAGAAAGATACTTACTACTAACTTAGCTAATCTAT
ATATCTATAGTCTAAGGTGAGAAACTTCTATTCTAACCTTCAGTCTCGCTAGTTCTCTAT
TTCTCTATATTAAAGTCTCTCTAATAGTAAGAAGACTACTAAAACTATAATTAGTAGTTA...
Could there be some problem with the ID format of the contig?
ok. I have committed a new version! Please have a go again :)
Hi,
Thanks, now it worked.
But I didn't see a significant improvement in the assembly. This is my initial assembly with Spades:
FILE scaffolds.fasta (contigs >= 0 bp)
Total length: 104738606
Total length w/o "N"s: 104733780
Mean cluster size: 1449.24805246918
N50: 9147 (1825 contigs)
and this is the final assembly after running BESST:
FILE Scaffolds-pass1.fa (contigs >= 0 bp)
Total length: 104761006
Total length w/o "N"s: 104756180
Mean cluster size: 1454.06479103918
N50: 9361 (1715 contigs)
The initial DNA dataset is Illumina single-end 100 bp and the RNA-Seq data are Illumina paired-end 100 bp. As I mentioned before, I used bwa-mem with default parameters to align the RNA-Seq reads to the assembly. Have you seen better results with other aligner tools?
Hi, great that it worked!
My guess is that scaffolding with RNA-seq does not improve (genome) assembly stats much, simply because it only scaffolds the gene space, which is usually not a big fraction of the genome. The genes covered in single scaffolds might, however, be improved. Do you have any method for evaluating this, like mapping core genes (CEGMA)?
To significantly improve the genome wide contiguity, I would say you need a (genome) mate pair library.
Best, Kristoffer
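For reference, the N50 figures compared in this exchange can be computed from a list of contig lengths roughly as follows. This is a sketch of the usual definition (the length of the contig at which the cumulative sum of lengths, taken from longest to shortest, first reaches half the total), not the code of the stats tool used above. The function name `n50` and the example lengths are made up.

```python
def n50(lengths):
    """Return (N50 length, number of contigs needed to reach half the total)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no contig lengths given")


# Made-up contig lengths, only to show the call.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 is driven by the longest sequences, joining a modest number of gene-containing contigs moves it only slightly, which fits the small change from 9147 to 9361 reported above.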
Hi,
It makes sense. I'll take a look at CEGMA. Thank you for your help.
Best, Osvaldo
Hi,
I have very limited scripting knowledge. While I was running the BESST_RNA tool, it threw the error below:
Traceback (most recent call last):
File "/export/home/nenas/Desktop/program/BESST_RNA/src/Main.py", line 247, in
options.mapquality)
File "/export/home/nenas/Desktop/program/BESST_RNA/src/Main.py", line 87, in Main
(Contigs, Scaffolds, F, param) = MS.Algorithm(G, Contigs, Scaffolds, F, Information, C_dict, param) # Make scaffolds, store the complex areas (consisting of contig/scaffold) in F, store the created scaffolds in Scaffolds, update Contigs
File "/export/home/nenas/Desktop/program/BESST_RNA/src/MakeScaffolds.py", line 44, in Algorithm
G, Contigs, Scaffolds = RemoveLoops(G, Scaffolds, Contigs, Information, F) #step4
File "/export/home/nenas/Desktop/program/BESST_RNA/src/MakeScaffolds.py", line 161, in RemoveLoops
for graph in graphs:
File "/usr/local/lib/python2.7/dist-packages/networkx/algorithms/components/connected.py", line 92, in connected_component_subgraphs
for c in connected_components(G):
File "/usr/local/lib/python2.7/dist-packages/networkx/algorithms/components/connected.py", line 54, in connected_components
for v in G:
RuntimeError: dictionary changed size during iteration
I ran it using this command line: python /export/home/nenas/Desktop/program/BESST_RNA/src/Main.py 2 -c Scaffolds_pass3.fa -f accepted_hits_1_sort.bam accepted_hits_2_sort.bam -o Besst_RNA -e 3 3 -T 50000 50000 -k 1000 1000 -z 1000 1000 >Besst_RNA.out 2>Besst_RNA.err
I mapped my filtered RNA reads using tophat and bowtie2, which I expect produces many multi-mapped reads because of alternative junctions, which tophat takes into account. Is that the reason for the error?
Which RNA-Seq mapper would you suggest?
Any suggestions on how to go about this?
Thank you
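As general background on the message itself: "dictionary changed size during iteration" is what Python raises when entries are added to or removed from a dict while a loop is still walking over it. The sketch below reproduces that with a plain dict and shows the usual workaround of iterating over a snapshot. It is only an illustration with hypothetical function names and made-up data, not the BESST_RNA code and not a confirmed diagnosis of this traceback.

```python
def remove_small_unsafe(sizes, minimum):
    """Delete entries while iterating over the live dict (raises RuntimeError)."""
    for name in sizes:
        if sizes[name] < minimum:
            del sizes[name]
    return sizes


def remove_small_safe(sizes, minimum):
    """Iterate over a snapshot of the keys so deletions cannot disturb the loop."""
    for name in list(sizes):
        if sizes[name] < minimum:
            del sizes[name]
    return sizes


# Made-up data, only to show the difference between the two functions.
try:
    remove_small_unsafe({"a": 1, "b": 5, "c": 2}, 3)
except RuntimeError as err:
    print("unsafe version failed:", err)

print(remove_small_safe({"a": 1, "b": 5, "c": 2}, 3))
```

If the same pattern is at work in the traceback, the read mapper would not be the cause, but that is an assumption until the maintainer confirms it.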