mikolmogorov / Ragout

Chromosome-level scaffolding using multiple references

Error reading permutations #15

Closed xingwu2 closed 7 years ago

xingwu2 commented 7 years ago

Hi, I am using ragout2 to improve a soybean (1GB) assembly. After I fed the .hal file into the ragout program, an error popped up. Here is the log file.

[20:34:51] INFO: Starting Ragout v2.0
[20:34:51] INFO: Synteny block scale set to 'large'
[20:34:51] INFO: Extracting FASTA from HAL
[20:35:04] INFO: Converting HAL to MAF
[20:35:04] DEBUG: hal2mafMP.py DSalW82.hal batch_hal_1/hal-workdir/alignment.maf --noAncestors --numProc 24 --refGenome DS --targetGenomes DS,W82
[20:45:26] INFO: Extracting synteny blocks from MAF
[20:45:26] INFO: Running maf2synteny module
[20:45:26] DEBUG: Parsing MAF file
[20:45:30] DEBUG: Started initial compression
[20:45:30] DEBUG: Simplification with 30 500
[20:45:30] DEBUG: Simplification with 100 5000
[20:45:30] DEBUG: Simplification with 500 50000
[20:45:30] DEBUG: Simplification with 5000 500000
[20:45:30] INFO: Inferring phylogeny from synteny blocks data
[20:45:30] DEBUG: Reading permutation file
[20:45:30] ERROR: An error occured while running Ragout:
[20:45:30] ERROR: Error reading permutations

When I check the permutation files, they are all empty. Do you know what could possibly cause this error?

Best

Xing Wu

mikolmogorov commented 7 years ago

Hi, thank you for the feedback!

This looks strange; it might be that the file with synteny block permutations is either empty or does not exist. Can you check whether the files "hal-workdir/100/genomes_permutations.txt" and "hal-workdir/10000/genomes_permutations.txt" exist and are non-empty? If you are able to share those files, that would be helpful as well.
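A minimal Python sketch of that check, for anyone hitting the same error. The function name `check_permutation_files` is hypothetical and not part of Ragout; the two block-size directories ("100" and "10000") are taken from the paths quoted above and may differ for other synteny block scales.

```python
import os


def check_permutation_files(workdir, block_sizes=("100", "10000")):
    """Return {path: status}, where status is 'missing', 'empty' or 'ok'.

    Assumes the layout <workdir>/<block size>/genomes_permutations.txt,
    as in the paths mentioned in this thread.
    """
    report = {}
    for size in block_sizes:
        path = os.path.join(workdir, size, "genomes_permutations.txt")
        if not os.path.isfile(path):
            report[path] = "missing"
        elif os.path.getsize(path) == 0:
            report[path] = "empty"
        else:
            report[path] = "ok"
    return report


if __name__ == "__main__":
    # "hal-workdir" is the directory name used in this thread.
    for path, status in check_permutation_files("hal-workdir").items():
        print(f"{status:8s} {path}")
```

An "empty" status for every file points at the alignment or synteny step rather than at the permutation reader itself.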

Best, Mikhail

xingwu2 commented 7 years ago

The files exist but they are all empty.

mikolmogorov commented 7 years ago

Ok, then something went wrong with the alignment / synteny algorithm. What is the expected sequence similarity between the genomes? Could you also check the size of "hal-workdir/alignment.maf"? It should be 2GB+. Could you also send me, say, the first 1000 lines of this file?

One thing I can think of is that the genome names used for the cactus / ragout input are different (they are case sensitive). Could you check this as well?
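A small Python sketch of the case-sensitivity check. The helper `case_only_mismatches` is hypothetical; `expected` stands for the genome names given to Ragout and `found` for the names actually present in the alignment, both supplied by hand here.

```python
def case_only_mismatches(expected, found):
    """Map each expected name that has no exact match in `found`
    to the found names that differ from it only by letter case."""
    found = set(found)
    result = {}
    for name in expected:
        if name in found:
            continue  # exact, case-sensitive match: nothing to report
        lookalikes = sorted(f for f in found if f.lower() == name.lower())
        if lookalikes:
            result[name] = lookalikes
    return result


# Example with the genome names from this thread and a deliberately
# lower-cased "ds" to illustrate the failure mode.
print(case_only_mismatches(["DS", "W82"], ["ds", "W82"]))
```

A non-empty result means a name exists in both places but would still be treated as two different genomes.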

xingwu2 commented 7 years ago

Thank you for the quick response.

The alignment.maf file is just 998M, so it is very likely that something went wrong. Here are the first 1000 lines of the file (1000_alignment.txt). I didn't use Cactus to do the alignment because I can't install it on a Torque cluster. Instead, I used LAST (http://last.cbrc.jp/) to perform the alignment and output the .maf file, and used HAL (https://github.com/ComparativeGenomicsToolkit/hal) to convert it into a .hal file.

To be clear, I will also attach the first 1000 lines of the .maf file (1000_last.txt) from LAST.

1000_alignment.txt 1000_last.txt

Thank you so much

mikolmogorov commented 7 years ago

Thank you,

First of all, I do not recommend using other tools, because the cactus alignment has some special properties that we rely on. In particular, the output alignment consists of non-overlapping columns, which is not the case for most other aligners, to the best of my knowledge.
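As a rough illustration of what "non-overlapping" means here, the Python sketch below counts aligned intervals that overlap an earlier interval on the same sequence in a MAF file. It is not how Ragout or maf2synteny checks this; `count_overlaps` is a hypothetical helper that relies only on the standard MAF "s" line layout (source name, start, size, strand, source size, text).

```python
from collections import defaultdict


def count_overlaps(maf_lines):
    """Count aligned intervals that overlap an earlier interval
    on the same sequence. Zero means no position is aligned twice."""
    intervals = defaultdict(list)
    for line in maf_lines:
        fields = line.split()
        if len(fields) < 7 or fields[0] != "s":
            continue  # only sequence lines carry coordinates
        name, strand = fields[1], fields[4]
        start, size, src_size = int(fields[2]), int(fields[3]), int(fields[5])
        if size == 0:
            continue
        if strand == "-":
            # MAF reports minus-strand starts on the reverse complement;
            # convert to forward-strand coordinates.
            start = src_size - start - size
        intervals[name].append((start, start + size))

    overlaps = 0
    for spans in intervals.values():
        spans.sort()
        furthest_end = 0
        for begin, end in spans:
            if begin < furthest_end:
                overlaps += 1
            furthest_end = max(furthest_end, end)
    return overlaps


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as maf:
        print(count_overlaps(maf))
```

A large count on a LAST-derived MAF would be consistent with the overlapping-block warnings mentioned in this thread.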

Still, if you want, you can try to proceed with LAST (but if you see a lot of warnings about overlapping blocks, the results will most likely not be accurate). It looks like there is some problem with the LAST -> HAL conversion, because I only see the DS genome in the .maf file, but not W82. Make sure that both genomes are represented in the HAL file (with the halStats utility, for example) with proper names.
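One way to see which genomes actually made it into the MAF is to tally the sequence lines per genome. The Python sketch below assumes source names of the form `genome.sequence` (the usual MAF convention); the helpers `genomes_in_maf` and `missing_genomes` are hypothetical, and the genome names DS and W82 are the ones used in this thread.

```python
from collections import Counter


def genomes_in_maf(maf_lines):
    """Count MAF 's' lines per genome, taking the genome name to be
    everything before the first '.' in the source field."""
    counts = Counter()
    for line in maf_lines:
        fields = line.split()
        if len(fields) >= 7 and fields[0] == "s":
            counts[fields[1].split(".", 1)[0]] += 1
    return counts


def missing_genomes(counts, expected):
    """Return the expected genome names that never appear in the MAF."""
    return [name for name in expected if counts[name] == 0]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as maf:
        counts = genomes_in_maf(maf)
    print(dict(counts))
    print("missing:", missing_genomes(counts, ["DS", "W82"]))
```

If W82 shows up as missing, the problem is upstream of Ragout, in the alignment or in the conversion to HAL.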

xingwu2 commented 7 years ago

Thank you so much.

Now, I plan to assemble each chromosome individually using Sibelia rather than HAL. I will first decide which DS scaffolds go to which W82 chromosome, and then use ragout + Sibelia to assemble the chromosome. Would you recommend this approach?

Best

mikolmogorov commented 7 years ago

Yes, this should work if you don't anticipate any large intra-chromosomal rearrangements.

However, it is unclear what to do with repetitive contigs. If you consider trying to install/run progressive cactus, we will be happy to help.

kspham commented 7 years ago

I'm not sure whether Sibelia can handle soybean, but please give it a try and let us know. Thanks

xingwu2 commented 7 years ago

Hello,

I used Sibelia + ragout2 to reconstruct one of the soybean chromosomes (Chr18) in my special soybean genotype (DS), and it worked very well. Basically, the program returned one large pseudomolecule which is about the same size as Chr18 in the soybean reference genome (W82). I wonder if there is a way to verify the accuracy of the assembly? What do you mean by the "true" reference for verify-order.py? A DS genome assembly?

Thanks!