mikolmogorov / Ragout

Chromosome-level scaffolding using multiple references

ERROR: No sequences read for genome X (from HAL file). #42

Closed aprezvykh closed 5 years ago

aprezvykh commented 5 years ago

Hello! I am trying to run Ragout on a large Drosophila virilis genome (~160 Mb), so I am using a HAL file produced with ProgressiveCactus.

ProgressiveCactus run command: ~/sofware/progressiveCactus/bin/./runProgressiveCactus.sh hal.cfg hal/ alignment.hal --maxCpus=64 --maxThreads=64

ProgressiveCactus config file:

dana /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dana-all-chromosome-r1.05.fasta
dere /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dere-all-chromosome-r1.05.fasta
dgri /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dgri-all-chromosome-r1.05.fasta
dmel /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dmel-all-chromosome-r6.19.fasta
dmoj /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dmoj-all-chromosome-r1.04.fasta
dper /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dper-all-chromosome-r1.3.fasta
dpse /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dpse-all-chromosome-r3.04.fasta
dsec /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dsec-all-chromosome-r1.3.fasta
dsim /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dsim-all-chromosome-r2.02.fasta
dvir /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dvir-all-chromosome-r1.06.fasta
dwil /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dwil-all-chromosome-r1.05.fasta
dyak /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/dyak-all-chromosome-r1.05.fasta
dvir_9 /mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/masurca.allreads.fasta

The HAL file was produced with no errors, so I ran Ragout: ~/sofware/Ragout/ragout.py -t 64 -s hal ass.rcp

Ragout recipe file:

.references=dana,dere,dgri,dmel,dmoj,dper,dpse,dsec,dsim,dvir,dyak,dwil
.target=dvir_9
.blocks=small
.hal=/mnt/raid/illumina/AlexR/genomes/assemblies.nov.2018/9/report/ragout/alignment.hal

So I got the following log:

[15:37:03] INFO: Starting Ragout v2.0
[15:37:03] INFO: Synteny block scale set to 'small'
[15:37:03] INFO: Extracting FASTA from HAL
[15:37:06] INFO: Converting HAL to MAF
[15:37:57] INFO: Extracting synteny blocks from MAF
[15:37:57] INFO: Running maf2synteny module
[15:38:05] INFO: Inferring phylogeny from synteny blocks data
[15:38:05] ERROR: An error occured while running Ragout:
[15:38:05] ERROR: No sequences read for genome dwil. Check recipe for correctness.

What can cause this error, and how can I solve it?

mikolmogorov commented 5 years ago

Hi,

It seems that the "dwil" genome you specified in the recipe was not found in the HAL file. There are two possible reasons: (i) its name in the HAL file is different, or (ii) it wasn't aligned at all, or its alignment is very fragmented. You can check the genome names in the HAL file with the halStats utility from HAL tools.

If the name is correct, then the genome is likely too fragmented or too distant to produce a meaningful alignment. In that case, remove it from the comparison (only from the recipe; there is no need to recompute the alignment). In general, I recommend using the 3-4 closest references at once (12 is likely too many).
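The name check described above can be sketched in Python. This is only an illustration, not part of Ragout: the function names are made up, the recipe parser handles just the `.references` and `.target` lines in the form shown in this thread, and `hal_genome_names` assumes that halStats is on the PATH and that its `--genomes` option prints the genome names separated by whitespace.

```python
import subprocess


def parse_recipe_genomes(recipe_text):
    """Return (references, target) from the text of a Ragout recipe.

    Hypothetical helper: only '.references' and '.target' lines are read.
    """
    references, target = [], None
    for line in recipe_text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == ".references":
            references = [g.strip() for g in value.split(",") if g.strip()]
        elif key == ".target":
            target = value
    return references, target


def missing_genomes(recipe_text, hal_genomes):
    """List recipe genomes (references, then target) absent from the HAL file."""
    references, target = parse_recipe_genomes(recipe_text)
    wanted = references + ([target] if target else [])
    present = set(hal_genomes)
    return [g for g in wanted if g not in present]


def hal_genome_names(hal_path):
    """Ask halStats for the genome names stored in a HAL file.

    Assumption: 'halStats --genomes <file>' is available and prints
    whitespace-separated names.
    """
    result = subprocess.run(
        ["halStats", "--genomes", hal_path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    # Illustrative recipe and genome list; no HAL file is needed for this part.
    recipe = ".references=dmoj,dvir,dgri,dwil\n.target=dvir_9\n.blocks=small\n"
    in_hal = ["Anc0", "dvir_9", "dmoj", "dvir", "dgri"]
    print(missing_genomes(recipe, in_hal))
```

Any name this reports is one that Ragout would also fail to find, so it either needs to be renamed to match the HAL file or dropped from the recipe.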

aprezvykh commented 5 years ago

Thanks, I will try! P.S. It's nice to see software from our home country, and software I was actually looking for at that.

aprezvykh commented 5 years ago

Good morning. I've tried running Ragout with your suggestions and reduced the number of reference genomes to three, but I got the same error as before: ERROR: No sequences read for genome dwil

halStats output:

Anc0, 4, 12602538, 5956, 0, 510556
dvir_9, 0, 178684636, 762, 2304654, 0
dmoj, 0, 193826310, 6841, 6841, 0
dvir, 0, 206026697, 13530, 13530, 0
dgri, 0, 200467819, 17440, 17440, 0

Everything seems to be fine... These genomes are not far from each other, so the alignment should not be fragmented.

Thank you!

mikolmogorov commented 5 years ago

Hi,

It looks like the dwil genome is not in the HAL alignment. The HAL file should contain every genome (references and target) that you specified in the recipe file.

Best, Mikhail
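For illustration, here is a rough Python sketch that reads the halStats rows posted above and keeps only the references that are actually present. The recipe used for the second run is not shown in this thread, so the reference list below is simply the original 12-genome one; the function names are hypothetical and not part of Ragout or HAL tools.

```python
# Rows copied from the halStats output posted in this thread.
SUMMARY = """\
Anc0, 4, 12602538, 5956, 0, 510556
dvir_9, 0, 178684636, 762, 2304654, 0
dmoj, 0, 193826310, 6841, 6841, 0
dvir, 0, 206026697, 13530, 13530, 0
dgri, 0, 200467819, 17440, 17440, 0
"""

# Reference list from the original recipe (an assumption for the second run).
REFERENCES = ["dana", "dere", "dgri", "dmel", "dmoj", "dper",
              "dpse", "dsec", "dsim", "dvir", "dyak", "dwil"]


def genomes_in_halstats(summary_text):
    """Collect genome names from rows shaped like 'name, int, int, int, int, int'.

    Header lines, blank lines, and anything else that does not match
    that shape are skipped.
    """
    names = []
    for line in summary_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            names.append(fields[0])
    return names


def trim_references(references, hal_genomes, target):
    """Keep references that exist in the HAL file and are not the target."""
    present = set(hal_genomes)
    return [g for g in references if g in present and g != target]


if __name__ == "__main__":
    hal_genomes = genomes_in_halstats(SUMMARY)
    kept = trim_references(REFERENCES, hal_genomes, target="dvir_9")
    # A '.references' line that is consistent with this HAL file.
    print(".references=" + ",".join(kept))
```

With these rows, dwil is dropped along with every other genome that is absent from the HAL file, which is the condition the error message is reporting.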

mikolmogorov commented 5 years ago

Closing due to inactivity, feel free to reopen if you have more questions.