mikolmogorov / Ragout

Chromosome-level scaffolding using multiple references

Hal to Maf manual conversion #59

Closed: sivico26 closed this issue 4 years ago

sivico26 commented 4 years ago

Hi @fenderglass!

I was trying to use Ragout to scaffold some relatively distant plant genomes. I started by aligning them with SibeliaZ and using the resulting .maf as input. Apparently, the genomes are too distant for Ragout to make an inference.
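
One rough way to check whether an alignment is "too sparse" before blaming Ragout is to measure how much of each genome actually falls inside alignment blocks. The sketch below is a hypothetical helper, not part of Ragout or SibeliaZ. It reads the standard MAF `s` lines (`s src start size strand srcSize text`) and assumes sequence names follow a `genome.sequence` pattern split at the first dot. Overlapping blocks are double-counted, and sequences that never appear in a block are missing from the denominator, so treat the numbers as an upper-bound estimate.

```python
from collections import defaultdict
from typing import Dict, Iterable


def maf_coverage(lines: Iterable[str]) -> Dict[str, float]:
    """Return, per genome, the fraction of its bases covered by MAF "s" rows.

    Assumes names follow "genome.sequence" (split at the first dot). Overlapping
    blocks are double-counted, and the denominator only includes sequences that
    appear in at least one block, so fully unaligned contigs are not counted.
    """
    aligned: Dict[str, int] = defaultdict(int)  # genome -> ungapped aligned bases
    lengths: Dict[str, Dict[str, int]] = defaultdict(dict)  # genome -> {seq: srcSize}
    for line in lines:
        if not line.startswith("s "):
            continue  # skip header, "a", "i", "e" and "q" lines
        _, src, _start, size, _strand, src_size, _text = line.split()
        genome, _, seq = src.partition(".")
        aligned[genome] += int(size)
        lengths[genome][seq] = int(src_size)
    return {g: aligned[g] / sum(lengths[g].values()) for g in aligned}
```

Usage: `with open("alignment.maf") as fh: print(maf_coverage(fh))`. Very low values for the target or the references would be consistent with the alignment, rather than Ragout, being the bottleneck.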

I wanted to be sure that this is the case, so I also tried aligning with Cactus and using the .hal result as input. However, I have had problems installing HAL on my server.

I noticed that the Cactus version installed through bioconda includes HAL as well (though I'm not sure whether it is part of the Cactus package itself or a dependency). However, Cactus is incompatible with the latest Ragout versions (I tried 2.3, 2.2, and 2.1). Ragout 2.0 seems to be compatible, but I wonder whether it is too old to use by now.

All this led me to wonder whether I can take the HAL alignment, manually convert it to .maf with hal2mafMP.py, and use that .maf as input. So my question is: is this feasible, or are there internal steps that prevent it? How would I do it correctly?
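
After a manual HAL-to-MAF conversion, a quick sanity check is to list which genomes ended up in the MAF. A Cactus HAL usually also contains reconstructed ancestral genomes (typically named like `Anc0`), which you may not want Ragout to see. The sketch below is hypothetical (the helper names `maf_genomes` and `check_maf` are illustrative). It assumes Ragout expects MAF sequence names of the form `genome.sequence`; please verify that against Ragout's documentation for your version.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def maf_genomes(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count "s" rows per genome and collect names without a "genome." prefix."""
    rows: Counter = Counter()
    bad_names: List[str] = []
    for line in lines:
        if not line.startswith("s "):
            continue
        src = line.split()[1]
        genome, dot, _seq = src.partition(".")
        if dot:
            rows[genome] += 1
        else:
            bad_names.append(src)  # cannot tell which genome this row belongs to
    return rows, bad_names


def check_maf(lines: Iterable[str], expected: Iterable[str]):
    """Compare the genomes found in a MAF with the genomes you expect to see."""
    rows, bad_names = maf_genomes(lines)
    wanted = set(expected)
    missing = sorted(wanted - set(rows))
    unexpected = sorted(set(rows) - wanted)  # e.g. Cactus ancestors such as "Anc0"
    return missing, unexpected, bad_names
```

A non-empty `unexpected` list containing ancestral genomes, or any entries in `bad_names`, would suggest revisiting the conversion options before running Ragout.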

Specifically, these are my main concerns:

Sorry if I made this longer than necessary, and thanks in advance.

Sivico

mikolmogorov commented 4 years ago

Hi Sivico,

Can you specify what you mean by "Cactus is incompatible with the latest Ragout versions"? HAL input is expected to work with Ragout, and it is the recommended option.

Mikhail

sivico26 commented 4 years ago

If you try to install both Cactus (which includes HAL) and Ragout in the same conda environment, the newest Ragout version that gets installed is 2.0. Later Ragout packages are, apparently, incompatible with Cactus.

As I see it, the problem is not HAL but Cactus, which has several dependencies that conflict with Ragout's (e.g. networkx). I think HAL should be able to coexist with Ragout in the same conda environment, but as far as I know, there is no conda package for HAL other than the cactus package, which, again, includes it.
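
When a shared dependency such as networkx is the suspected conflict, it can help to compare what each environment actually resolved. This is a minimal sketch using only the standard library's `importlib.metadata`; the function name `installed_versions` is hypothetical, and conda-installed command-line tools do not always register Python distribution metadata, so a `None` result is not proof that a tool is absent.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run this inside each conda environment and compare the output.
    print(installed_versions(["networkx"]))
```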

mikolmogorov commented 4 years ago

Hmm, I see. It looks like Cactus relies on the networkx package, so it cannot live in the same environment as Ragout. I will look into this in the future. In the meantime, I think the fastest way could be to build HAL from source and add the binaries to PATH (and PYTHONPATH): https://github.com/ComparativeGenomicsToolkit/hal. Installation can be a bit tricky, but if you follow the installation guide carefully, it should work just fine. Let me know if you have any issues.
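
After building HAL from source and editing PATH and PYTHONPATH, a short check can confirm that the shell sees what you expect. This sketch is hypothetical: the tool names (`hal2maf`, `hal2mafMP.py`, `halStats`) are taken from the HAL repository and this thread, and it assumes HAL's Python modules are importable as a top-level package named `hal` once its parent directory is on PYTHONPATH.

```python
import importlib.util
import os
import shutil
from typing import Dict, Iterable


def hal_setup_report(
    tools: Iterable[str] = ("hal2maf", "hal2mafMP.py", "halStats"),
) -> Dict[str, object]:
    """Report where HAL tools resolve on PATH and whether `hal` is importable."""
    # shutil.which returns the full path, or None if the tool is not on PATH
    report: Dict[str, object] = {tool: shutil.which(tool) for tool in tools}
    # find_spec returns None when no top-level "hal" package is on sys.path
    report["hal_importable"] = importlib.util.find_spec("hal") is not None
    report["PYTHONPATH"] = [
        p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p
    ]
    return report


if __name__ == "__main__":
    for key, value in hal_setup_report().items():
        print(f"{key}: {value}")
```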

Mikhail

sivico26 commented 4 years ago

Thank you, Mikhail.

We already tried building HAL from source, but we ran into problems on our server, which, to be honest, is quite old. Although we can keep trying, I would like to explore the option of converting the .hal alignment into a .maf alignment and using that as input for Ragout; that is why I opened the issue in the first place.

I know it may not be recommended, but my question is: is it possible? What would I have to do? And how different would the result be from using the .hal format as input?

mikolmogorov commented 4 years ago

Ok, makes sense.

My advice on building HAL would be to try building it inside a bioconda environment. If there is a working HAL package on bioconda, then it should be possible to build it there too.

Regarding MAF input: it is possible too. There is some discussion on this topic here: https://github.com/fenderglass/Ragout/issues/40. It should be possible to manually convert HAL to MAF and then feed the MAF to Ragout in a separate environment.
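
The two-environment workflow could be scripted roughly as below. This is a sketch under stated assumptions, not an official recipe. The environment names `hal-env` and `ragout-env` are hypothetical, and `conda run -n ENV ...` is used to run each step in its own environment. The converter is assumed to take the HAL and MAF paths as its two positional arguments (true for `hal2maf`; `hal2mafMP.py` is assumed to behave the same). Ragout's `--synteny maf` and `--outdir` options are assumptions to verify with `ragout --help` for your installed version.

```python
import subprocess
from typing import List


def two_env_commands(
    hal_file: str,
    maf_file: str,
    recipe: str,
    outdir: str,
    hal_env: str = "hal-env",        # hypothetical env that has HAL (e.g. via cactus)
    ragout_env: str = "ragout-env",  # hypothetical env with a recent Ragout
    converter: str = "hal2maf",
) -> List[List[str]]:
    """Build (but do not run) the conversion and scaffolding commands."""
    convert = ["conda", "run", "-n", hal_env, converter, hal_file, maf_file]
    scaffold = ["conda", "run", "-n", ragout_env, "ragout", recipe,
                "--synteny", "maf", "--outdir", outdir]
    return [convert, scaffold]


def run_all(commands: List[List[str]]) -> None:
    """Run the commands in order, stopping at the first non-zero exit status."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage: `run_all(two_env_commands("aln.hal", "aln.maf", "plants.rcp", "ragout-out"))`, where the recipe file is assumed to reference the converted MAF as described in Ragout's documentation. Building the argument lists separately from running them makes it easy to print and inspect the commands first.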

Mikhail