xingjianleng / DBGA

The repository for the genome sequence alignment research project
BSD 3-Clause "New" or "Revised" License

Problem when running benchmark on poapy and MUSCLE #4

Closed. xingjianleng closed this issue 2 years ago

xingjianleng commented 2 years ago

The data sample consists of "MK211376" and "MK211377", each approximately 30,000 nucleotides long. (I intentionally picked two similar sequences; I also have other sets of sequences.)

The partial order graph code comes from https://github.com/ljdursi/poapy. This implementation uses an excessive amount of memory: I have 13GB of RAM available to WSL2, but on every run the process is killed by the Linux kernel.

I'm currently running this script on partch (a server for CS students). This server has 32GB of RAM but a slower CPU. The script ran successfully there and produced a result, but I would have to redo all the benchmarking work on this platform.
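A rough back-of-the-envelope estimate may help explain the memory behaviour described above. The sketch below assumes the aligner fills full dynamic-programming tables of size (len_a + 1) x (len_b + 1); the 8 bytes per cell and the number of tables are assumptions for illustration, not measurements of poapy, and `dp_matrix_bytes` is a hypothetical helper name.

```python
def dp_matrix_bytes(len_a, len_b, bytes_per_cell=8, n_matrices=1):
    """Estimate memory for full (len_a + 1) x (len_b + 1) DP tables.

    bytes_per_cell and n_matrices are assumptions, not values read
    from any particular aligner.
    """
    cells = (len_a + 1) * (len_b + 1)
    return cells * bytes_per_cell * n_matrices


GIB = 1024 ** 3

# Two sequences of about 30,000 nucleotides, as in the data sample above.
for n_matrices in (1, 3):
    estimate = dp_matrix_bytes(30_000, 30_000, n_matrices=n_matrices)
    print(f"{n_matrices} table(s): {estimate / GIB:.1f} GiB")
# 1 table(s): 6.7 GiB
# 3 table(s): 20.1 GiB
```

If an implementation keeps a score table plus two backtracking tables at 8 bytes per cell (an assumption), the total would exceed 13GB but fit within 32GB, which is at least consistent with the two outcomes reported here.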

As for MUSCLE, I checked the official documentation and it seems that many command-line flags have been removed (e.g. maxiters). Running MUSCLE directly on these sequences results in a segmentation fault. This does not appear to depend on the amount of RAM: I tried machines with 16GB and 32GB, and both failed. After clipping the input sequences to approximately 17,000 long, the software runs and produces output on both machines.
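For reference, the clipping step can be sketched in plain Python as follows. This is a minimal illustration, not the script actually used for the benchmark; the function names (`read_fasta`, `clip_records`, `to_fasta`) and the toy input are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def clip_records(records, max_len):
    """Keep only the first max_len characters of each sequence."""
    return [(header, seq[:max_len]) for header, seq in records]


def to_fasta(records, width=60):
    """Serialise records back to FASTA text, wrapping sequences at width."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy example; for the real data max_len would be about 17_000.
toy = ">seq1\nACGTACGT\nACGT\n>seq2\nGGCC\n"
clipped = clip_records(read_fasta(toy), max_len=10)
print(to_fasta(clipped))
```

Clipping from the start of each sequence is only one option; taking the same window from both sequences keeps the comparison between tools on equal input.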

GavinHuttley commented 2 years ago

Nice to get some indications of performance limits. Suggest adding cogent3 pairwise to this set.

We can add you to an NCI account. There are 1TB RAM machines there, although they can be hard to get access to. The standard machines have 64GB RAM, clearly better than the choices you have.

biolinyu commented 2 years ago

@xingjianleng as for POA, you might take a look at https://github.com/rvaser/spoa

xingjianleng commented 2 years ago

> @xingjianleng as for POA, you might take a look at https://github.com/rvaser/spoa

This partial order alignment tool looks great! It can output a DOT file. I will check its performance over the weekend.