sanger-tol / treeval

Pipelines for the production of Treeval data
https://pipelines.tol.sanger.ac.uk/treeval

REQUEST: Large genome parallelisation #83

Status: Open. DLBPointon opened this issue 1 year ago.

DLBPointon commented 1 year ago

Description of feature

Currently testing TreeVal by running it on daSamNigr (the European elder).

The peptide subworkflow is, as expected, slower but does not require any further optimizations. GAP_FINDER also requires no further optimizations; its output is exactly as expected.

The Repeat Density subworkflow is much slower.

Insilico Digest and GENERATE_GENOME both completed.

Nothing is currently moving through the nuc_alignments subworkflow; it is stuck in a constant cycle of failing and retrying.

We will need a fix that splits the genome into 1 Gbp chunks, runs the workflow on the chunks in parallel and then merges the results. For the current pipeline this could be simple; the more complex subworkflows, however, may require a different solution.
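The split/run/merge idea can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not how TreeVal (a Nextflow pipeline) implements or will implement it: `plan_chunks` and `merge_results` are invented names, it assumes sequence lengths are already known (e.g. from a FASTA index), and it assumes per-chunk tools report BED-like records in coordinates local to the piece they were given. Sequences longer than the chunk size are cut into pieces, so merged coordinates must be shifted back by each piece's offset.

```python
from typing import Dict, List, Tuple

# (sequence name, 0-based start, end-exclusive), i.e. BED-style half-open intervals
Interval = Tuple[str, int, int]


def plan_chunks(seq_lengths: Dict[str, int],
                max_chunk_bp: int = 1_000_000_000) -> List[List[Interval]]:
    """Greedily pack sequences (cutting them where needed) into chunks
    whose total length is at most max_chunk_bp. Hypothetical helper."""
    chunks: List[List[Interval]] = []
    current: List[Interval] = []
    used = 0
    for name, length in seq_lengths.items():
        start = 0
        while start < length:
            room = max_chunk_bp - used          # always > 0: full chunks are flushed below
            end = min(length, start + room)
            current.append((name, start, end))
            used += end - start
            start = end
            if used == max_chunk_bp:            # chunk is full, start a new one
                chunks.append(current)
                current, used = [], 0
    if current:                                 # flush the last, partially filled chunk
        chunks.append(current)
    return chunks


def merge_results(results: List[Tuple[Interval, List[Tuple[int, int, str]]]]
                  ) -> List[Tuple[str, int, int, str]]:
    """Shift each piece's local (start, end, value) records back to whole-sequence
    coordinates and return them sorted. Hypothetical helper."""
    merged = []
    for (name, offset, _end), records in results:
        for start, end, value in records:
            merged.append((name, start + offset, end + offset, value))
    merged.sort(key=lambda r: (r[0], r[1], r[2]))
    return merged


if __name__ == "__main__":
    # Toy example with a 4 bp "chunk" size
    print(plan_chunks({"chr1": 7, "chr2": 5}, max_chunk_bp=4))
    # 12 Gbp at 1 Gbp per chunk gives 12 parallel jobs
    print(len(plan_chunks({"genome": 12_000_000_000})))
```

With the toy input the plan is `[[('chr1', 0, 4)], [('chr1', 4, 7), ('chr2', 0, 1)], [('chr2', 1, 5)]]`, and a 12 Gbp genome yields 12 chunks. Cutting inside a sequence keeps every chunk at the target size, but any feature that spans a cut point will come back as two records; packing whole sequences only (and accepting uneven chunk sizes) avoids that, at the cost of an oversized chunk for any single sequence larger than 1 Gbp.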

DLBPointon commented 1 year ago

NOTE: The elder is a 12 Gbp genome.

DLBPointon commented 9 months ago

Linked to issues: #199, #203