vgteam / toil-vg

Distributed and cloud computing framework for vg
Apache License 2.0

Preparing and annotating GAMs for mapeval takes too long #449

Open adamnovak opened 6 years ago

adamnovak commented 6 years ago

I am running mapeval repeatedly on the same input GAM (from toil-vg sim) with different target graphs and mapper settings.

Each toil-vg mapeval run has to separately decompress the GAM to FASTQ, annotate it, and, for some reason, extract it to JSON. (I manage to skip chunking by running in single-chunk mode.)

Since vg can take interleaved GAM input, I should be able to just pass a pre-annotated GAM directly into the mapping step.

Failing that, I need a way to save the files that mapeval is generating before the actual mapping steps, and pass them back in later to another mapeval run.
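
A minimal sketch of that fallback, not toil-vg code: it assumes the preprocessing outputs can be keyed by a hash of the input GAM and kept in a cache directory, so a later run on the same GAM reuses them. The names `file_digest` and `cached_preprocess`, the `steps` callables, and the cache layout are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def cached_preprocess(
    gam: Path,
    cache_root: Path,
    steps: Dict[str, Callable[[Path, Path], None]],
) -> Dict[str, Path]:
    """Run each preprocessing step at most once per distinct input GAM.

    `steps` maps an output file name to a hypothetical function
    (input_path, output_path) that writes that output.
    """
    # One cache directory per distinct GAM content, not per file name.
    cache_dir = cache_root / file_digest(gam)
    cache_dir.mkdir(parents=True, exist_ok=True)

    outputs: Dict[str, Path] = {}
    for name, make in steps.items():
        target = cache_dir / name
        if not target.exists():
            # Write under a temporary name first so an interrupted run
            # never leaves a partial file that looks complete.
            partial = cache_dir / (name + ".partial")
            make(gam, partial)
            partial.replace(target)
        outputs[name] = target
    return outputs
```

Each entry in `steps` would wrap one of the conversions listed above (GAM to FASTQ, annotation, JSON extraction). Hashing the file contents rather than its path means a renamed or re-uploaded copy of the same simulated GAM still hits the cache.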

glennhickey commented 6 years ago

Yeah, it's a huge time sink. I'll fix this. I think the most general way is to add a FASTQ input option, as that'll work equally well for vg/bwa comparisons.

