johnomics / tapestry

Validate and edit small eukaryotic genome assemblies
MIT License

use already mapping with minimap #10

Open sravel opened 9 months ago

sravel commented 9 months ago

Hello, I would like to know if it is possible to modify Tapestry to take a minimap mapping BAM as input, so as not to redo the mapping, which would also allow for not subsampling the fastq.

Thanks

johnomics commented 9 months ago

Hello @sravel - I can't test this myself just now but I think this should be possible by copying (or linking to) your BAM file as reads_assembly.bam in the weave output folder, with a reads_assembly.bam.bai index. You'll probably need to fake a reads.fastq.gz file as well but I think it could be an empty file.

weave checks for existing sampled reads and BAM files and uses those if they are there, rather than recreating them (see sample FASTQ check and BAM checks). The only requirement is that the BAM file must be newer than the assembly FASTA and the read FASTQ (because if either of those has changed, the BAM will need recreating). So as long as you set up your BAM file after your FASTA and FASTQ, it looks like it will work. I don't think the sampled reads are used for anything else after the BAM is created.
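A minimal sketch of that setup, untested against Tapestry itself. The file names reads_assembly.bam, reads_assembly.bam.bai and reads.fastq.gz come from the description above; the function name `stage_existing_bam` and its parameters are hypothetical, and writing the placeholder as a valid empty gzip file (rather than a zero-byte file) is an assumption. Because the placeholder FASTQ is created after the BAM already exists, the sketch backdates the placeholder so that the BAM still counts as newer.

```python
import gzip
import os
from pathlib import Path


def stage_existing_bam(bam, bai, outdir, assembly=None):
    """Link an existing BAM and index into a weave output folder (hypothetical helper)."""
    bam, bai, outdir = Path(bam).resolve(), Path(bai).resolve(), Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    # Placeholder for the sampled reads. Assumption: an empty but valid
    # gzip file is acceptable; the thread only says "an empty file".
    fastq = outdir / "reads.fastq.gz"
    if not fastq.exists():
        with gzip.open(fastq, "wb"):
            pass

    # Link the BAM and its index under the names given in the thread.
    for src, name in ((bam, "reads_assembly.bam"), (bai, "reads_assembly.bam.bai")):
        dest = outdir / name
        if dest.is_symlink() or dest.exists():
            dest.unlink()
        dest.symlink_to(src)

    # The BAM must be newer than the read FASTQ, so backdate the FASTQ
    # if needed instead of touching the original BAM.
    bam_mtime = bam.stat().st_mtime
    if fastq.stat().st_mtime >= bam_mtime:
        os.utime(fastq, (bam_mtime - 1, bam_mtime - 1))

    # The BAM must also be newer than the assembly FASTA; only check this.
    if assembly is not None and Path(assembly).stat().st_mtime >= bam_mtime:
        raise ValueError("assembly FASTA is newer than the BAM; weave would remap")
    return outdir
```

Something like `stage_existing_bam("my.bam", "my.bam.bai", "weave_out", assembly="assembly.fasta")` would be run before starting weave with the same output folder. Whether weave then accepts a BAM made with different minimap2 options is a separate question.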

I'm not confident about this though: Tapestry runs minimap2 with several options, and I'm not sure whether BAMs produced with different options will work (see minimap2 command). Because of this, I don't think I'd want to implement accepting BAMs as inputs; validation might be tricky.

If you have time to try this, please let me know whether it works or not, and I can look again.

Thanks, John