virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool that identifies one or more connected genomic regions with simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next-generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for upcoming improvements in runtime, accuracy and annotations for the GRCh38 human reference genome.

Parameter advising to speed-up AmpliconArchitect #51

Open iprada opened 5 years ago

iprada commented 5 years ago

Dear Viraj,

It is me again. I have managed to avoid the ZeroDivisionError by filtering out the reads with low mapping quality. I am writing to ask whether you could add a few lines to the software description on how to tune the parameters to speed up AmpliconArchitect. I am running it on my simulation set with 13000 circles simulated at 30X (that is around 10000000x2 raw Illumina reads), and the software has been running for 4h at the time of writing.

In my use case, I am only interested in detecting the circular DNA breakpoints. I am not interested in reconstructing the fine structure of the amplicons. Hence, I expect that adjusting the parameters --runmode and --extendmode might make the software run faster. If that is the case, could you provide a minimal explanation of how to set these parameters?

PS: I have seen in your paper and in the source code that you have some very heavy numerical calculations. You are probably aware of this, but I think that AmpliconArchitect (as I have said before, I think it is great that it can reconstruct the amplicons) could benefit a lot from just-in-time compilation of the main mathematical functions using Numba. I have personally obtained speed-ups of around 2 orders of magnitude for the numerically heavy functions of Circle-Map, and the implementation effort was minimal.
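To illustrate the kind of change meant here, below is a minimal sketch of Numba just-in-time compilation applied to a loop-heavy numerical function. The function `poisson_log_likelihood` is a hypothetical stand-in chosen only for illustration; it is not taken from the AmpliconArchitect or Circle-Map source. The sketch assumes Numba may or may not be installed and falls back to plain Python if it is not.

```python
import math

try:
    from numba import njit  # optional dependency; compiles the function on first call
except ImportError:
    def njit(func):
        # Fallback: a no-op decorator so the sketch still runs without Numba
        return func


@njit
def poisson_log_likelihood(counts, rate):
    """Sum of Poisson log-probabilities of `counts` under mean `rate`.

    Hypothetical example of a numerically heavy inner loop; not AA's code.
    """
    total = 0.0
    for k in counts:
        # log P(k | rate) = k*log(rate) - rate - log(k!)
        total += k * math.log(rate) - rate - math.lgamma(k + 1.0)
    return total
```

In practice the counts would typically be passed as a NumPy array, and the first call includes the compilation time, so any speed-up only shows on repeated calls or large inputs.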

Best wishes,

Iñigo

virajbdeshpande commented 5 years ago

Hi Iñigo,

Thanks for trying out AA and the suggestion of using Numba. The runtime speedup is currently a non-trivial problem from a user perspective and would only be accessible to advanced users like you. It will be better to implement the runtime speedups internally rather than adding complicated user documentation for that. I will work on that in the near future.

For the parameter --extendmode:

For the parameter --runmode: Since you only want the breakpoint edges, you can use the value BPGRAPH. This should not take much longer than generating the breakpoint edges alone; currently AA does not have an option to stop right after breakpoint edge generation. However, if you find that is not the case, you can find intermediate files in the output directory with the filename format {out}_amplicon{ampliconid}_edges_cnseg.txt.
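As a small convenience, the intermediate files following the filename format above can be collected with a glob pattern. This is only a sketch: the helper name `find_edge_files` and its arguments are hypothetical, and it only lists the files, since their internal layout is not described in this thread.

```python
import glob
import os


def find_edge_files(output_dir, out_prefix):
    """Return sorted paths matching {out}_amplicon{ampliconid}_edges_cnseg.txt.

    `output_dir` is the AA output directory and `out_prefix` is the value
    used for {out}; both are supplied by the caller (hypothetical helper).
    """
    # Escape the literal part so special characters in the path are not
    # interpreted as glob wildcards; only the amplicon id is a wildcard.
    literal = glob.escape(os.path.join(output_dir, out_prefix))
    pattern = literal + "_amplicon*_edges_cnseg.txt"
    return sorted(glob.glob(pattern))
```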

If it is still taking a while, try downsampling to 10X coverage. AA should still perform well after downsampling.
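One way to do the downsampling outside AA is with `samtools view -s`, which takes a single number whose integer part is the random seed and whose fractional part is the fraction of reads to keep. The sketch below only builds that command for going from 30X to 10X; the helper name `downsample_command`, the file names, and the seed are hypothetical, and it assumes `samtools` is available on the PATH if the command is actually run.

```python
def downsample_command(in_bam, out_bam, current_cov, target_cov, seed=42):
    """Build a samtools command that subsamples a BAM to roughly `target_cov`.

    Returns None when the data are already at or below the target coverage.
    """
    if target_cov >= current_cov:
        return None
    # Fraction of reads to keep, e.g. 10X / 30X = 0.3333
    fraction = min(target_cov / current_cov, 0.9999)
    # samtools -s expects SEED.FRACTION, e.g. 42.3333
    seed_and_fraction = f"{seed + fraction:.4f}"
    return ["samtools", "view", "-b", "-s", seed_and_fraction,
            "-o", out_bam, in_bam]


# Example: 30X simulated data reduced to about 10X (file names are placeholders)
cmd = downsample_command("sim_30x.bam", "sim_10x.bam", 30, 10)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, and the downsampled BAM would then need to be indexed with `samtools index` before further use.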

Let me know if this helps. Happy to hear more comments from you!

Best,

Viraj