ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Allow for multiple seeds #159

Open FSciammarella opened 3 years ago

FSciammarella commented 3 years ago

Hello,

I have been using NOVOPlasty to try to recover stray mitochondrial reads from RNA-Seq data. As expected for this kind of dataset, it consists mainly of fragmented cDNA reads. One of the steps I use is running NOVOPlasty with many seeds, and sometimes with many k-mer configurations. The problem is that rebuilding the k-mer hash table between runs is awfully slow, and sometimes the seed turns out to be unusable. My proposed solutions are:
1) Accept a list of seed files, or a seed file with multiple sequences, and run the pipeline through each one with the same hash table.
2) Make the hash table a persistent file, recoverable between runs with the same k-mer size.

I think that would greatly expand both usability and usefulness of this excellent tool.

Cheers.
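
The two proposals above can be illustrated with a minimal Python sketch. This is not NOVOPlasty's code (NOVOPlasty is written in Perl and its internal hash layout is not described in this thread); every function name here is hypothetical, and the index is a deliberately simple k-mer to read-index map. It only shows the idea: build the k-mer table once, store it on disk keyed by k-mer size, and query it with many seeds, so that an unusable seed gives an empty result instead of costing a full rebuild.

```python
import pickle
from collections import defaultdict
from pathlib import Path


def build_kmer_index(reads, k):
    """Map every k-mer to the set of read indices that contain it."""
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(read_id)
    return dict(index)


def load_or_build_index(reads, k, cache_dir):
    """Proposal 2: reuse a stored index for this k-mer size, else build and store it.

    Assumption: the cache is only reused with the same read set, and only
    pickle files you created yourself are loaded.
    """
    cache_file = Path(cache_dir) / f"kmer_index_k{k}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    index = build_kmer_index(reads, k)
    with cache_file.open("wb") as fh:
        pickle.dump(index, fh)
    return index


def reads_hit_by_seed(seed, index, k):
    """Return the indices of reads sharing at least one k-mer with the seed."""
    hits = set()
    for i in range(len(seed) - k + 1):
        hits |= index.get(seed[i:i + k], set())
    return hits


def run_all_seeds(seeds, index, k):
    """Proposal 1: query many seeds (name -> sequence) against one shared index."""
    return {name: reads_hit_by_seed(seq, index, k) for name, seq in seeds.items()}
```

A seed that matches nothing simply maps to an empty set, so the remaining seeds can still be processed against the same table.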

ndierckx commented 3 years ago

Hi,

I actually have another assembler where both options are available, so I could implement them in NOVOPlasty if you want. Option 1 may be a bit more difficult, I will have to see, but option 2 should be relatively easy; I think I can copy most of the code.

I will add them to the next update, maybe as early as next week.

FSciammarella commented 3 years ago

Hello @ndierckx, sorry for the radio silence, life happens... Yes, I would be really, really, really grateful if you could get them to work as you said. If I knew Perl even a little better I would be up for helping, but it is not my area of expertise.

I imagine you also had your own share these last few months as the update never actually hit "next week". Hope you are okay.

ndierckx commented 3 years ago

Hi,

NOVOPlasty was my PhD project, so it has not been my job for a while now and updates will be scarce. I added option 2 today and will update the version; you can let me know if it works.

First, put "yes" after Store Hash in the config file to store the hash locally. To use the local hash, put the project name of the stored hash after the Store Hash option in the config.

ndierckx commented 3 years ago

It is online, but I changed how to use the local hashes. Do it like this: just give the path to each file.

Forward reads = /path/to/HASH2B_project.txt
Reverse reads = /path/to/HASH2C_project..txt
Store Hash = /path/to/HASH_project..txt
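
Since option 1 (multiple seeds in one run) is not confirmed as implemented in this thread, one possible workaround is to generate one config per seed, all pointing at the same stored hash files. The Python sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the only config keys taken from this thread are the three shown above, and the name of the seed line (`seed_key`) must be supplied by the user to match their own config file.

```python
def set_config_values(config_text, updates):
    """Return config_text with the value of each 'Key = value' line replaced.

    Only keys present in `updates` are touched; a KeyError is raised if a
    requested key does not occur in the config at all.
    """
    remaining = dict(updates)
    out = []
    for line in config_text.splitlines():
        key, sep, _ = line.partition("=")
        name = key.strip()
        if sep and name in remaining:
            out.append(f"{name} = {remaining.pop(name)}")
        else:
            out.append(line)
    if remaining:
        raise KeyError(f"keys not found in config: {sorted(remaining)}")
    return "\n".join(out) + "\n"


def configs_for_seeds(template_text, seed_paths, hash_paths, seed_key):
    """Build one config text per seed file, all sharing the same stored hashes.

    hash_paths maps 'Forward reads', 'Reverse reads' and 'Store Hash' to the
    stored hash files; seed_key is the (user-supplied) name of the seed line.
    """
    base = set_config_values(template_text, hash_paths)
    return {seed: set_config_values(base, {seed_key: seed}) for seed in seed_paths}
```

Each returned text can then be written to its own file and passed to a separate run, so the hash is built once and reused for every seed.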