iMetOsaka / UNAGI


More user options #4

Open palfalvi opened 3 years ago

palfalvi commented 3 years ago

Hi!

I am trying to put together a long gene annotation pipeline, and UNAGI looked like a great tool to incorporate. However, its lack of control options really slows it down compared to the other steps in the pipeline. At the moment I cannot modify and test the code, so I will just leave some suggestions here. If I have some time, I will try to look into it in detail too, but if the authors could address at least some of these points, it would be highly appreciated!

1, File compression: not the biggest issue, but since fastq files are usually stored as fastq.gz or .tgz, it would be a great help if the software could handle this. ATM, it just throws an error and quits.

2, Parallelization: If I understand correctly, UNAGI uses only one thread for all its processes. At least for minimap and samtools, it would be great if the user could set the number of threads to speed things up.

3, sortedBam as input: For resource management and to have more control over mapping, it would be great if we could run the initial mapping ourselves and pass the sorted bam file directly. UNAGI could then just pick up from that point.

4, gtf output: As a lot of downstream software uses gtf annotation files, it would be great if one were available after the run, so there is no need to look for a custom converter script.

Best regards, Gergo

JungNicolas commented 3 years ago

Hello Gergo, and thank you for your suggestions. I will look into what I can implement in an update; however, it may take a little while. I'll let you know here when I have made the changes.

Regards, Nicolas.

JungNicolas commented 3 years ago

Hello again,

I didn't have much time to work on UNAGI so I started with the easy part: File compression. You should now be able to use fastq.gz files as your input, either at the very top or as a stranded file.
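For anyone who wants the same behaviour in their own pipeline scripts, here is a minimal sketch of one common way to read plain and gzip-compressed FASTQ files through a single code path. It is not UNAGI's actual implementation; the helper names `open_fastq` and `count_reads` are made up for this example, and it assumes standard four-line FASTQ records.

```python
import gzip
from typing import IO

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_fastq(path: str) -> IO[str]:
    """Open a FASTQ file as text, whether or not it is gzip-compressed.

    Hypothetical helper: the file content is sniffed instead of trusting
    the extension, so a mis-named file is still handled correctly.
    """
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path: str) -> int:
    """Count records, assuming four lines per FASTQ record."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4
```

Note that this only covers gzip streams. A .tgz file is a tar archive and would need to be unpacked first.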

As for the next steps:

  1. Parallelization: The core parts of UNAGI were not initially built with it in mind, so it will take some work to get to a fully parallel process. In the meantime, however, you can tweak every option used by the tools within UNAGI by changing it in the "Command options" section of the app/conf.ini file. As long as the output stays in the correct format, this should not cause any problems, so setting your multi-threading options there should work.

  2. sortedBam: this should not pose too many problems and I can probably implement it quickly before the parallelization part. I'll let you know as soon as it's there.

  3. gtf output: At first glance, it should not be a problem either.
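To illustrate the conf.ini workaround from point 1, here is a rough sketch that appends thread flags to the tool options in a "Command options" section. The key names `minimap_options` and `samtools_sort_options` and the example values are assumptions made for this sketch, so please check the real keys in your own app/conf.ini. The flags themselves are the standard ones: `-t` sets the thread count for minimap2 and `-@` adds threads for samtools sort.

```python
import configparser

# Hypothetical excerpt of app/conf.ini: the section name comes from this
# thread, but the key names and values below are assumptions.
EXAMPLE_CONF = """
[Command options]
minimap_options = -ax splice
samtools_sort_options =
"""

# Thread flag of each tool, keyed by the (assumed) option name.
THREAD_FLAGS = {
    "minimap_options": "-t",        # minimap2: number of threads
    "samtools_sort_options": "-@",  # samtools sort: additional threads
}


def with_threads(conf_text: str, threads: int) -> configparser.ConfigParser:
    """Return a parsed config with thread flags appended to each tool."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["Command options"]
    for key, flag in THREAD_FLAGS.items():
        current = section.get(key, "").strip()
        section[key] = f"{current} {flag} {threads}".strip()
    return parser


if __name__ == "__main__":
    updated = with_threads(EXAMPLE_CONF, threads=8)
    for key, value in updated["Command options"].items():
        print(f"{key} = {value}")
```

Editing the file by hand works just as well. The script form is mainly useful when a pipeline has to set the thread count per run.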

Best regards, Nicolas.