jessieren / DeepVirFinder

Identifying viruses from metagenomic data by deep learning

Refactoring of main loop #26

Open: cerebis opened this pull request 3 years ago

cerebis commented 3 years ago

Building on @papanikos' earlier fork, this pull request goes further and refactors the main loop.

Sequences can now be filtered by length at both ends (too short and too long), because I found that very long sequences seemed to cause DVF to hang indefinitely. Instead of ad-hoc FASTA parsing, the main loop now uses Biopython (though this could also be replaced). As a result, the batch-processing logic is much simpler. The batch size is also now tunable by the end user.
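To illustrate the filter-then-batch pattern described above, here is a minimal sketch. It is not the code from this pull request. The names `filter_by_length` and `batched` and the parameters `min_len`, `max_len` and `batch_size` are hypothetical. The functions take any iterable of `(id, sequence)` tuples, so the sketch stays dependency-free. In practice the records could come from Biopython's `Bio.SeqIO.FastaIO.SimpleFastaParser`, which yields exactly such tuples from an open FASTA handle.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical record type: (sequence id, nucleotide sequence)
Record = Tuple[str, str]


def filter_by_length(records: Iterable[Record], min_len: int = 0,
                     max_len: Optional[int] = None) -> Iterator[Record]:
    """Yield only records whose length lies within [min_len, max_len]."""
    for name, seq in records:
        if len(seq) < min_len:
            continue  # too short: skip
        if max_len is not None and len(seq) > max_len:
            continue  # very long: skip rather than risk stalling the run
        yield name, seq


def batched(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most batch_size items (the last may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    records = [("a", "ACGT"), ("b", "A" * 10), ("c", "AC"), ("d", "ACGTAC")]
    for batch in batched(filter_by_length(records, min_len=3, max_len=8), batch_size=2):
        print(batch)  # [('a', 'ACGT'), ('d', 'ACGTAC')]
```

Because both functions are generators, only one batch is held in memory at a time. With a real file, `SimpleFastaParser(handle)` would simply replace the `records` list.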

Output-file handling has been refactored for simplicity, but the functionality is otherwise unchanged.

Sequences are now reverse-complemented with a very fast approach, analogous to the one used in many existing tools and APIs.
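One widely used fast technique is to build a complement translation table once with `str.maketrans`, apply it with `str.translate`, and reverse the result with a slice. The sketch below shows that technique. It is an assumption that this matches the PR's implementation, and the function name `reverse_complement` is illustrative. The table also covers IUPAC ambiguity codes and lowercase (soft-masked) bases.

```python
# Complement table built once at import time; covers A/C/G/T, IUPAC ambiguity
# codes (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves) and lowercase.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMSWBDHVNacgtrykmswbdhvn",
    "TGCAYRMKSWVHDBNtgcayrmkswvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Complement every base via the translation table, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCN"))  # NGCAT
```

Both `translate` and slicing run in C inside CPython, which is why this is much faster than a Python-level loop over characters. Biopython's `Seq.reverse_complement()` offers the same operation if the sequences are already `Seq` objects.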

[ed] minor typo corrections

papanikos commented 3 years ago

Hi @jessieren and @chaodengusc

I am mentioning you explicitly, hoping you get a personal notification.

Any chance you might review these changes?