This PR reduces the execution time of the `run.sh` script for constructing the SwissProt database from an average of 1213s to 792s through a few changes:
First of all, @bmesuere and I ported the `FunctionalAnalysisPeptides.java` script to JavaScript (Node.js). This alone gave a substantial speedup on our real-world test data (from 168s to 67s on the same machine).
Next, I parallelized several steps of the build process to make better use of multi-core CPUs. For example, the functional annotations for the original and the equalized peptides are computed independently of each other, so both computations can run in parallel. This led to another large improvement on the SwissProt test set.
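The general bash pattern for running two independent build steps concurrently is to start each one in the background with `&` and then `wait` on each process ID. The sketch below is illustrative only: the `annotate` function and the `original_peptides` / `equalized_peptides` inputs are hypothetical stand-ins, since the actual commands in `run.sh` are not shown here. The `sleep 1` inside `annotate` simulates work, so both jobs finish after roughly one second instead of two.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a real annotation step; the actual script
# name and arguments used in run.sh are not shown in this PR.
annotate() {
  local input="$1" output="$2"
  sleep 1  # simulate work
  echo "annotated $input" > "$output"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT  # clean up the scratch directory on exit

# Launch both independent jobs in the background.
annotate original_peptides "$tmp/original.out" &
pid_original=$!
annotate equalized_peptides "$tmp/equalized.out" &
pid_equalized=$!

# Wait on each job individually: `wait PID` returns that job's exit
# status, so with `set -e` a failure in either job aborts the build.
wait "$pid_original"
wait "$pid_equalized"

cat "$tmp/original.out" "$tmp/equalized.out"
```

Waiting on each PID separately, rather than calling a bare `wait`, matters here: a bare `wait` returns 0 regardless of whether the background jobs failed, which would let a broken step go unnoticed.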
I expect these changes to make a big difference in the construction time of the complete TrEMBL database.
The output of the new script was compared with the output of the old build script and verified to be identical.