linsalrob / PhiSpy

Prediction of prophages from bacterial genomes
MIT License

Automatic preparation of training sets and some minor updates #23

Closed (pdec closed this 4 years ago)

pdec commented 4 years ago

Added make_training_sets.py, which automatically prepares training sets or updates existing ones. Note that with --retrain the kmers set is updated, but previously prepared training sets are not recalculated unless they are present in --indir or listed in the --groups file. Also added 12 new reference genomes and created some training groups for bacterial families or genera:

Added a flag to control the number of Random Forest trees used during classification. Added a flag to choose the type of kmers, and updated makeTrain.py and makeTest.py accordingly. This flag also indicates which kmer files should be present in data/.
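
For readers unfamiliar with this kind of change, here is a rough sketch of how two such options might be wired up with argparse. It is an illustration only, not the code from this PR: the flag names `--trees` and `--kmers-type`, the default values, the list of kmer types, and the file naming pattern in `kmers_file_for` are all assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a parser with two hypothetical options (names are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a tree-count flag and a kmer-type flag"
    )
    # Hypothetical flag: number of trees for the Random Forest classifier.
    parser.add_argument(
        "--trees",
        type=int,
        default=500,  # assumed default, not taken from the PR
        help="number of Random Forest trees used during classification",
    )
    # Hypothetical flag: which kind of kmers to use.
    parser.add_argument(
        "--kmers-type",
        choices=("all", "codon", "simple"),  # assumed set of kmer types
        default="all",
        help="type of kmers used for training and testing",
    )
    return parser


def kmers_file_for(kmers_type):
    """Map a kmer type to the file expected in data/ (naming is hypothetical)."""
    return f"data/phage_kmers_{kmers_type}.txt"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"trees={args.trees} kmers_file={kmers_file_for(args.kmers_type)}")
```

Deriving the kmer file name from the selected type is one way the choice of kmers could determine which files need to exist in data/; the actual mapping used by the project may differ.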